gateway-setup

$npx mdskill add joelhooks/joelclaw/gateway-setup

Sets up a persistent AI agent gateway on macOS with Redis event bridge, heartbeat monitoring, and multi-session routing.

  • Helps make an AI agent always-on and reachable for background workflows and notifications.
  • Integrates with Redis, Inngest, Tailscale, and Telegram for event handling and remote access.
  • Uses interactive Q&A to match user intent from minimal to full gateway configurations.
  • Routes responses to external channels like Telegram or WebSocket based on setup choices.

SKILL.md

.github/skills/gateway-setupView on GitHub ↗
---
name: gateway-setup
displayName: Gateway Setup
description: "Set up a persistent AI agent gateway on macOS with Redis event bridge, heartbeat monitoring, and multi-session routing. Interactive Q&A to match your intent — from minimal (Redis + extension) to full (embedded daemon + Telegram + watchdog). Use when: 'set up a gateway', 'I want my agent always on', 'event bridge', 'heartbeat monitoring', 'agent notifications', or any request to make an AI agent persistent and reachable."
version: 1.0.0
author: Joel Hooks
tags: [joelclaw, gateway, setup, redis, telegram]
---

# Gateway Setup for AI Agents on macOS

This skill builds a persistent gateway for an AI coding agent on a Mac. It bridges background workflows (Inngest, cron, pipelines) into your agent's session and optionally routes responses to external channels (Telegram, WebSocket).

## Before You Start

**Required:**
- macOS (Apple Silicon preferred)
- pi coding agent installed and working
- Redis running (locally, Docker, or k8s — see `inngest-local` skill if you need this)

**Optional (unlocks more features):**
- Inngest self-hosted (for durable event-driven workflows)
- Tailscale (for secure remote access from phone/laptop)
- Telegram bot token (for mobile notifications/chat)

## Critical Setup Notes

**`GATEWAY_ROLE=central` is required for the always-on session.** Without it, the session runs as a satellite and misses heartbeats, system alerts, and any events not targeted at it specifically. Set it when launching:
```bash
GATEWAY_ROLE=central pi
```

**`serveHost` is mandatory when Inngest runs in Docker and the worker runs on the host.** The SDK advertises `localhost:3100` as its callback URL, but Docker can't reach the host's loopback. Set it in your Hono serve handler:
```typescript
inngestServe({
  client: inngest,
  functions,
  serveHost: "http://host.docker.internal:3100",
})
```
Then force re-sync: `curl -X PUT http://localhost:3100/api/inngest`

**ioredis resolution in Bun is flaky.** If you get `Cannot find module '@ioredis/commands'`, install it explicitly:
```bash
bun add @ioredis/commands
# or: rm -rf node_modules && bun install
```

**Two ioredis clients required for pub/sub.** A subscribed client can't run LRANGE, DEL, or other commands. The extension creates separate `sub` and `cmd` clients.

## Intent Alignment

Before building anything, ask the user these questions to determine scope. Adapt based on their answers.

### Question 1: What's your goal?

Present these options:
1. **Notifications only** — background jobs finish, I want to know about it without watching the terminal
2. **Always-on agent** — I want a persistent session that survives terminal closes, handles heartbeats, routes events
3. **Full gateway** — always-on + talk to my agent from Telegram/phone + multi-session routing

Each level builds on the previous. Start with what they need now.

### Question 2: What's your event source?

1. **Just cron/timers** — I want a heartbeat that checks system health periodically
2. **Inngest functions** — I have durable workflows that emit completion events
3. **Mixed** — Inngest + cron + maybe webhooks

### Question 3: How many concurrent agent sessions?

1. **One** — I run one pi session at a time
2. **Multiple** — I often have 2-5 sessions in different terminals working on different things

If multiple: enable central/satellite routing. If one: simpler single-session mode.

## Architecture Tiers

### Tier 1: Notification Bridge (simplest)

**What you get:** Background events show up in your pi session as messages.

**Components:**
- Redis (already running)
- Gateway pi extension (~100 lines)
- `pushGatewayEvent()` utility function

**How it works:**
```
Background process → Redis LPUSH → pi extension drains on notify → injected as user message
```

**Build steps:**

1. Create the extension directory:
```bash
mkdir -p ~/.pi/agent/extensions/gateway
```

2. Create `~/.pi/agent/extensions/gateway/package.json`:
```json
{
  "name": "gateway-extension",
  "private": true,
  "dependencies": {
    "ioredis": "^5.4.2"
  }
}
```

3. Install dependencies:
```bash
cd ~/.pi/agent/extensions/gateway && npm install
```

4. Create `~/.pi/agent/extensions/gateway/index.ts` with the minimal bridge:

```typescript
import type { ExtensionAPI, ExtensionContext } from "@mariozechner/pi-coding-agent";

const SESSION_ID = "main";
const EVENT_LIST = `agent:events:${SESSION_ID}`;
const NOTIFY_CHANNEL = `agent:notify:${SESSION_ID}`;

type RedisLike = {
  on(event: string, listener: (...args: unknown[]) => void): void;
  connect(): Promise<void>;
  subscribe(channel: string): Promise<unknown>;
  lrange(key: string, start: number, stop: number): Promise<string[]>;
  del(key: string): Promise<number>;
  llen(key: string): Promise<number>;
  unsubscribe(): void;
  disconnect(): void;
};

type RedisCtor = new (options: { host: string; port: number; lazyConnect: boolean }) => RedisLike;

let Redis: RedisCtor | null = null;
let sub: RedisLike | null = null;
let cmd: RedisLike | null = null;
let ctx: ExtensionContext | null = null;
let piRef: ExtensionAPI | null = null;

interface SystemEvent {
  id: string;
  type: string;
  source: string;
  payload: Record<string, unknown>;
  ts: number;
}

function formatEvents(events: SystemEvent[]): string {
  return events.map((e) => {
    const time = new Date(e.ts).toLocaleTimeString("en-US", { hour12: false });
    return `- **[${time}] ${e.type}** (${e.source})`;
  }).join("\n");
}

async function drain(): Promise<void> {
  if (!cmd || !piRef) return;
  const raw = await cmd.lrange(EVENT_LIST, 0, -1);
  if (raw.length === 0) return;

  const events = raw.reverse().map(r => {
    try { return JSON.parse(r) as SystemEvent; } catch { return null; }
  }).filter(Boolean) as SystemEvent[];

  if (events.length === 0) { await cmd.del(EVENT_LIST); return; }

  const prompt = [
    `## 🔔 ${events.length} event(s) — ${new Date().toISOString()}`,
    "", formatEvents(events), "",
    "Take action if needed, otherwise acknowledge briefly.",
  ].join("\n");

  if (ctx?.isIdle()) {
    piRef.sendUserMessage(prompt);
  } else {
    piRef.sendUserMessage(prompt, { deliverAs: "followUp" });
  }
  await cmd.del(EVENT_LIST);
}

export default function (pi: ExtensionAPI) {
  piRef = pi;

  pi.on("session_start", async (_event, _ctx) => {
    ctx = _ctx;
    if (!Redis) {
      try {
        Redis = (await import("ioredis")).default as RedisCtor;
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        _ctx.ui.notify(`Gateway extension running without Redis: ${message}`, "warning");
        return;
      }
    }

    sub = new Redis({ host: "localhost", port: 6379, lazyConnect: true });
    cmd = new Redis({ host: "localhost", port: 6379, lazyConnect: true });
    await sub.connect();
    await cmd.connect();

    await sub.subscribe(NOTIFY_CHANNEL);
    sub.on("message", () => { if (ctx?.isIdle()) drain(); });

    // Drain anything that accumulated while session was down
    const pending = await cmd.llen(EVENT_LIST);
    if (pending > 0) await drain();

    ctx.ui.setStatus("gateway", "🔗 connected");
  });

  pi.on("agent_end", async () => { drain(); });

  pi.on("session_shutdown", async () => {
    if (sub) { sub.unsubscribe(); sub.disconnect(); }
    if (cmd) { cmd.disconnect(); }
  });
}
```

5. Push events from any script:
```typescript
import Redis from "ioredis";
const redis = new Redis();

async function pushEvent(type: string, source: string, payload = {}) {
  const event = { id: crypto.randomUUID(), type, source, payload, ts: Date.now() };
  await redis.lpush("agent:events:main", JSON.stringify(event));
  await redis.publish("agent:notify:main", JSON.stringify({ type }));
}

// Example: notify when a download finishes
await pushEvent("download.complete", "my-script", { file: "video.mp4" });
```

6. Restart pi — the extension loads automatically.

### Tier 2: Always-On with Heartbeat

**Adds:** Cron heartbeat, watchdog failure detection, boot sequence.

**Additional components:**
- HEARTBEAT.md checklist (read by the agent on each heartbeat)
- Watchdog timer in the extension
- tmux or launchd for persistence

**Build steps (on top of Tier 1):**

1. Create `~/HEARTBEAT.md` (or wherever your agent's home is):
```markdown
# Heartbeat Checklist

## System Health
- [ ] Redis is reachable
- [ ] Background worker is responding
- [ ] No stuck jobs

## Pending Work
- [ ] Check inbox for unprocessed items

If nothing needs attention, reply HEARTBEAT_OK.
```

2. Add heartbeat cron — if using Inngest:
```typescript
export const heartbeatCron = inngest.createFunction(
  { id: "system-heartbeat" },
  [{ cron: "*/15 * * * *" }],
  async ({ step }) => {
    await step.run("push-heartbeat", async () => {
      await pushEvent("cron.heartbeat", "inngest", {});
    });
  }
);
```

Or without Inngest, use a simple cron/setInterval:
```bash
# crontab -e
*/15 * * * * redis-cli LPUSH agent:events:main '{"id":"'$(uuidgen)'","type":"cron.heartbeat","source":"cron","payload":{},"ts":'$(date +%s000)'}' && redis-cli PUBLISH agent:notify:main '{"type":"cron.heartbeat"}'
```

3. Add watchdog to the extension (insert after session_start):
```typescript
const WATCHDOG_THRESHOLD_MS = 30 * 60 * 1000; // 2x the 15-min interval
let lastHeartbeatTs = Date.now();
let watchdogAlarmFired = false;

setInterval(() => {
  if (!piRef || !ctx) return;
  if (watchdogAlarmFired) return;
  const elapsed = Date.now() - lastHeartbeatTs;
  if (elapsed > WATCHDOG_THRESHOLD_MS) {
    watchdogAlarmFired = true;
    piRef.sendUserMessage(`## ⚠️ MISSED HEARTBEAT\n\nNo heartbeat in ${Math.round(elapsed / 60000)} minutes. Check your worker/cron.`);
  }
}, 5 * 60 * 1000);

// Reset on heartbeat receipt (inside drain function):
// if (events.some(e => e.type === "cron.heartbeat")) {
//   lastHeartbeatTs = Date.now();
//   watchdogAlarmFired = false;
// }
```

4. Make it survive terminal close — tmux:
```bash
tmux new-session -d -s agent -x 120 -y 40 "pi"
# Attach: tmux attach -t agent
# Detach: Ctrl-B, D
```

Or launchd for full always-on:
```xml
<!-- ~/Library/LaunchAgents/com.you.agent-gateway.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.you.agent-gateway</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>-c</string>
    <string>tmux new-session -d -s agent -x 120 -y 40 "GATEWAY_ROLE=central pi" && while tmux has-session -t agent 2>/dev/null; do sleep 5; done</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <key>StandardOutPath</key><string>/tmp/agent-gateway.log</string>
  <key>StandardErrorPath</key><string>/tmp/agent-gateway.log</string>
</dict>
</plist>
```

Load it:
```bash
launchctl load ~/Library/LaunchAgents/com.you.agent-gateway.plist
```

### Tier 3: Multi-Session + Central/Satellite

**Adds:** Multiple pi sessions, smart event routing.

**When you need this:** You run 2+ pi sessions simultaneously — one for oversight, others for coding tasks. Heartbeats should only go to the oversight session.

**Key changes from Tier 1:**

1. Session ID becomes role-based:
```typescript
const ROLE = process.env.GATEWAY_ROLE ?? "satellite";
const SESSION_ID = ROLE === "central" ? "gateway" : `pid-${process.pid}`;
```

2. Sessions register in a Redis set:
```typescript
await cmd.sadd("agent:gateway:sessions", SESSION_ID);
// On shutdown:
await cmd.srem("agent:gateway:sessions", SESSION_ID);
```

3. `pushEvent` fans out to targets:
```typescript
async function pushEvent(type, source, payload, originSession?) {
  const event = { id: crypto.randomUUID(), type, source, payload, ts: Date.now() };
  const json = JSON.stringify(event);
  const sessions = await redis.smembers("agent:gateway:sessions");
  
  const targets = new Set<string>();
  if (sessions.includes("gateway")) targets.add("gateway"); // central always
  if (originSession && sessions.includes(originSession)) targets.add(originSession);
  
  for (const sid of targets) {
    await redis.lpush(`agent:events:${sid}`, json);
    await redis.publish(`agent:notify:${sid}`, JSON.stringify({ type }));
  }
}
```

4. Start your central session:
```bash
GATEWAY_ROLE=central pi  # This one gets ALL events
```

5. Other sessions start normally (satellites):
```bash
pi  # Gets only events it initiated
```

### Tier 4: External Channels (Telegram, WebSocket)

**Adds:** Talk to your agent from your phone.

This tier requires the embedded daemon approach — pi runs as a library inside a Node.js process, not as a TUI. See the joelclaw.com article "Building a Gateway for Your AI Agent" for the full architecture.

**Key components:**
- `createAgentSession()` from pi SDK — headless agent session
- grammY for Telegram bot
- Command queue that serializes all inputs (TUI, heartbeat, Telegram)
- Outbound router that sends responses back to the asking channel

This is the most complex tier. Only build it if you actually need mobile access.

## Verification Checklist

After setup, verify:
- [ ] Pi session starts and extension loads (check status bar for 🔗)
- [ ] Push a test event: `redis-cli LPUSH agent:events:main '{"id":"test","type":"test","source":"manual","payload":{},"ts":0}'` + `redis-cli PUBLISH agent:notify:main test`
- [ ] Event appears in pi session within seconds
- [ ] (Tier 2+) Heartbeat fires on schedule
- [ ] (Tier 2+) Kill the heartbeat source — watchdog alarm fires after 30 min
- [ ] (Tier 3+) Central session receives heartbeats, satellite sessions don't

## Setup Script (curl-first)

For automated setup, the user can run:
```bash
curl -sL joelclaw.com/scripts/gateway-setup.sh | bash
```

Or with a specific tier:
```bash
curl -sL joelclaw.com/scripts/gateway-setup.sh | bash -s -- 2
```

The script is idempotent, detects Redis, installs the extension, and configures persistence for Tier 2+.

## Decision Chain (compressed ADRs)

Sequential architecture decisions that led to the current gateway design. Each solved a real problem discovered in the previous iteration.

| # | ADR | Decision | Problem Solved | Key Tradeoff |
|---|-----|----------|---------------|-------------|
| 1 | [0010](/adrs/0010-system-loop-gateway) | Hybrid cron + event gateway | Manual triage bottleneck | Always-on LLM session = expensive. Cron = latency. Hybrid balances both. |
| 2 | [0018](/adrs/0018-pi-native-gateway-redis-event-bridge) | Redis event bridge (pi extension) | No Inngest→pi bridge existed | Extension-only, no separate process. Redis as the clean interface boundary. |
| 3 | [0035](/adrs/0035-gateway-session-routing-central-satellite) | Central + satellite routing | Heartbeats interrupting coding sessions | Fan-out by role. Central gets all, satellites get only origin-targeted. |
| 4 | [0036](/adrs/0036-launchd-central-gateway-session) | launchd + tmux (superseded) | Gateway session dies on terminal close | Pi needs PTY. tmux provides it. launchd restarts on crash. |
| 5 | [0037](/adrs/0037-gateway-watchdog-layered-failure-detection) | 3-layer watchdog | "Who watches the watchmen" | Extension watchdog (Inngest down), launchd tripwire (pi down), heartbeat (everything healthy). |
| 6 | [0038](/adrs/0038-embedded-pi-gateway-daemon) | Embedded pi daemon (supersedes 0036) | No mobile access, no multi-channel | Embeds pi as library. grammY for Telegram. Command queue serializes all inputs. Most complex tier. |

**Read order for full context:** 0010 → 0018 → 0035 → 0037 → 0038 (skip 0036, superseded)

## Known Limitations

1. **Drain race condition.** The extension does `LRANGE` then `DEL` — not atomic. Events pushed between those calls are deleted without processing. The in-memory `seenIds` dedup prevents double-delivery but doesn't prevent lost events. Fix: use `LRANGE` + `LTRIM` or a Redis transaction. Low-impact on single-user systems but real.

2. **Redis connection recovery is notify-only.** If Redis goes down, the extension catches the error and logs it, but doesn't retry or reconnect automatically. ioredis `retryStrategy` handles reconnection at the client level, but accumulated events during the outage may be lost.

3. **Watchdog intervals are hardcoded.** Check interval (5 min) and threshold (30 min) are constants in the extension. Should be configurable via env vars or Redis config.

4. **No persistent dedup across restarts.** The `seenIds` Set lives in memory and caps at 500. Process restart = dedup resets. For the heartbeat-every-15-min use case this is fine. For high-frequency events it could cause duplicates.

## Credits

- [OpenClaw](https://github.com/openclaw/openclaw) — gateway-daemon pattern, command queue serialization, channel plugin architecture
- [pi coding agent](https://github.com/mariozechner/pi-coding-agent) — extension API, sendUserMessage(), session lifecycle
- [Inngest](https://www.inngest.com/) — durable event-driven workflows
- [joelclaw.com/building-a-gateway-for-your-ai-agent](/building-a-gateway-for-your-ai-agent) — human summary

More from joelhooks/joelclaw

SkillDescription
add-skillCreate new joelclaw skills with the idiomatic process — repo-canonical, symlinked, git-tracked, slogged. Triggers on 'add a skill', 'create skill', 'new skill', 'canonical skill', 'make a skill for', or any request to formalize a process or domain into a reusable skill.
adr-skillCreate and maintain Architecture Decision Records (ADRs) optimized for agentic coding workflows. Use when you need to propose, write, update, accept/reject, deprecate, or supersede an ADR; bootstrap an adr folder and index; consult existing ADRs before implementing changes; or enforce ADR conventions. This skill uses Socratic questioning to capture intent before drafting, and validates output against an agent-readiness checklist.
agent-discovery"Optimize websites, docs, and product surfaces for agent discoverability and operator UX. Use when working on agent SEO/AEO/GEO, crawl policy, markdown or JSON projections, llms.txt, sitemap.md, AGENTS.md guidance, content negotiation, accessibility for browser agents, or any request to make a site easier for pi, OpenCode, Claude Code, ChatGPT, Perplexity, or other agent harnesses to find and use."
agent-loopStart, monitor, and cancel durable multi-agent coding loops via Inngest. Use when the user wants to run autonomous coding workloads, execute a PRD with multiple stories, kick off an AFK coding session, have agents implement features from a plan, or manage running loops. Triggers on "start a coding loop", "run this PRD", "implement these stories", "go AFK and code this", "check loop status", "cancel the loop", "joelclaw loop", or any request for autonomous multi-story code execution.
agent-mail>-
agent-workloads"Compatibility alias for the canonical `workflow-rig` front door. Use when older prompts mention `agent-workloads` or when you need the legacy workload-planning guidance; for new work, load `workflow-rig` first."
clawmail>-
cli-design"Design and build agent-first CLIs with HATEOAS JSON responses, context-protecting output, and self-documenting command trees. Use when creating new CLI tools, adding commands to existing CLIs (joelclaw, slog), or reviewing CLI design for agent-friendliness. Triggers on 'build a CLI', 'add a command', 'CLI design', 'agent-friendly output', or any task involving command-line tool creation."
codex-prompting"Use this skill for any request to trigger, coordinate, or craft prompts for Codex. Use when user says 'send to codex', 'use codex', 'prompt codex', 'ask codex', 'delegate to codex', 'run in codex', or asks for a Codex-first execution handoff."
content-publish"Publish content to joelclaw.com via the Convex-first pipeline. Covers the full lifecycle: draft → review → publish → revalidate → verify. Handles secret leasing, tag conventions, content types (article, tutorial, note, essay), and verification gates. Use when: 'write article about X', 'publish article <slug>', 'draft a tutorial', 'publish this', 'push to convex', or any content publishing task."