Install: `npx mdskill add joelhooks/joelclaw/sync-system-bus`
Deploys the system-bus-worker to the joelclaw Kubernetes cluster from a local machine when the GitHub Actions deploy fails.
- Use it to ship Kubernetes changes when the automated GitHub Actions deploy cannot complete because its self-hosted runner does not exist.
- It works with Docker, GHCR, and Kubernetes, and uses a local script for authentication and deployment.
- It runs only when the publish script is invoked manually.
- The script builds the image, authenticates, pushes to GHCR, and applies the Kubernetes manifests.
`.github/skills/sync-system-bus/SKILL.md`
---
name: sync-system-bus
displayName: Sync System Bus
description: "Deploy the system-bus-worker to the joelclaw Kubernetes cluster from a local machine. Use when syncing changes in packages/system-bus to k8s, especially because the GitHub Actions deploy job targets a non-existent self-hosted runner and cannot complete deploys automatically."
version: 2.1.0
author: Joel Hooks
tags: [joelclaw, system-bus, kubernetes, deploy, ghcr, inngest]
---
# Sync System Bus Worker
Deploy `system-bus-worker` to the local joelclaw k8s cluster (Talos v1.12.4 / k8s v1.35.0).
**Important:** `.github/workflows/system-bus-worker-deploy.yml` has a deploy job that targets a `self-hosted` runner. That runner does not exist, so deploys must be completed locally.
## Quick Deploy
The publish script handles everything — build, auth, push, k8s apply, rollout, verification:
```bash
cd ~/Code/joelhooks/joelclaw
k8s/publish-system-bus-worker.sh
```
Optionally, pass a tag (defaults to a timestamp):
```bash
k8s/publish-system-bus-worker.sh a6de1e0
```
### GHCR Auth Order
`publish-system-bus-worker.sh` now authenticates in this order:
1. `GHCR_TOKEN` env var (if provided)
2. `secrets lease ghcr_pat` (agent-secrets)
3. `gh auth token` fallback
If the `gh auth token` lacks the `read:packages` and `write:packages` scopes, the push fails with 403. Use `ghcr_pat` instead.
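The fallback order above can be sketched as a small shell function. This is an illustration, not the script's actual code: `resolve_ghcr_token` is a hypothetical name, and it assumes `secrets lease ghcr_pat` prints the bare token on stdout.

```bash
# Hypothetical sketch of the GHCR token fallback order; not copied from
# k8s/publish-system-bus-worker.sh. Assumes `secrets lease ghcr_pat`
# prints the bare token on stdout.
resolve_ghcr_token() {
  local token=""
  # 1. An explicit GHCR_TOKEN env var wins
  if [ -n "${GHCR_TOKEN:-}" ]; then
    printf '%s\n' "$GHCR_TOKEN"
    return 0
  fi
  # 2. agent-secrets lease
  if command -v secrets >/dev/null 2>&1; then
    token="$(secrets lease ghcr_pat 2>/dev/null || true)"
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi
  # 3. gh CLI fallback (may lack package scopes, so the push can 403)
  if command -v gh >/dev/null 2>&1; then
    token="$(gh auth token 2>/dev/null || true)"
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
  fi
  echo "no GHCR token source available" >&2
  return 1
}
```

Usage: `resolve_ghcr_token | docker login ghcr.io -u <github-user> --password-stdin`.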
## What the Script Does
1. Builds an ARM64 Docker image (required: the Talos/Colima node is aarch64)
2. Authenticates to GHCR with a temporary Docker config, using the token order listed under GHCR Auth Order (`GHCR_TOKEN`, then the `agent-secrets` lease `ghcr_pat`, then `gh auth token`)
3. Pushes `ghcr.io/joelhooks/system-bus-worker:${TAG}` and `:latest`
4. Updates the image ref in `k8s/system-bus-worker.yaml`
5. Runs `kubectl apply` on the manifest
6. Waits for rollout (`--timeout=180s`)
7. Probes the new pod's health endpoint
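Step 4 can be done with a single `sed` substitution. This is a minimal sketch, assuming the manifest holds one unquoted `image: ghcr.io/joelhooks/system-bus-worker:<tag>` line; `set_worker_image_tag` is a hypothetical name, and the real script may do this differently.

```bash
# Hypothetical sketch of step 4: point the manifest at the new tag.
# Assumes a line like: image: ghcr.io/joelhooks/system-bus-worker:<old-tag>
set_worker_image_tag() {
  local manifest="$1" tag="$2"
  local repo="ghcr.io/joelhooks/system-bus-worker"
  # `-i.bak` followed by rm works with both BSD (macOS) and GNU sed
  sed -i.bak -E "s#(image:[[:space:]]*)${repo}:[^[:space:]\"]+#\1${repo}:${tag}#" "$manifest"
  rm -f "${manifest}.bak"
}
```

Usage: `set_worker_image_tag k8s/system-bus-worker.yaml a6de1e0`.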
## Post-Deploy Verification
```bash
joelclaw refresh # Re-register functions with Inngest
joelclaw functions | grep "<new-function>" # Verify new function appears
joelclaw status # Full health check
joelclaw runs --count 3 # Confirm runs are flowing
```
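If a new function does not show up right away, a small polling loop around the same two commands saves re-running them by hand. This is a sketch: `wait_for_function` is hypothetical, and it assumes `joelclaw functions` prints each function's name as plain text, as the `grep` above implies.

```bash
# Hypothetical helper: re-register with Inngest until a function appears.
# Uses only `joelclaw refresh` and `joelclaw functions`.
wait_for_function() {
  local name="$1" attempts="${2:-5}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    joelclaw refresh >/dev/null 2>&1 || true
    if joelclaw functions 2>/dev/null | grep -qF -- "$name"; then
      echo "found $name after $i attempt(s)"
      return 0
    fi
    # Do not sleep after the final attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "$name not registered after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_function "<new-function>" 6 10` tries up to six times, ten seconds apart.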
## Restart Safety (ADR-0156)
The worker is stateless between Inngest steps. Each step is a separate HTTP call; Inngest stores step output server-side. This means k8s rolling restarts are safe — Inngest retries the in-flight step against the new pod.
**Critical rule: NEVER set `retries: 0` on Inngest functions.** With `retries: 0`, a worker restart during step execution kills the run permanently. With `retries` ≥ 1, Inngest retries the step and hits the new pod.
The story-pipeline function currently sets `retries: 2` specifically to survive the ~1s restart window during deploys.
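A pre-deploy guard can catch the `retries: 0` mistake before it ships. This sketch assumes the option is written literally as `retries: 0` in `.ts` files under `packages/system-bus/src`; a computed value would slip past it, and `check_no_zero_retries` is a hypothetical name.

```bash
# Hypothetical pre-deploy guard: fail if any .ts file sets `retries: 0`
# literally (ADR-0156). Values like `retries: 10` are not flagged.
check_no_zero_retries() {
  local dir="${1:-packages/system-bus/src}"
  # --include goes before the pattern so BSD grep also accepts it
  if grep -rnE --include='*.ts' -e 'retries:[[:space:]]*0([^0-9]|$)' "$dir"; then
    echo "ERROR: retries: 0 found; use retries >= 1 (ADR-0156)" >&2
    return 1
  fi
  return 0
}
```

Running it from the repo root before `k8s/publish-system-bus-worker.sh` prints any offending lines with file and line number.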
### What happens during deploy
```
Step executing on old pod → old pod terminates → step fails (SDK unreachable)
→ Inngest retries after backoff → new pod handles retry → step completes
```
All previously completed steps are memoized. Only the in-flight step reruns.
### Long-running steps (codex implement: 5-10 min)
If a deploy kills a codex step mid-execution, the step reruns from scratch on the new pod (5-10 min wasted but not fatal). For time-critical deploys during active loops, check `joelclaw loop status` first and deploy between stories.
## Manual Steps (if script fails)
### Build
```bash
cd ~/Code/joelhooks/joelclaw
TAG=$(git rev-parse --short HEAD)
IMAGE="ghcr.io/joelhooks/system-bus-worker:${TAG}"
docker build --platform linux/arm64 -t "$IMAGE" -t ghcr.io/joelhooks/system-bus-worker:latest -f packages/system-bus/Dockerfile .
```
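Because an amd64 image only fails later with `exec format error`, it can be worth checking the architecture before pushing. This is a sketch built on `docker image inspect --format '{{.Architecture}}'`; `assert_arm64_image` is a hypothetical helper.

```bash
# Sketch: refuse to push an image not built for arm64 (the node is aarch64).
assert_arm64_image() {
  local image="$1" arch
  arch="$(docker image inspect --format '{{.Architecture}}' "$image")" || return 1
  if [ "$arch" != "arm64" ]; then
    echo "ERROR: $image is $arch, expected arm64 (rebuild with --platform linux/arm64)" >&2
    return 1
  fi
}
```

Usage: `assert_arm64_image "$IMAGE" && docker push "$IMAGE"`.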
### Push
```bash
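# Run in the same shell as the Build step so $IMAGE is still set.
# A 403 on push means this gh token lacks package scopes; use ghcr_pat instead (see GHCR Auth Order).
gh auth token | docker login ghcr.io -u "$(gh api user -q .login)" --password-stdin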
docker push "$IMAGE"
docker push ghcr.io/joelhooks/system-bus-worker:latest
```
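The publish script avoids the `docker-credential-desktop` error by logging in with a temporary Docker config directory. A manual push can do the same without editing `~/.docker/config.json`. This is a sketch; `ghcr_login_tempconfig` is a hypothetical name.

```bash
# Hypothetical helper: log in to GHCR with a throwaway Docker config dir so
# a "credsStore": "desktop" entry in ~/.docker/config.json is never used.
# Reads the token from stdin. Feed it with a redirect, not a pipe: in bash a
# function at the end of a pipeline runs in a subshell, losing the export.
ghcr_login_tempconfig() {
  local user="$1"
  DOCKER_CONFIG="$(mktemp -d)" || return 1
  export DOCKER_CONFIG
  printf '{}\n' > "$DOCKER_CONFIG/config.json"
  docker login ghcr.io -u "$user" --password-stdin
}
```

Usage: `ghcr_login_tempconfig "$(gh api user -q .login)" < <(secrets lease ghcr_pat)`, then `docker push "$IMAGE"` in the same shell.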
### Deploy
```bash
# Unlike the publish script, this does not update the image ref in k8s/system-bus-worker.yaml.
kubectl -n joelclaw set image deployment/system-bus-worker system-bus-worker="$IMAGE"
kubectl -n joelclaw rollout status deployment/system-bus-worker --timeout=180s
```
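If the rollout times out, `kubectl rollout undo` returns the deployment to its previous revision. This sketch wraps the two commands above with that fallback; `deploy_or_rollback` is a hypothetical name. If you deployed through the publish script instead, the manifest in the repo will still reference the new tag after an undo.

```bash
# Sketch: set the new image, and roll back if it never becomes ready.
deploy_or_rollback() {
  local image="$1" ns="joelclaw" deploy="deployment/system-bus-worker"
  kubectl -n "$ns" set image "$deploy" system-bus-worker="$image" || return 1
  if ! kubectl -n "$ns" rollout status "$deploy" --timeout=180s; then
    echo "rollout failed; reverting to previous revision" >&2
    kubectl -n "$ns" rollout undo "$deploy"
    return 1
  fi
}
```

Usage: `deploy_or_rollback "$IMAGE"`.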
### Verify
```bash
joelclaw refresh
joelclaw status
```
### Log
```bash
slog write --action deploy --tool system-bus-worker --detail "deployed ${IMAGE}" --reason "sync worker changes"
```
## Talon Rebuild (Adding Secrets / Changing Worker Supervision)
Talon is a Rust binary that supervises the worker process. It leases secrets from `agent-secrets` and injects them as env vars. When adding new webhook secrets or changing supervision behavior:
```bash
# 1. Add secret to agent-secrets
secrets add my_new_secret --value "the-secret-value"
# 2. Update Talon source — add mapping to SECRET_MAPPINGS array
# File: ~/Code/joelhooks/joelclaw/infra/talon/src/worker.rs
# ("my_new_secret", "MY_NEW_SECRET_ENV_VAR"),
# 3. Recompile (fast — ~3s incremental)
export PATH="$HOME/.cargo/bin:$PATH"
cd ~/Code/joelhooks/joelclaw/infra/talon
cargo build --release
# 4. Install + re-sign (macOS kills unsigned binaries)
cp target/release/talon ~/.local/bin/talon
codesign -fs - ~/.local/bin/talon
# 5. Restart via launchd
launchctl bootout gui/$(id -u)/com.joel.talon
sleep 1
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.joel.talon.plist
sleep 12
# 6. Verify
curl -s http://localhost:3111/ | jq '.status'
curl -X PUT http://localhost:3111/api/inngest # Force function sync
```
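The fixed `sleep 12` after `launchctl bootstrap` can be replaced with a poll that returns as soon as the worker answers. This is a sketch; `wait_for_worker` is hypothetical, and it treats any successful HTTP response from `:3111` as "up" without interpreting the `.status` value.

```bash
# Sketch: poll the worker instead of sleeping a fixed 12 seconds.
wait_for_worker() {
  local url="${1:-http://localhost:3111/}" attempts="${2:-30}" delay="${3:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f treats HTTP errors as failures; --max-time bounds each probe
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "worker at $url not responding after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_worker && curl -X PUT http://localhost:3111/api/inngest`.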
### Current SECRET_MAPPINGS (worker.rs)
| Secret Name | Env Var |
|------------|---------|
| `claude_oauth_token` | `CLAUDE_CODE_OAUTH_TOKEN` |
| `todoist_client_secret` | `TODOIST_CLIENT_SECRET` |
| `todoist_api_token` | `TODOIST_API_TOKEN` |
| `front_rules_webhook_secret` | `FRONT_WEBHOOK_SECRET` |
| `front_api_token` | `FRONT_API_TOKEN` |
| `vercel_webhook_secret` | `VERCEL_WEBHOOK_SECRET` |
| `joelclaw_webhook_secret` | `JOELCLAW_WEBHOOK_SECRET` |
| `revalidation_secret` | `REVALIDATION_SECRET` |
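To check this table against the source, the mapping pairs can be pulled out of `worker.rs` with `sed`. This is a sketch that assumes each mapping is a tuple literal on its own line, as in the Talon rebuild example; `list_secret_mappings` is a hypothetical name.

```bash
# Sketch: print "secret_name ENV_VAR" for each ("secret", "ENV") tuple line.
list_secret_mappings() {
  local src="${1:-$HOME/Code/joelhooks/joelclaw/infra/talon/src/worker.rs}"
  sed -nE 's/^[[:space:]]*\("([a-z0-9_]+)",[[:space:]]*"([A-Z0-9_]+)"\),?.*$/\1 \2/p' "$src"
}
```

Usage: `list_secret_mappings | column -t`.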
### Talon Key Paths
| What | Path |
|------|------|
| Binary | `~/.local/bin/talon` |
| Source | `~/Code/joelhooks/joelclaw/infra/talon/src/` |
| LaunchAgent plist | `~/Library/LaunchAgents/com.joel.talon.plist` |
| Logs | `~/.local/log/talon.log` / `talon.err` |
| ADR | `~/Vault/docs/decisions/0159-talon-worker-manager.md` |
### Gotcha: `codesign -fs -` is required
After `cargo build`, the binary carries only an ad-hoc, linker-signed signature, and launchd on macOS may kill it with SIGKILL (signal 9). Re-signing with `codesign -fs -` fixes this.
## Common Gotchas
| Problem | Cause | Fix |
|---------|-------|-----|
| `exec format error` in pod | Built for amd64, not arm64 | Rebuild with `--platform linux/arm64` |
| GHCR push fails with `403 Forbidden` on blob HEAD | `gh auth token` missing package scopes | Use `ghcr_pat` via `agent-secrets` or export `GHCR_TOKEN` with package scope |
| `docker-credential-desktop` error | Docker config sets `credsStore` | The script uses a temp config dir; for manual pushes, remove `"credsStore": "desktop"` from `~/.docker/config.json` |
| Function missing after deploy | Not in index file | Add to both `index.host.ts` AND `index.cluster.ts` |
| Function still missing | Stale Inngest registration | `joelclaw refresh` then check again |
| "Unable to reach SDK URL" | Worker pod not ready | Wait for rollout, then `joelclaw refresh` |
| Runs stuck after deploy | `retries: 0` on the function | Set `retries: 2` minimum (ADR-0156) |
| Stale app registrations | Multiple apps registered | Delete old registrations in Inngest dashboard (`:8289`) |
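For the "Function missing after deploy" row, a quick text search confirms that a function identifier appears in both index files. This is a sketch; `check_function_indexed` is hypothetical, and a match only proves the name is referenced, not that it is exported correctly.

```bash
# Sketch: confirm a function identifier appears in BOTH index files.
check_function_indexed() {
  local name="$1" dir="${2:-packages/system-bus/src/inngest/functions}" f missing=0
  for f in index.host.ts index.cluster.ts; do
    if ! grep -qw -- "$name" "$dir/$f"; then
      echo "missing from $f: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_function_indexed <functionIdentifier>` from the repo root.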
## Key Paths
| What | Path |
|------|------|
| Publish script | `k8s/publish-system-bus-worker.sh` |
| Dockerfile | `packages/system-bus/Dockerfile` |
| k8s manifest | `k8s/system-bus-worker.yaml` |
| Host function index | `packages/system-bus/src/inngest/functions/index.host.ts` |
| Cluster function index | `packages/system-bus/src/inngest/functions/index.cluster.ts` |
| Worker entry | `packages/system-bus/src/serve.ts` |
| GH Actions workflow | `.github/workflows/system-bus-worker-deploy.yml` |
| ADR-0156 | `~/Vault/docs/decisions/0156-graceful-worker-restart.md` |