joelclaw-system-check

Name: joelclaw-system-check
Author: joelhooks/joelclaw

$ npx mdskill add joelhooks/joelclaw/joelclaw-system-check

Runs a comprehensive health check of the joelclaw system and outputs a 1-10 score with a per-component breakdown for diagnostics.

Helps diagnose system issues by checking 16 components, including the k8s cluster, Redis, and the test suite.
Integrates with the k8s cluster, Inngest, Redis, Typesense, and other internal tools.
Scores each component from 1 to 10 and bases its recommendations on those component statuses.
Prints the overall score along with a detailed per-component breakdown.

SKILL.md

.github/skills/joelclaw-system-check

---
name: joelclaw-system-check
displayName: Joelclaw System Check
description: "Run a comprehensive health check of the joelclaw system: k8s cluster, AT Proto PDS, worker, Inngest, Redis/gateway, Typesense/OTEL, tests, TypeScript, repo sync, memory pipeline, pi-tools, git config, active loops, gogcli, disk, stale tests. Outputs a 1-10 score with a per-component breakdown. Use when: 'system health', 'health check', 'is everything working', 'system status', 'how's the system', 'check everything', or at session start to orient."
version: 1.1.0
author: Joel Hooks
tags: [joelclaw, health, diagnostics, checks, operations]
---

# joelclaw System Health Check

Run `scripts/health.sh` for a full system health report with a 1-10 score.

```bash
~/Code/joelhooks/joelclaw/skills/joelclaw-system-check/scripts/health.sh
```

## What It Checks (16 components)

| Check | What | Green (10) | Yellow (5-7) | Red (1-3) |
|-------|------|-----------|-------------|----------|
| k8s cluster | pods in `joelclaw` namespace | 4/4 Running, 0 restarts | some pods not Running | no pods |
| pds | AT Proto PDS on :2583 | version + collections | pod running, port-forward down | pod not running |
| worker | system-bus on :3111 | 16+ functions | responding, <16 functions | down |
| inngest server | :8288 reachable | responding | — | down |
| redis/gateway | Redis + gateway session queues | connected, low pending queue | connected, backlog rising | unavailable |
| typesense/otel | Typesense health + OTEL query path | healthy + queryable | healthy, query degraded | unavailable |
| tests | `bun test` in system-bus | 0 fail | — | failures |
| tsc | `tsc --noEmit` | clean | — | type errors |
| repo sync | monorepo HEAD vs `origin/main` | in sync | ahead/behind | repo unavailable |
| memory pipeline | `joelclaw inngest memory-health` | healthy checks | degraded checks | failing checks |
| pi-tools | extension deps installed | all 3 deps | — | missing |
| git config | user.name + email set | set | — | missing |
| active loops | `joelclaw loop list` | queryable | query degraded | unavailable |
| gogcli | Google Workspace auth | account authed, token valid | token stored, no password | not configured |
| disk | free space + loop tmp | <80% used | — | >80% |
| stale tests | `__tests__/` dir + `*.acceptance.test.ts` files | none found | — | present |
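This SKILL.md does not say how `health.sh` turns raw check output into scores or rolls them up into the overall number. The sketch below is a hypothetical illustration, not the script's logic: `k8s_score` (a made-up helper) scores `kubectl get pods -n joelclaw --no-headers` output against the table's k8s row, and `overall_score` (also made up) assumes the overall score is a rounded, unweighted mean of the component scores. The values 10, 5, and 1 are picks from the Green/Yellow/Red bands, and treating restarts with all pods Running as Yellow is an assumption.

```bash
# Hypothetical helpers: not taken from health.sh. POSIX sh, also valid bash.

# Score the k8s row from `kubectl get pods --no-headers` text on stdin.
# Columns: NAME READY STATUS RESTARTS AGE. Expected pod count defaults to 4.
# Assumed scores: 10 = all expected pods Running with 0 restarts,
# 1 = no pods at all, 5 = anything in between (including restarts > 0).
k8s_score() (
  expected=${1:-4}
  awk -v want="$expected" '
    NF == 0 { next }                      # skip blank lines
    {
      total++
      if ($3 == "Running") running++
      restarts += $4                      # "3 (2d ago)" still puts 3 in $4
    }
    END {
      if (total == 0) print 1
      else if (total == want && running == want && restarts == 0) print 10
      else print 5
    }'
)

# Combine per-component scores (each 1-10) into one 1-10 score.
# Assumes an unweighted mean rounded to the nearest integer; halves round up.
overall_score() (
  total=0
  count=0
  for s in "$@"; do
    case "$s" in
      ''|*[!0-9]*) echo "not a number: $s" >&2; exit 1 ;;
    esac
    if [ "$s" -lt 1 ] || [ "$s" -gt 10 ]; then
      echo "out of range (1-10): $s" >&2
      exit 1
    fi
    total=$((total + s))
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no scores given" >&2
    exit 1
  fi
  echo $(( (2 * total + count) / (2 * count) ))
)

# Example wiring (needs kubectl access to the cluster):
#   k8s=$(kubectl get pods -n joelclaw --no-headers 2>/dev/null | k8s_score 4)
#   overall_score "$k8s" 10 7 3
```

A plain mean keeps one red component from dominating; if a single failure should drag the total down harder, a minimum or weighted mean would be the alternative.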

## When to Run

- **Session start** — orient on system state before doing work
- **After loops complete** — verify nothing broke
- **After infra changes** — k8s, worker, Redis config
- **When something feels off** — quick triage

## Fixing Common Issues

**Repo drift**: `cd ~/Code/joelhooks/joelclaw && git fetch origin && git status -sb`
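`git status -sb` shows the ahead/behind counts for a human reader. For a scripted check, `git rev-list --left-right --count HEAD...origin/main` prints the same two numbers (commits only on `HEAD`, then commits only on `origin/main`) separated by a tab. A minimal sketch, assuming the default `~/Code/joelhooks/joelclaw` checkout; `repo_sync` and `classify_sync` are hypothetical names, not part of `health.sh`:

```bash
# Hypothetical helpers, not from health.sh. POSIX sh, also valid bash.

# Map ahead/behind counts onto the table's repo-sync states.
classify_sync() (
  ahead=$1
  behind=$2
  if [ "$ahead" -eq 0 ] && [ "$behind" -eq 0 ]; then
    echo "in sync"
  else
    echo "ahead $ahead, behind $behind"
  fi
)

# Fetch (best effort), count commits on each side of HEAD...origin/main, classify.
repo_sync() (
  repo=${1:-"$HOME/Code/joelhooks/joelclaw"}
  # If offline, fall back to the last fetched origin/main instead of failing.
  git -C "$repo" fetch --quiet origin 2>/dev/null || true
  counts=$(git -C "$repo" rev-list --left-right --count HEAD...origin/main 2>/dev/null) ||
    { echo "repo unavailable"; exit 1; }
  # counts is "<ahead><TAB><behind>"; let the shell split it into $1 and $2.
  set -- $counts
  classify_sync "$1" "$2"
)
```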

**pi-tools broken**: `cd ~/.pi/agent/git/github.com/joelhooks/pi-tools && bun add @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai`
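Before reinstalling, it can help to see which packages are actually absent. A sketch assuming Bun's default `node_modules/` layout under the pi-tools checkout; `missing_deps` is a hypothetical helper, and the package list is passed in rather than hard-coded:

```bash
# Hypothetical helper, not from health.sh. POSIX sh, also valid bash.
# Print each package name whose node_modules/<name> directory is missing.
missing_deps() (
  dir=$1
  shift
  for pkg in "$@"; do
    # Scoped names such as @sinclair/typebox live at node_modules/@sinclair/typebox.
    [ -d "$dir/node_modules/$pkg" ] || echo "$pkg"
  done
  exit 0
)

# Example:
#   missing_deps ~/.pi/agent/git/github.com/joelhooks/pi-tools \
#     @sinclair/typebox @mariozechner/pi-coding-agent @mariozechner/pi-tui @mariozechner/pi-ai
```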

**PDS unreachable**: `kubectl port-forward -n joelclaw svc/bluesky-pds 2583:3000 &` (or, if the pod is down: `kubectl rollout restart deployment/bluesky-pds -n joelclaw`)

**Worker down**: `joelclaw inngest restart-worker --register`

**Stale tests**: `rm -rf ~/Code/joelhooks/joelclaw/packages/system-bus/__tests__/ && find ~/Code/joelhooks/joelclaw/packages/system-bus/src -name "*.acceptance.test.ts" -delete`
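Because this fix deletes files, a dry run that lists the same targets first is a cheap safeguard. A sketch using the same paths as the cleanup command; `list_stale_tests` is a hypothetical helper:

```bash
# Hypothetical helper, not from health.sh. POSIX sh, also valid bash.
# List what the stale-test cleanup would remove, without deleting anything.
list_stale_tests() (
  pkg=${1:-"$HOME/Code/joelhooks/joelclaw/packages/system-bus"}
  # The whole __tests__/ directory is removed by the rm -rf half of the fix.
  if [ -d "$pkg/__tests__" ]; then
    echo "$pkg/__tests__/"
  fi
  # Acceptance tests anywhere under src/ are removed by the find -delete half.
  if [ -d "$pkg/src" ]; then
    find "$pkg/src" -type f -name '*.acceptance.test.ts'
  fi
)

# Example: review the list, then run the cleanup only if it looks right.
#   list_stale_tests
```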

**Loop tmp bloat**: `rm -rf /tmp/agent-loop/loop-*/` (only when no loops are running)
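To see where the disk row stands before and after clearing loop temp dirs, `df -P` prints a fixed POSIX layout whose fifth column is the percentage used. A minimal sketch; `disk_used_pct`, `classify_disk`, and `loop_tmp_kib` are hypothetical helpers, and scoring exactly 80% as Red is an assumption because the table only specifies <80% and >80%:

```bash
# Hypothetical helpers, not from health.sh. POSIX sh, also valid bash.

# Print the used-space percentage (integer, no % sign) for the filesystem holding $1.
disk_used_pct() (
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
)

# Map a percentage onto the disk row: under 80 is green, otherwise red (assumed).
classify_disk() (
  if [ "$1" -lt 80 ]; then echo green; else echo red; fi
)

# Size of the loop temp area in KiB (0 if it does not exist).
loop_tmp_kib() (
  if [ -d /tmp/agent-loop ]; then
    du -sk /tmp/agent-loop 2>/dev/null | awk '{ print $1 }'
  else
    echo 0
  fi
)

# Example:
#   pct=$(disk_used_pct /); echo "disk ${pct}% used: $(classify_disk "$pct")"
#   echo "loop tmp: $(loop_tmp_kib) KiB"
```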

## Inngest Hung-Run Quick Triage

When a run appears stuck after its first step, inspect its trace:

```bash
joelclaw run <run-id>
```

If the trace shows a `Finalization` failure with `"Unable to reach SDK URL"`:

1. Verify registration and health:
   `joelclaw inngest status`

2. Verify the function is registered where expected:
   `joelclaw functions | rg -i "manifest-archive|<function-name>"`

3. Check for stale app registrations in the Inngest UI/API and remove stale SDK URLs.

4. Don't assume it is only a network problem; the handler itself may be blocking:
   review recent step code for filesystem, Redis, or subprocess calls that block before the step returns its response.
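The trace check and steps 1-2 can be chained in a small script. A sketch assuming `joelclaw run` prints trace text (or JSON) in which the strings `Finalization` and `Unable to reach SDK URL` appear verbatim; `needs_sdk_triage` and `triage_hung_run` are hypothetical helpers, and steps 3-4 stay manual:

```bash
# Hypothetical helpers, not part of the joelclaw CLI. POSIX sh, also valid bash.

# Read a run trace on stdin; succeed only if it shows the Finalization /
# "Unable to reach SDK URL" failure described above.
needs_sdk_triage() (
  trace=$(cat)
  printf '%s\n' "$trace" | grep -q 'Finalization' &&
    printf '%s\n' "$trace" | grep -q 'Unable to reach SDK URL'
)

# Usage: triage_hung_run <run-id> <function-name>
triage_hung_run() (
  run_id=$1
  fn=$2
  if joelclaw run "$run_id" | needs_sdk_triage; then
    echo "== step 1: registration/health"
    joelclaw inngest status
    echo "== step 2: is the function registered?"
    joelclaw functions | rg -i "manifest-archive|$fn"
  else
    echo "no SDK-URL finalization failure in trace for $run_id"
  fi
)
```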

More from joelhooks/joelclaw

Skill	Description
add-skill	Create new joelclaw skills with the idiomatic process — repo-canonical, symlinked, git-tracked, slogged. Triggers on 'add a skill', 'create skill', 'new skill', 'canonical skill', 'make a skill for', or any request to formalize a process or domain into a reusable skill.
adr-skill	Create and maintain Architecture Decision Records (ADRs) optimized for agentic coding workflows. Use when you need to propose, write, update, accept/reject, deprecate, or supersede an ADR; bootstrap an adr folder and index; consult existing ADRs before implementing changes; or enforce ADR conventions. This skill uses Socratic questioning to capture intent before drafting, and validates output against an agent-readiness checklist.
agent-discovery	"Optimize websites, docs, and product surfaces for agent discoverability and operator UX. Use when working on agent SEO/AEO/GEO, crawl policy, markdown or JSON projections, llms.txt, sitemap.md, AGENTS.md guidance, content negotiation, accessibility for browser agents, or any request to make a site easier for pi, OpenCode, Claude Code, ChatGPT, Perplexity, or other agent harnesses to find and use."
agent-loop	Start, monitor, and cancel durable multi-agent coding loops via Inngest. Use when the user wants to run autonomous coding workloads, execute a PRD with multiple stories, kick off an AFK coding session, have agents implement features from a plan, or manage running loops. Triggers on "start a coding loop", "run this PRD", "implement these stories", "go AFK and code this", "check loop status", "cancel the loop", "joelclaw loop", or any request for autonomous multi-story code execution.
agent-mail	>-
agent-workloads	"Compatibility alias for the canonical `workflow-rig` front door. Use when older prompts mention `agent-workloads` or when you need the legacy workload-planning guidance; for new work, load `workflow-rig` first."
clawmail	>-
cli-design	"Design and build agent-first CLIs with HATEOAS JSON responses, context-protecting output, and self-documenting command trees. Use when creating new CLI tools, adding commands to existing CLIs (joelclaw, slog), or reviewing CLI design for agent-friendliness. Triggers on 'build a CLI', 'add a command', 'CLI design', 'agent-friendly output', or any task involving command-line tool creation."
codex-prompting	"Use this skill for any request to trigger, coordinate, or craft prompts for Codex. Use when user says 'send to codex', 'use codex', 'prompt codex', 'ask codex', 'delegate to codex', 'run in codex', or asks for a Codex-first execution handoff."
content-publish	"Publish content to joelclaw.com via the Convex-first pipeline. Covers the full lifecycle: draft → review → publish → revalidate → verify. Handles secret leasing, tag conventions, content types (article, tutorial, note, essay), and verification gates. Use when: 'write article about X', 'publish article <slug>', 'draft a tutorial', 'publish this', 'push to convex', or any content publishing task."