sandbox-lifecycle-debug
$ npx mdskill add vercel-labs/vercel-openclaw/sandbox-lifecycle-debug
Diagnose sandbox lifecycle failures and recover state.
- Fixes broken state transitions, polling errors, and lifecycle recovery issues.
- Integrates with Vercel Sandbox v2, admin APIs, and gateway diagnostics.
- Analyzes status metadata, logs, and lock states to pinpoint root causes.
- Outputs structured reports on state, snapshots, and gateway readiness.
SKILL.md
.github/skills/sandbox-lifecycle-debug
---
name: sandbox-lifecycle-debug
description: "Sandbox lifecycle debugging for vercel-openclaw: create, resume, stop, snapshotting, reset, stale-running reconciliation, persistent Sandbox v2 behavior, hot spares, and lifecycle locks. Use when sandbox state transitions, status polling, stop/resume, reset, or lifecycle recovery is wrong."
---
# Sandbox Lifecycle Debug
Use this skill when the sandbox state machine is the primary suspect.
For Sandbox v2 truth-model work, also load `sandbox-v2-lifecycle`. Official Vercel Sandbox v2 docs override older repo guidance that treats manual snapshots as the normal restore source.
## Start Here
Read `lat.md/sandbox-lifecycle.md` sections `Status State Machine`, `Triggers -- What Causes State Transitions`, and the specific trigger involved. Run `lat locate "Sandbox Lifecycle"` or `lat search "sandbox lifecycle <symptom>"` when unsure.
Collect before edits:
- `GET /api/status` and any UI state that triggered the action.
- `GET /api/admin/sandbox-diag`.
- `GET /api/admin/logs` filtered for `sandbox.`, `gateway.`, `watchdog.`, `proxy.`.
- Local `git rev-parse HEAD`, remote `git ls-remote origin main`, and live deployment proof.
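The log filter in the checklist above can be sketched as a small helper, assuming the logs endpoint returns entries with an `event` name. The `LogEntry` shape, the helper name, and the sample event names are hypothetical; only the four prefixes come from this skill.

```typescript
// Hypothetical log entry shape; the real /api/admin/logs payload may differ.
interface LogEntry {
  event: string;
  message?: string;
}

// Event-name prefixes named in the checklist.
const LIFECYCLE_PREFIXES = ["sandbox.", "gateway.", "watchdog.", "proxy."] as const;

// Keep only entries whose event name starts with one of the lifecycle prefixes.
function filterLifecycleLogs(entries: LogEntry[]): LogEntry[] {
  return entries.filter((entry) =>
    LIFECYCLE_PREFIXES.some((prefix) => entry.event.startsWith(prefix)),
  );
}

// Illustrative sample data, not real log output.
const sampleLogs: LogEntry[] = [
  { event: "sandbox.resume.started" },
  { event: "auth.session.created" },
  { event: "gateway.ready" },
];
console.log(filterLifecycleLogs(sampleLogs).map((entry) => entry.event));
```

Filtering once and saving the result alongside the `/api/status` and `/api/admin/sandbox-diag` responses keeps the pre-edit evidence in one place.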
## Split The State
Report these separately:
- metadata status in `SingleMeta.status`
- Vercel Sandbox SDK status
- gateway readiness on port 3000
- persistent auto-saved state availability
- manual snapshot/checkpoint availability when relevant
- lifecycle lock and start lock state when visible
- UI polling status
Do not use `running` as shorthand for gateway-ready or user-ready.
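One way to keep these facts from collapsing into a single word is to give each its own field in the report. The sketch below is illustrative: the `LifecycleReport` field names and the formatter are assumptions, not types from the repo.

```typescript
// Hypothetical report shape: one field per item in the list above.
interface LifecycleReport {
  metaStatus: string;                      // value of SingleMeta.status
  sdkStatus: string;                       // Vercel Sandbox SDK status, verbatim
  gatewayReady: boolean;                   // gateway answering on port 3000
  autoSavedStateAvailable: boolean;        // persistent auto-saved state
  manualSnapshotAvailable: boolean | null; // null when not relevant
  locksHeld: boolean | null;               // lifecycle/start locks; null when not visible
  uiPollingStatus: string;                 // what the UI poller currently shows
}

// Render yes/no, or a caller-supplied label when the fact is unknown.
function yesNo(value: boolean | null, whenNull: string): string {
  if (value === null) return whenNull;
  return value ? "yes" : "no";
}

// Produce one line per fact so none of them is implied by another.
function formatLifecycleReport(r: LifecycleReport): string[] {
  return [
    `metadata status: ${r.metaStatus}`,
    `sdk status: ${r.sdkStatus}`,
    `gateway ready (port 3000): ${yesNo(r.gatewayReady, "unknown")}`,
    `auto-saved state available: ${yesNo(r.autoSavedStateAvailable, "unknown")}`,
    `manual snapshot available: ${yesNo(r.manualSnapshotAvailable, "not relevant")}`,
    `locks held: ${yesNo(r.locksHeld, "not visible")}`,
    `ui polling status: ${r.uiPollingStatus}`,
  ];
}
```

A report with `metaStatus: "running"` and `gatewayReady: false` then shows both facts side by side instead of hiding the second behind the first.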
## Common Paths
- Admin ensure: `/api/admin/ensure` -> `ensureSandboxRunning()` / `ensureSandboxReady()`.
- Gateway request: auth -> `ensureSandboxRunning()` -> token refresh -> `touchRunningSandbox()` -> proxy.
- Stop/auto-save: `stopSandbox()` -> cleanup -> cron persistence -> `sandbox.stop({ blocking: false })` -> host metadata reads `snapshotting` while v2 persists state.
- Status polling: `GET /api/status` -> stale running or snapshotting reconciliation.
- Reset: `resetSandbox()` destroys the active sandbox and its snapshots, and clears cron and token metadata.
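When a path fails partway, it helps to know which step broke. The sketch below walks the gateway request path in the order listed above and reports the first step that throws. The step order comes from this skill; the `GatewaySteps` signatures, the `refreshToken` name, and the tracer itself are hypothetical stand-ins, not the repo's functions.

```typescript
// Hypothetical step signatures; only the order mirrors the gateway request path.
interface GatewaySteps {
  auth: () => Promise<void>;
  ensureSandboxRunning: () => Promise<void>;
  refreshToken: () => Promise<void>;
  touchRunningSandbox: () => Promise<void>;
  proxy: () => Promise<void>;
}

const GATEWAY_STEP_ORDER: Array<keyof GatewaySteps> = [
  "auth",
  "ensureSandboxRunning",
  "refreshToken",
  "touchRunningSandbox",
  "proxy",
];

interface TraceResult {
  completed: string[];
  failedAt: string | null;
  error: string | null;
}

// Run the steps in order and stop at the first one that throws.
async function traceGatewayPath(steps: GatewaySteps): Promise<TraceResult> {
  const completed: string[] = [];
  for (const stepName of GATEWAY_STEP_ORDER) {
    try {
      await steps[stepName]();
      completed.push(stepName);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { completed, failedAt: stepName, error: message };
    }
  }
  return { completed, failedAt: null, error: null };
}
```

In a test harness the steps would be stubs or spies, so the trace shows whether a failure sits before or after `ensureSandboxRunning()`.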
## Sandbox v2 Rules
- Main OpenClaw sandbox is one named persistent sandbox.
- Normal resume uses the persistent name and auto-saved state, not manual `snapshotId`.
- Observation of stopped/snapshotting state must use `resume:false`.
- Manual `snapshot()` is explicit/debug/checkpoint only and shuts the sandbox down.
- Worker/debug sandboxes are short-lived and must use `persistent:false`.
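The two flag rules above can be made hard to violate by encoding them as literal types. This is a minimal sketch: the option shapes and helper names are hypothetical, and how they map onto actual Vercel Sandbox SDK calls is not shown here. Only the `resume` and `persistent` flag values come from the rules.

```typescript
// Hypothetical option shapes. The literal `false` types make the compiler
// reject an observation that resumes or a worker that persists.
interface ObserveOptions {
  name: string;
  resume: false;
}

interface WorkerOptions {
  persistent: false;
}

// Options for looking at the named persistent sandbox without waking it.
function observeOptions(name: string): ObserveOptions {
  return { name, resume: false };
}

// Options for a short-lived worker or debug sandbox.
function workerOptions(): WorkerOptions {
  return { persistent: false };
}

// "example-main" is a placeholder name, not the real sandbox name.
console.log(observeOptions("example-main"), workerOptions());
```

Routing every observation and worker creation through helpers like these gives a review one place to confirm the flags.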
## Fix Boundaries
- Primary: `src/server/sandbox/lifecycle.ts`, `src/server/sandbox/controller.ts`.
- Routes: `src/app/api/admin/{ensure,stop,snapshot,reset,status}/**` and `src/app/api/status/route.ts`.
- Tests: lifecycle and harness tests under `src/server/sandbox/**.test.ts` and `src/test-utils/harness`.
- Docs: `lat.md/sandbox-lifecycle.md`, `docs/lifecycle-and-restore.md`.
## Verification
Use the narrowest command that covers the path, then run the repo verifier when the change has broad lifecycle impact:
```bash
node scripts/verify.mjs --steps=test,typecheck
lat check
```
For live lifecycle incidents, include before/after `/api/status`, `/api/admin/sandbox-diag`, and relevant log events.
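A before/after comparison is easier to read as a list of changed fields. The helper below is a sketch for flat objects only; the field names in the example are placeholders, not the real `/api/status` schema.

```typescript
// Flat key/value record; nested payloads would need flattening first.
type FlatRecord = Record<string, string | number | boolean | null>;

// List every key whose value differs between the two captures, sorted by key.
function diffFields(before: FlatRecord, after: FlatRecord): string[] {
  const keys = Array.from(
    new Set(Object.keys(before).concat(Object.keys(after))),
  ).sort();
  const changes: string[] = [];
  for (const key of keys) {
    if (before[key] !== after[key]) {
      changes.push(`${key}: ${String(before[key])} -> ${String(after[key])}`);
    }
  }
  return changes;
}

// Placeholder captures for illustration.
const statusBefore: FlatRecord = { status: "snapshotting", gatewayReady: false };
const statusAfter: FlatRecord = { status: "running", gatewayReady: true };
console.log(diffFields(statusBefore, statusAfter));
// prints [ 'gatewayReady: false -> true', 'status: snapshotting -> running' ]
```

A key present in only one capture shows up with `undefined` on the other side, which flags fields that appeared or disappeared between captures.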