---
name: vercel-openclaw-sandbox-benchmarking
description: Benchmark and optimize Vercel Sandbox restore speed for the vercel-openclaw project. Use when measuring restore latency, running vCPU sweeps, comparing runtime performance (Node vs Bun), profiling startup phases, or optimizing the restore hot path. Triggers on "benchmark", "restore speed", "sandbox performance", "optimize restore", "vCPU sweep".
---
# Sandbox Restore Benchmarking
Techniques and tools for measuring and optimizing sandbox restore speed in vercel-openclaw.
## Quick Start
```bash
# Refresh OIDC credentials (required for direct SDK access)
vercel env pull .env.local
# Run direct SDK benchmark (creates real sandbox, installs openclaw, snapshots, restores)
node scripts/bench-sandbox-direct.mjs --cycles=5 --vcpus=1
# Reuse existing snapshot (skip bootstrap)
node scripts/bench-sandbox-direct.mjs --cycles=5 --vcpus=1 --snapshot-id=<snap_id>
# Production stop/restore via app API
node scripts/benchmark-restore.mjs --base-url "$APP_URL" --cycles=3
```
## Two Benchmark Approaches
### 1. Direct SDK (`bench-sandbox-direct.mjs`)
Uses `@vercel/sandbox` SDK directly. Measures raw platform + app overhead without proxy/admin layer. Best for isolating restore performance.
### 2. App API (`benchmark-restore.mjs`)
Hits the deployed app's HTTP endpoints. Measures end-to-end including Vercel function cold starts, auth, and proxy. Best for real-world timing.
## Restore Phase Budget
Every restore records per-phase timings in `lastRestoreMetrics` (visible via `/api/status`):
| Phase | What it measures | Optimization lever |
|-------|-----------------|-------------------|
| `sandboxCreateMs` | `Sandbox.create()` from snapshot | Platform cost (not controllable) |
| `tokenWriteMs` | Credential file writes | Pass via `Sandbox.create({ env })` to eliminate |
| `assetSyncMs` | Config + static file writes | Manifest-based skip for static; env for dynamic |
| `startupScriptMs` | Fast-restore script (gateway + readiness) | Bun runtime, in-sandbox polling, force-pair deferral |
| `localReadyMs` | In-sandbox curl loop until `openclaw-app` | Part of startupScript; reported from script JSON |
| `firewallSyncMs` | Network policy application | Runs concurrently with boot |
| `publicReadyMs` | External reachability probe | Skipped for background restores |
| `bootOverlapMs` | Wall clock of Promise.all(boot, firewall, cred-write) | Concurrency ceiling |
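To decide which lever in the table to pull first, it can help to rank the recorded phases by duration. The following is a minimal sketch, assuming `lastRestoreMetrics` is a flat object of phase name to milliseconds; the exact response shape of `/api/status` is not specified here, `rankPhases` is a hypothetical helper, and the sample numbers are illustrative only.

```javascript
// Hypothetical helper: rank restore phases by duration, slowest first.
// Assumes `metrics` is a flat { phaseNameMs: number } object shaped like
// the `lastRestoreMetrics` fields listed in the table above.
function rankPhases(metrics) {
  return Object.entries(metrics)
    .filter(([name, ms]) => name.endsWith('Ms') && Number.isFinite(ms))
    .sort(([, a], [, b]) => b - a)
    .map(([phase, ms]) => ({ phase, ms }));
}

// Illustrative numbers only, not measured results.
const sample = {
  sandboxCreateMs: 2100,
  tokenWriteMs: 0,
  assetSyncMs: 350,
  startupScriptMs: 4800,
  firewallSyncMs: 900,
};

for (const { phase, ms } of rankPhases(sample)) {
  console.log(`${phase.padEnd(18)} ${ms} ms`);
}
```

Phases are ranked rather than summed because some of them overlap: `localReadyMs` is part of `startupScriptMs`, and `bootOverlapMs` is a wall-clock figure covering concurrent work.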
## Key Optimization Techniques
### 1. In-sandbox readiness polling
Move the curl loop inside the bash startup script instead of issuing 120 separate host-side `sandbox.runCommand("curl")` calls. This eliminates a control-plane round-trip per attempt.
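As a rough sketch of this technique, the host can generate one bash script containing the whole polling loop and run it with a single command. The port, delay, JSON output shape, and the reading of `openclaw-app` as a marker string in the response body are assumptions for illustration; `buildReadinessScript` is a hypothetical helper, not the project's actual script.

```javascript
// Hypothetical helper: build a bash script that polls for readiness
// from inside the sandbox. All defaults are illustrative assumptions.
function buildReadinessScript({
  port = 3000,
  attempts = 120,
  delaySec = 0.25,
  marker = 'openclaw-app',
} = {}) {
  return [
    '#!/usr/bin/env bash',
    `for i in $(seq 1 ${attempts}); do`,
    // Ready once the local response body contains the marker string.
    `  if curl -fsS "http://127.0.0.1:${port}/" 2>/dev/null | grep -q '${marker}'; then`,
    '    echo "{\\"ready\\":true,\\"attempt\\":$i}"',
    '    exit 0',
    '  fi',
    `  sleep ${delaySec}`,
    'done',
    'echo "{\\"ready\\":false}"',
    'exit 1',
  ].join('\n');
}

// Print a short variant to inspect the generated script.
console.log(buildReadinessScript({ attempts: 3 }));

// Host side, the whole loop then costs one control-plane call, e.g.:
//   await sbx.runCommand('bash', ['-c', buildReadinessScript()]);
```

The script prints a small JSON line on exit so the host can parse the outcome from the command output instead of probing again.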
### 2. Defer force-pair after readiness
The gateway serves its initial page without device pairing, so running `node .force-pair.mjs` only after readiness avoids CPU contention during startup on 1 vCPU.
### 3. Pass credentials via env at create time
`Sandbox.create({ env: { OPENCLAW_GATEWAY_TOKEN, AI_GATEWAY_API_KEY } })` lets the startup script read tokens from env, eliminating blocking `writeFiles` round-trips (~6-9s).
### 4. Bun for gateway startup
Bun loads the 577MB/10K-file openclaw package ~33% faster than Node.js v22. Install Bun during bootstrap so it is captured in the snapshot, then use it in the fast-restore script with a Node.js fallback.
### 5. Manifest-based static asset skip
A SHA-256 manifest stored in the snapshot lets unchanged static files skip `writeFiles` on repeat restores.
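The manifest comparison can be sketched as follows. This is a minimal illustration, assuming the manifest is a plain `{ path: sha256hex }` object; the function names and manifest format are hypothetical, not the project's actual implementation. It uses ESM imports, so save it as an `.mjs` file.

```javascript
import { createHash } from 'node:crypto';

// SHA-256 hex digest of a file's content (string or Buffer).
function sha256Hex(content) {
  return createHash('sha256').update(content).digest('hex');
}

// Build a { path: digest } manifest from [{ path, content }] entries.
// Assumption: this is what would be stored inside the snapshot.
function buildManifest(files) {
  return Object.fromEntries(files.map((f) => [f.path, sha256Hex(f.content)]));
}

// Return only the files whose digest differs from, or is missing in,
// the snapshot manifest. Only these would need a write on restore.
function filesToWrite(files, snapshotManifest) {
  return files.filter((f) => snapshotManifest[f.path] !== sha256Hex(f.content));
}

// Illustrative file names and contents.
const files = [
  { path: 'static/app.css', content: 'body{}' },
  { path: 'static/app.js', content: 'console.log(1)' },
];
const manifest = buildManifest(files);

files[1].content = 'console.log(2)'; // simulate one changed file
console.log(filesToWrite(files, manifest).map((f) => f.path)); // [ 'static/app.js' ]
```

If `filesToWrite` returns an empty array, the static write step can be skipped entirely for that restore.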
## Profiling a Sandbox
To investigate startup bottlenecks inside a real sandbox:
```bash
# Create a sandbox from a snapshot and run profiling commands inside it.
# --input-type=module makes top-level await work reliably with `node -e`.
node --input-type=module -e "
const { Sandbox } = await import('@vercel/sandbox');
const sbx = await Sandbox.create({ source: { type: 'snapshot', snapshotId: '<id>' }, ports: [3000], timeout: 300000, resources: { vcpus: 1 } });
// Run profiling commands
const r = await sbx.runCommand('node', ['--version']);
console.log(await r.output());
await sbx.stop(); // cleanup: stop the sandbox without creating another snapshot
"
```
See `references/profiling-techniques.md` for specific profiling patterns.
## Decision Rules
- Pin `OPENCLAW_PACKAGE_SPEC` and `OPENCLAW_SANDBOX_VCPUS` during benchmarks
- Run at least 5 cycles for stable p50/p95
- Compare branches with same snapshot, same vCPU, same package version
- Ship only if `totalMs` p50 improves AND p95 doesn't regress
- Always verify with `node scripts/verify.mjs` before and after
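The ship rule above can be made mechanical. The following is a sketch under stated assumptions: it uses the nearest-rank percentile method (the benchmark scripts may compute percentiles differently), `percentile` and `shouldShip` are hypothetical helpers, and the sample timings are illustrative rather than measured.

```javascript
// Nearest-rank percentile over a list of millisecond samples.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Ship only if p50 improves AND p95 does not regress.
function shouldShip(baselineMs, candidateMs) {
  const base = { p50: percentile(baselineMs, 50), p95: percentile(baselineMs, 95) };
  const cand = { p50: percentile(candidateMs, 50), p95: percentile(candidateMs, 95) };
  return { base, cand, ship: cand.p50 < base.p50 && cand.p95 <= base.p95 };
}

// Illustrative totalMs samples from 5 cycles per branch.
const baseline = [9000, 9200, 9100, 9800, 9300];
const candidate = [7000, 7100, 7300, 7200, 9900];

console.log(shouldShip(baseline, candidate));
```

In this example the candidate improves p50 (7200 vs 9200) but its p95 regresses (9900 vs 9800), so `ship` is `false`. With only 5 cycles, nearest-rank p95 is simply the slowest sample, which is one reason to run more cycles when a decision is close.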
## References
- `references/profiling-techniques.md` - In-sandbox profiling, Node vs Bun comparison, compile cache testing
- `references/benchmark-results.md` - Historical benchmark data and optimization journey
- `references/architecture.md` - Restore flow architecture and phase dependencies