channel-debug-core

Name: channel-debug-core
Author: vercel-labs/vercel-openclaw
$ npx mdskill add vercel-labs/vercel-openclaw/channel-debug-core
Prove deployment state and gather evidence before proposing fixes.
Diagnoses delivery failures across Slack, Telegram, Discord, and WhatsApp.
Integrates Vercel logs, sandbox runtime, and workflow state tools.
Launches parallel diagnostic lanes to collect direct evidence first.
Delivers verified path diagrams and hypothesis tables for handoff.
SKILL.md
.github/skills/channel-debug-core
---
name: channel-debug-core
description: "Channel webhook triage for vercel-openclaw Slack/Telegram/Discord/WhatsApp issues: prove deployment state, collect admin readiness endpoints, build evidence-first handoff before fixes."
---

# Channel Debug Core

Use this skill for any Slack, Telegram, Discord, or WhatsApp delivery issue.

## Non-Negotiables

Before proposing a fix, produce:

1. Deployment-state proof.
2. Runtime path diagram with each edge marked `unknown`, `verified-good`, or `verified-bad`.
3. Hypothesis table with fastest falsifier and status.
4. Channel Specialist Handoff.

If you are about to write "most likely" a second time, stop and gather direct evidence instead.

## Parallel First Rule

For live deployments, channel debugging starts with parallel lanes. The parent agent creates one artifact root and launches these lanes before any patch proposal:

| Lane | Agent role | Evidence focus |
|---|---|---|
| Vercel/app logs | `channel_vercel_logs` | Project targeting proof, deployment proof, admin readiness JSON, Vercel runtime logs, request/delivery correlation |
| Sandbox runtime | `channel_sandbox_runtime` | `npx sandbox` or app admin SSH fallback, process/port/config/log evidence |
| Workflow state | `channel_workflow_state` | Project targeting proof, `npx workflow inspect runs --backend vercel`, drain workflow state, fallback log correlation |
| Channel specialist | `channel_<channel>` | Channel route semantics, signatures/raw body, dedup, boot message, terminal-path audit |

No fix may be proposed until the merge gate is satisfied:

1. Each lane has returned a handoff, or the lane is marked unavailable with the attempted fallback.
2. At least one runtime edge is `verified-bad`.
3. Contradictory lane evidence has been added to the hypothesis table.
4. Prior known fixes have been compared against current evidence.
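
The first two gate conditions are mechanical, so they can be checked before a human reviews the rest. The sketch below assumes a hypothetical artifact layout (`<lane>/handoff.md`, `<lane>/unavailable.md`, and `path-diagram.md` under the artifact root); none of these file names come from the skill itself. Conditions 3 and 4 still need review of the hypothesis table and the prior-fix comparison.

```bash
# Hypothetical layout (an assumption, not defined by this skill):
#   $ART/<lane>/handoff.md      lane returned a handoff
#   $ART/<lane>/unavailable.md  lane unavailable, attempted fallback recorded
#   $ART/path-diagram.md        runtime path diagram with edge statuses
merge_gate_ready() {
  art=$1
  shift
  for lane in "$@"; do
    # Gate 1: every lane has a non-empty handoff or an unavailable note.
    if [ ! -s "$art/$lane/handoff.md" ] && [ ! -s "$art/$lane/unavailable.md" ]; then
      echo "blocked: lane $lane has no handoff or unavailable note" >&2
      return 1
    fi
  done
  # Gate 2: at least one edge in the diagram is marked verified-bad.
  if ! grep -q 'verified-bad' "$art/path-diagram.md" 2>/dev/null; then
    echo "blocked: no runtime edge is verified-bad yet" >&2
    return 1
  fi
  echo "merge gate: lane and edge checks pass"
}
```

A call such as `merge_gate_ready "$ART" vercel-logs sandbox-runtime workflow channel` returns non-zero while any lane is still outstanding. The lane directory names here are placeholders.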

## Project Targeting Preflight

Before any `vercel`, `npx sandbox`, or `npx workflow` investigation, prove the command is targeting the intended Vercel project and team.

Always inspect `.vercel/project.json` and compare it to the incident target. If they differ, do not rely on inferred project context. Use explicit project/team/deployment targeting and record the mismatch in the handoff.

Known sharp edge: this repo may be linked to release 45 in `.vercel/project.json` (`projectName: vercel-openclaw-45`, `projectId: prj_ag6Uzj9b4deNK98jVS5SO0H2Shmx`). For release 46, use explicit target context (`projectName: vercel-openclaw-46`, `projectId: prj_JSzrXyJiMzT6F7BUA76Naa7nkjRa`, team `vercel-internal-playground`) rather than trusting `.vercel` inference.

If a tool prints that it inferred the project from `.vercel`, treat the output as suspect until the inferred project is proven to match the incident target. This is especially important for `npx workflow`, which can silently inspect the wrong project when run from a repo linked to a different release.
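
The comparison against `.vercel/project.json` can be scripted so the result lands in the handoff verbatim. This is a minimal sketch that assumes the file contains a single `"projectId"` string field; it uses `sed` only to avoid depending on a JSON tool, and the function names are illustrative.

```bash
# Extract the linked project ID. Assumes one "projectId" string field.
linked_project_id() {
  sed -n 's/.*"projectId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Compare the linked project with the incident target.
# $1: expected project ID, $2: path to project.json (defaults to .vercel/project.json)
check_project_target() {
  expected_id=$1
  project_json=${2:-.vercel/project.json}
  linked_id=$(linked_project_id "$project_json" 2>/dev/null)
  if [ "$linked_id" = "$expected_id" ]; then
    echo "match: $linked_id"
  else
    echo "MISMATCH: linked=${linked_id:-<none>} expected=$expected_id" >&2
    return 1
  fi
}
```

For a release 46 incident, `check_project_target prj_JSzrXyJiMzT6F7BUA76Naa7nkjRa` would fail in a repo still linked to release 45, which is the signal to switch to explicit targeting and record the mismatch.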

## Required Admin Surfaces

Collect these first:

```bash
GET /api/admin/why-not-ready
GET /api/channels/summary
GET /api/admin/sandbox-diag
GET /api/admin/logs
```

Save the raw JSON before continuing. Logs are a ring buffer and can evict important events.
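
One way to capture all four surfaces in a single pass is sketched below. It assumes the endpoints accept a bearer token taken from `ADMIN_SECRET`; that auth scheme is a guess, so substitute whatever the deployment actually requires. `save_admin_surfaces` is an illustrative name.

```bash
# Save each admin surface as raw JSON under an output directory.
# Assumption: bearer auth with $ADMIN_SECRET. Adjust to the real scheme.
# $1: deployment base URL, $2: output directory
save_admin_surfaces() {
  base_url=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for endpoint in /api/admin/why-not-ready /api/channels/summary \
                  /api/admin/sandbox-diag /api/admin/logs; do
    name=$(basename "$endpoint")
    # No -f flag: an HTTP error body is evidence too, so it is saved as-is.
    # Keep going on failure so one bad endpoint does not hide the others.
    if curl -sS --max-time 30 -H "Authorization: Bearer ${ADMIN_SECRET:-}" \
         "$base_url$endpoint" > "$out_dir/$name.json"; then
      echo "saved $out_dir/$name.json"
    else
      echo "FAILED $endpoint" >&2
    fi
  done
}
```

Running it as `save_admin_surfaces "$URL" "$ART/admin"` leaves one file per endpoint. Redact the saved files before sharing them.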

## Readiness Triad

Report these separately:

- `route-ready`: the platform route or native route appears registered.
- `native-accepted`: `lastForward.ok` is `true` with `classification:"accepted"`, or there is equivalent handler acceptance evidence.
- `user-visible-reply`: a real user saw a Slack/Telegram/Discord/WhatsApp reply or status update.

Never collapse these into `connected`.

## Deployment Proof

Run or request equivalent evidence:

```bash
git rev-parse HEAD
git ls-remote origin main
curl "$URL/api/admin/sandbox-diag"
```

If the deployed runtime cannot be tied to the source being read, say so and stop drawing code-level conclusions.
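
The two git commands can be compared mechanically, as in the sketch below. It covers only local `HEAD` versus the remote branch. Tying either SHA to the deployed runtime still depends on what `sandbox-diag` reports, and its fields are not specified here. The function names are illustrative.

```bash
# `git ls-remote` prints "<sha>\t<ref>"; keep the SHA from the first line.
# $1: remote (default origin), $2: branch (default main)
remote_sha() {
  git ls-remote "${1:-origin}" "${2:-main}" | head -n 1 | cut -f 1
}

# Succeed only when both SHAs are non-empty and identical.
compare_shas() {
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "same commit: $1"
  else
    echo "DIFFERENT: local=${1:-<none>} remote=${2:-<none>}" >&2
    return 1
  fi
}
```

Typical use is `compare_shas "$(git rev-parse HEAD)" "$(remote_sha origin main)"`. A non-zero exit means the local checkout is not what `main` points at, so code read locally may not match what was deployed.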

## Evidence Artifact Rule

Write evidence under:

```text
.agent-runs/channel-debug/<timestamp>/<channel>/
```

Do not commit runtime evidence. Redact admin secrets, bypass secrets, bot tokens, webhook secrets, and platform access tokens.
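
A minimal sketch of both halves of this rule follows: creating the artifact root so it stays out of commits, and redacting secret-looking values before anything is shared. The UTC timestamp format and the key-name pattern (`secret`, `token`) are assumptions; extend the pattern to cover the names the deployment actually uses.

```bash
# Create the artifact root and ignore it via the repo-local exclude file,
# so the ignore rule itself is never committed.
new_artifact_root() {
  channel=$1
  stamp=$(date -u +%Y%m%dT%H%M%SZ)   # timestamp format is an assumption
  root=".agent-runs/channel-debug/$stamp/$channel"
  mkdir -p "$root"
  if [ -d .git ] && ! grep -qxF '.agent-runs/' .git/info/exclude 2>/dev/null; then
    mkdir -p .git/info
    echo '.agent-runs/' >> .git/info/exclude
  fi
  echo "$root"
}

# Redact JSON string values whose key contains "secret" or "token",
# plus bearer tokens in header-style text. Reads stdin, writes stdout.
redact_secrets() {
  sed -E \
    -e 's/("[^"]*([Ss]ecret|SECRET|[Tt]oken|TOKEN)[^"]*"[[:space:]]*:[[:space:]]*")[^"]*"/\1[REDACTED]"/g' \
    -e 's#(Bearer )[A-Za-z0-9._~+/-]+#\1[REDACTED]#g'
}
```

For example, `ART=$(new_artifact_root slack)` followed by `redact_secrets < raw.json > "$ART/summary.json"` writes a redacted copy. Pattern-based redaction can miss secrets stored under unexpected key names, so still review the output by eye.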

## Release Env Files

When the incident references a release-specific env file, such as `.env.45`, use it only to investigate that release's live run.

Allowed uses:

- deployment URL / app URL
- Vercel project/team/deployment context
- Vercel token for logs/workflow/sandbox CLI
- `ADMIN_SECRET`
- Deployment Protection bypass
- platform credentials required to verify the specific channel run

Rules:

- Source the file; do not print it.
- Do not copy it into artifacts.
- Do not include raw values in handoffs.
- Record only the file name and variable names used.
- Treat env files as credential/context input, not deployment proof. Deployment proof still comes from git, project targeting proof, and live runtime signals.
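
The rules above can be followed with two small helpers: one that loads the file silently and one that lists variable names for the handoff. This sketch assumes the env file uses plain `NAME=value` or `export NAME=value` lines; the helper names are illustrative.

```bash
# List variable names defined in an env file without echoing any values.
env_var_names() {
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1"
}

# Source the file into the current shell with auto-export on, printing nothing.
load_release_env() {
  case $1 in
    */*) env_file=$1 ;;
    *)   env_file=./$1 ;;   # `.` searches PATH for names without a slash
  esac
  set -a
  . "$env_file"
  set +a
}
```

After `load_release_env .env.45`, record `env_var_names .env.45` in the handoff alongside the file name. Make sure shell tracing (`set -x`) is off first, since tracing would echo the values as they are sourced.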

## Sandbox Runtime Probe

The sandbox lane must first identify the actual Vercel Sandbox runtime ID. Use `/api/admin/sandbox-diag` and `/api/channels/summary` before CLI access.

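If the ID matches `oc-prj*` or `prj_*`, or equals the value of `OPENCLAW_INSTANCE_ID` or `VERCEL_PROJECT_ID`, it is a project or instance identifier rather than a sandbox runtime ID. Do not pass it as `<sandbox_id>` to `sandbox exec` or `sandbox connect`.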
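
A guard like the following can catch a wrong identifier before it reaches the CLI. It only encodes the rejections named above. The positive shape of a real sandbox runtime ID is not specified here, so passing this check means "not known-bad" rather than "valid". The function name is illustrative.

```bash
# Return non-zero for identifiers known not to be sandbox runtime IDs.
is_usable_sandbox_id() {
  id=$1
  case $id in
    ''|oc-prj*|prj_*) return 1 ;;
  esac
  # Also reject the instance and project IDs when those variables are set.
  if [ "$id" = "${OPENCLAW_INSTANCE_ID:-}" ] || [ "$id" = "${VERCEL_PROJECT_ID:-}" ]; then
    return 1
  fi
  return 0
}
```

Use it as `is_usable_sandbox_id "$SANDBOX_ID" || echo "not a sandbox runtime ID" >&2` before any `sandbox exec` call.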

When `npx sandbox` / `sandbox` works, collect read-only evidence:

```bash
sandbox exec "$SANDBOX_ID" sh -lc '
  set -eu
  echo "== process =="
  ps -eo pid,ppid,comm,args | grep -E "[o]penclaw|[n]ode" || true
  echo "== ports =="
  (ss -ltnp || netstat -ltnp) 2>/dev/null | grep -E ":(3000|8787)\\b" || true
  echo "== openclaw logs =="
  tail -n 200 /tmp/openclaw/openclaw-*.log 2>/dev/null || true
'
```

For config, print only shape, never secrets:

```bash
sandbox exec "$SANDBOX_ID" node - <<'NODE'
const fs = require("fs");
const p = "/home/vercel-sandbox/.openclaw/openclaw.json";
const j = JSON.parse(fs.readFileSync(p, "utf8"));
console.log(JSON.stringify({
  channels: Object.keys(j.channels || {}),
  pluginAllow: j.plugins?.allow || null,
  hasSlack: Boolean(j.channels?.slack),
  hasTelegram: Boolean(j.channels?.telegram),
  hasDiscord: Boolean(j.channels?.discord),
  hasWhatsApp: Boolean(j.channels?.whatsapp),
}, null, 2));
NODE
```

If CLI access fails because the ID is incompatible or auth cannot resolve the sandbox, save the CLI failure and switch to the app admin SSH/exec fallback. The fallback must run the same read-only probes.

## Workflow State Probe

The workflow lane must inspect Vercel Workflow state before treating the workflow path as healthy or broken:

```bash
mkdir -p "$ART/workflow"  # the redirects below fail if the directory is missing
npx workflow inspect runs --backend vercel > "$ART/workflow/runs.txt" 2> "$ART/workflow/runs.stderr.txt"
```

Correlate by channel, incident time, request ID, delivery ID, platform event ID, and `drainChannelWorkflow`.

Collect workflow run ID, status, current/last step, retry count or repeated step evidence, terminal error, whether native forward attempts happened, and whether `channels.forward_outcome` was written.

If `npx workflow` is unavailable or cannot inspect Vercel state, save the failure and fall back to Vercel logs filtered to workflow/channel events, `/api/admin/logs`, and `/api/channels/summary.lastForward`. Do not use local workflow data as evidence for production behavior.
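
Correlation across the saved workflow output and the fallback logs can be done with fixed-string matching. The sketch below treats every evidence file as opaque text because the output format of `npx workflow inspect runs` is not assumed. `correlate_evidence` is an illustrative name.

```bash
# Print lines of an evidence file that mention any correlation key.
# Keys are matched as fixed strings, so IDs with regex characters are safe.
# $1: evidence file, remaining args: request ID, delivery ID, event ID, ...
correlate_evidence() {
  file=$1
  shift
  for key in "$@"; do
    [ -n "$key" ] || continue
    grep -nF -- "$key" "$file" | while IFS= read -r line; do
      # Prefix each hit with the key that matched it, then "<line number>:<text>".
      printf '[%s] %s\n' "$key" "$line"
    done
  done
}
```

For example, `correlate_evidence "$ART/workflow/runs.txt" "$REQUEST_ID" drainChannelWorkflow` lists every line that mentions either key, with line numbers that can be cited in the handoff. `REQUEST_ID` is a placeholder for whatever ID the incident provides.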

## Prior Fix Comparison

Before proposing a patch, compare current evidence to known prior fixes and failure signatures:

| Prior fix / failure mode | Current matching evidence | Current contradicting evidence | Match? |
|---|---|---|---|
| openclaw-42 zero-plugin gateway wedge | | | |
| stale sandbox public URL / sandbox-not-listening | | | |
| workflow retry exhaustion | | | |
| route-ready vs native-accepted mismatch | | | |
| recent accepted-forward override of stale configSync failure | | | |
| channel-specific signature/raw-body/secret issue | | | |

A repeated operator-facing message is not enough. Match on runtime signals: status codes, route probes, process/log evidence, Workflow state, and `lastForward`.

## Fallback When Live Access Is Unavailable

If you cannot call the deployed endpoints, produce a static code-path audit and mark runtime edges `unknown`. Do not claim runtime behavior from code alone.