regression-search
$ npx mdskill add sonichi/sutando/regression-search

Automates regression detection in phone-call data using keyword analysis.
- Identifies when a feature stopped working by analyzing call transcripts
- Uses find-regression.py and diagnose-call.py scripts with call data from JSONL files
- Classifies calls as working or broken based on refusal/error patterns and timestamps
- Outputs sorted timelines, call metrics, and diagnostic snippets for investigation
SKILL.md (`.github/skills/regression-search`)
---
name: regression-search
description: "Search phone-call history for when a feature regressed (find-regression.py) and drill into a single call to see what went wrong (diagnose-call.py). Skips reading 100+ transcripts by hand."
---
# Regression Search
Two scripts for hunting down bad calls without reading every transcript:
1. **`find-regression.py`** — search `results/calls/calls.jsonl` for calls touching a feature, classify each as working/broken, print a sorted timeline.
2. **`diagnose-call.py`** — drill into a single call by SID, report refusals/errors/silences/repeated requests, optionally show metrics from `data/call-metrics.jsonl`.
Closes [#188](https://github.com/sonichi/sutando/issues/188).
## When to use
- "When did the X feature stop working?" — pass the feature keyword.
- "Has feature Y improved?" — see the broken/working trend over time.
- Before shipping a fix — sanity check that the regression is reproducible.
## Usage
```bash
python3 skills/regression-search/scripts/find-regression.py "record"
python3 skills/regression-search/scripts/find-regression.py "summon" --since 2026-04-01
python3 skills/regression-search/scripts/find-regression.py "play" --json
```
Flags:
- `--since YYYY-MM-DD` — only show calls on/after this date
- `--json` — machine-readable output
- `--show-snippet` — print a one-line transcript snippet for each call
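As a rough illustration of what `--since` and the sorted timeline amount to, here is a minimal sketch. It is not the code in `find-regression.py`: the `timestamp` and `sid` field names and the helper names `load_calls` and `timeline` are assumptions made for the example, and the real `calls.jsonl` schema may differ.

```python
import json
from datetime import date, datetime


def load_calls(lines):
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def timeline(calls, since=None):
    """Return calls on or after `since` (a datetime.date), oldest first.

    Assumes each call has an ISO 8601 `timestamp` field (hypothetical name).
    """
    def when(call):
        return datetime.fromisoformat(call["timestamp"])

    kept = [c for c in calls if since is None or when(c).date() >= since]
    return sorted(kept, key=when)


if __name__ == "__main__":
    sample = [
        '{"sid": "a", "timestamp": "2026-04-02T10:00:00"}',
        '{"sid": "b", "timestamp": "2026-03-30T09:00:00"}',
    ]
    for call in timeline(load_calls(sample), since=date(2026, 4, 1)):
        print(call["timestamp"], call["sid"])
```

Parsing the timestamp before comparing keeps the date filter and the sort order consistent even if the stored strings are not all formatted identically.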
## Heuristics
A call is **broken** for a query if any of:
- Sutando refuses ("I can't", "I'm not able", "I'm unable", "sorry I cannot")
- Sutando reports an error ("error", "failed", "didn't work", "something went wrong")
- The user repeats the same request 2+ times in a row (Sutando didn't respond usefully)
- Sutando says "(Silence)" after the user mentions the feature
Otherwise, the call is **working** if Sutando's response includes the feature keyword.
These are intentionally crude — the goal is "good enough to find the regression window without reading 163 transcripts." Tune as you find false positives.
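The heuristics above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual implementation: the `(speaker, text)` turn format, the `classify` name, the `"unknown"` fallback for calls that match neither rule, and the reading of "in a row" as identical consecutive user turns are all choices made for the example.

```python
# Phrase lists taken from the heuristics above; matching is plain substring search.
REFUSALS = ("i can't", "i'm not able", "i'm unable", "sorry i cannot")
ERRORS = ("error", "failed", "didn't work", "something went wrong")


def classify(turns, keyword):
    """Classify one call as 'broken', 'working', or 'unknown'.

    `turns` is a list of (speaker, text) pairs with speaker "user" or
    "sutando" (a hypothetical format, not the real transcript schema).
    """
    kw = keyword.lower()
    last_user = None          # previous user utterance, normalized
    feature_mentioned = False  # has the user mentioned the keyword yet?
    keyword_in_reply = False   # has Sutando mentioned the keyword?

    for speaker, text in turns:
        # Normalize case and curly apostrophes so "I can’t" matches "i can't".
        low = text.lower().replace("\u2019", "'").strip()
        if speaker == "user":
            if low == last_user:
                return "broken"  # same request repeated back to back
            last_user = low
            feature_mentioned = feature_mentioned or kw in low
        else:
            if any(phrase in low for phrase in REFUSALS + ERRORS):
                return "broken"  # refusal or reported error
            if feature_mentioned and low == "(silence)":
                return "broken"  # silence after the feature came up
            keyword_in_reply = keyword_in_reply or kw in low

    return "working" if keyword_in_reply else "unknown"
```

Substring matching is deliberately crude here too: a reply containing "terror" would trip the `error` rule, which is the kind of false positive the tuning note refers to.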
## Limitations
- Keyword matching only. "recording doesn't stop" and "recording won't start" both match `record`. The issue (#188) calls this out as future work.
- No semantic understanding. A call where Sutando talks about recording but the user wanted something else still matches.
- Doesn't correlate with git commits — manual step for now.
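Until commit correlation exists, the manual step might look like the sketch below: take the classified timeline, find the gap between the last working call and the first broken call after it, and build a `git log` command for that window. The function names and the `(timestamp, status)` input format are hypothetical; only `git log --since/--until/--oneline` are real git options.

```python
def regression_window(timeline):
    """Return (last_working_ts, first_broken_ts) or None.

    `timeline` is a list of (iso_timestamp, status) pairs sorted oldest
    first, with status "working" or "broken" (hypothetical format).
    """
    # Index of the most recent working call, or -1 if there is none.
    last_ok = max(
        (i for i, (_, status) in enumerate(timeline) if status == "working"),
        default=-1,
    )
    for ts, status in timeline[last_ok + 1:]:
        if status == "broken":
            start = timeline[last_ok][0] if last_ok >= 0 else None
            return start, ts
    return None  # nothing broken after the last working call


def git_log_command(start, end):
    """Build an argv list for listing commits inside the window."""
    cmd = ["git", "log", "--oneline"]
    if start is not None:
        cmd.append(f"--since={start}")
    cmd.append(f"--until={end}")
    return cmd
```

The resulting list can be passed to `subprocess.run` from inside the repository; the commits it prints are candidates to inspect by hand, not a verdict.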
## diagnose-call.py
```bash
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2
python3 skills/regression-search/scripts/diagnose-call.py CA701fc4129779... --metrics
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2 --json
```
Accepts a full SID or just the last 12 characters. Reports turn counts, refusals, errors, silences, repeated user requests, and the ending style (normal, abrupt user end, or Sutando silence). With `--metrics`, it also pulls a per-event tool-call timeline from `data/call-metrics.jsonl` (requires PR #223). The exit code is 1 if any issues are found and 0 if the call is clean, which makes the script usable as a CI check.
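A minimal sketch of the SID lookup and exit-code convention described above, assuming the known SIDs are available as a list. `resolve_sid` and `exit_code` are hypothetical names, the SIDs in the usage are made up, and returning `None` on an ambiguous suffix is a choice made for the example rather than documented behavior.

```python
def resolve_sid(query, known_sids):
    """Map a full SID or a 12-character suffix to a full SID.

    Returns None when nothing matches or when the suffix is ambiguous.
    """
    if query in known_sids:
        return query
    if len(query) == 12:
        matches = [sid for sid in known_sids if sid.endswith(query)]
        if len(matches) == 1:
            return matches[0]
    return None


def exit_code(issues):
    """1 if any issues were found, 0 if the call is clean."""
    return 1 if issues else 0


if __name__ == "__main__":
    # Made-up SIDs for illustration only.
    sids = ["CA" + "0" * 20 + "aaaabbbbcccc", "CA" + "1" * 20 + "ddddeeeeffff"]
    print(resolve_sid("ddddeeeeffff", sids))
    print(exit_code(["refusal at turn 3"]))
```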
Typical workflow: run `find-regression.py` to surface broken candidates, then `diagnose-call.py <sid>` to drill into the worst one.
## Future work
- Auto-correlate regression windows with git log
- Smarter NLP-based query matching (query: "recording doesn't stop" vs "recording won't start")