flaky-test-debugger
$ npx mdskill add ClipboardHealth/core-utils/flaky-test-debugger

Diagnose and repair unstable tests across frameworks.
- Handles intermittent failures in Playwright, NestJS, React, and unit tests.
- Analyzes code patterns and error logs to identify root causes.
- Generates execution plans or applies fixes automatically based on mode.
- Outputs structured reports or modified code ready for integration.
SKILL.md
.github/skills/flaky-test-debugger
---
name: flaky-test-debugger
description: Debug and fix flaky tests including Playwright E2E, NestJS service/integration, React component, and unit tests. Use this skill when investigating intermittent test failures, triaging flaky tests, or fixing test instability.
---

Phases run in order. Skip a phase if you already have the information it produces. Phase 3 runs only in fix mode.

## Mode: plan vs fix

This skill runs in one of two modes:

- **Fix mode (default):** produce a plan, then apply it.
- **Plan mode:** produce a plan and stop, for human review.

Use plan mode when the user asks for a plan, an investigation, a triage report, or says "don't fix yet" / "just plan it". Otherwise default to fix mode.

Both modes share the same diagnosis path; the plan is the artifact you hand to a reviewer (plan mode) or to yourself (fix mode) before editing code.

## Phase 1: Classify Test Type

Determine the test type from the user's input before doing anything else. The type dictates the investigation path.

| Type | Signals |
| -------------------------------- | ------- |
| **E2E (Playwright)** | `.spec.ts` file, mentions Playwright, has a GitHub Actions run URL with a `playwright-llm-report` artifact, browser-level errors |
| **Service (NestJS integration)** | Spins up a NestJS app, uses `supertest` or similar HTTP testing, MongoDB/Redis connection errors, `*.service.spec.ts` or test descriptions mentioning "service test" |
| **React component** | Uses `@testing-library/react`, `render()`, `screen.*`, `.test.tsx` file, React `act()` warnings |
| **Unit** | Pure logic tests, `.test.ts` file, no app bootstrap or DOM, Jest/Vitest matchers on plain functions or classes |

If the type is ambiguous, check the test file extension and imports to confirm.
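The mode rule and the Phase 1 signal table can be approximated as a first-pass heuristic before reading the test in full. This is a sketch under stated assumptions, not part of the skill: `detectMode`, `classifyTestType`, and the specific package names it looks for (`@playwright/test`, `@nestjs/testing`) are illustrative choices, and signals that live outside the test file (a `playwright-llm-report` CI artifact, MongoDB/Redis errors, `act()` warnings) are not checked.

```typescript
// Hypothetical heuristics mirroring the mode rule and the Phase 1 signal table.
// Package names such as "@playwright/test" and "@nestjs/testing" are assumptions
// about how the signals appear in imports; CI artifacts and error text are not checked.

type Mode = "plan" | "fix";
type TestType = "e2e" | "service" | "react" | "unit";

// Phrases the skill lists as requests for plan mode (word boundaries avoid
// matching "plan" inside words like "explanation").
const PLAN_MODE_PATTERNS: RegExp[] = [/\bplan\b/i, /\binvestigat/i, /\btriage\b/i, /don'?t fix yet/i];

function detectMode(request: string): Mode {
  // Fix mode is the default; any plan-mode phrase switches to plan mode.
  return PLAN_MODE_PATTERNS.some((pattern) => pattern.test(request)) ? "plan" : "fix";
}

interface TestFile {
  path: string; // repo-relative path, e.g. "src/shifts/shifts.service.spec.ts"
  source: string; // file contents
}

// Crude check: the package name appears right after a quote, as in an import or require.
const importsPackage = (source: string, pkg: string): boolean =>
  source.includes(`"${pkg}`) || source.includes(`'${pkg}`);

function classifyTestType({ path, source }: TestFile): TestType | "ambiguous" {
  // Imports are the stronger signal, so check them first.
  if (importsPackage(source, "@playwright/test")) return "e2e";
  if (importsPackage(source, "@testing-library/react")) return "react";
  if (importsPackage(source, "@nestjs/testing") || importsPackage(source, "supertest")) {
    return "service";
  }
  // Fall back to file naming; the more specific suffix must come first.
  if (path.endsWith(".service.spec.ts")) return "service";
  if (path.endsWith(".test.tsx")) return "react";
  if (path.endsWith(".spec.ts")) return "e2e";
  if (path.endsWith(".test.ts")) return "unit";
  return "ambiguous";
}
```

Suffixes are checked after imports, and `.service.spec.ts` before `.spec.ts`, because the naming signals overlap. An `"ambiguous"` result is the cue to open the file and confirm the type by reading its imports and setup.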
## Phase 1b: Check for Existing Fixes

Before investigating, check whether someone (or another agent) has already fixed this flake.

1. **Search open PRs with the `flaky-test-fix` label** that touch the failing test file or its surrounding code. Use GitHub search scoped to the repo:
   - Search PRs labeled `flaky-test-fix` for the test file name or test directory.
   - Review the PR's changes to assess whether they address the same flake pattern with reasonable confidence. If so, stop and report it to the user rather than opening a duplicate fix.
   - If the PR only partially addresses the flake or targets a different root cause, note it and proceed with the investigation.
2. **Check recent commits on `main`** that touch the failing test file or its surrounding code:
   - Run `git log --oneline -20 origin/main -- <test-file-path>`, and also check the parent directory or related source files.
   - Read the commit messages. If one clearly fixes the same flake pattern, stop and report it to the user.

If an existing fix is found, report:

- The PR number/URL or commit hash
- A brief summary of what it addresses
- Whether it fully covers the current flake or only partially

If no existing fix is found, proceed to Phase 2.

## Phase 2: Produce a plan

Follow [`references/plan.md`](./references/plan.md). It walks through investigation, diagnosis, evidence gathering, and the fix decision tree, and produces a structured plan with a confidence score.

If you are in plan mode, present the plan and stop here.

## Phase 3: Apply the plan (fix mode only)

Follow [`references/fix.md`](./references/fix.md). It takes the plan from Phase 2, applies the proposed fix, searches for sibling anti-patterns, and verifies the result. PR creation is out of scope. If the user later opens one (or invokes a PR-shipping skill), label it `flaky-test-fix`.
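The two lookups in Phase 1b can be prepared ahead of time. The sketch below is hypothetical (`buildExistingFixSearch` is not part of the skill) and only assembles strings: a GitHub search query using the standard `repo:`, `is:pr`, `is:open`, and `label:` qualifiers, and `git log` argument lists for the test file and its parent directory. GitHub's free-text search matches PR titles, bodies, and comments rather than diffs, so a hit still needs the manual review of its changes that Phase 1b asks for, and a miss does not rule out an existing fix. Related source files are left to the caller.

```typescript
// Hypothetical helper that assembles the Phase 1b searches.
// It runs nothing; it only builds a GitHub query and git argument lists.

interface ExistingFixSearch {
  githubQuery: string; // paste into GitHub search for the repo
  gitLogCommands: string[][]; // argument lists for the `git` executable
}

const FLAKY_LABEL = "flaky-test-fix";

function buildExistingFixSearch(repo: string, testFilePath: string): ExistingFixSearch {
  const slash = testFilePath.lastIndexOf("/");
  const fileName = testFilePath.slice(slash + 1);
  const parentDir = slash >= 0 ? testFilePath.slice(0, slash) : ".";

  // Open PRs in this repo carrying the label, narrowed by the test file name.
  // Free text matches titles, bodies, and comments, not changed files.
  const githubQuery = `repo:${repo} is:pr is:open label:${FLAKY_LABEL} ${fileName}`;

  // Last 20 commits on origin/main touching the file, then its parent directory.
  const gitLogCommands = [testFilePath, parentDir].map((target) => [
    "log",
    "--oneline",
    "-20",
    "origin/main",
    "--",
    target,
  ]);

  return { githubQuery, gitLogCommands };
}
```

Each argument list can be run with `execFileSync("git", args, { encoding: "utf8" })` from `node:child_process`; the label name and `origin/main` come from the skill, while the repo slug passed in is a placeholder you supply.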
More from ClipboardHealth/core-utils
- **adversarial-review**: Perform an adversarial review of proposed work. Use ONLY when the user explicitly types /adversarial-review. Never auto-trigger, even if the user mentions reviewing, questioning, or challenging their approach.
- **clipboard-testing**: End-to-end testing playbook for Clipboard Health changes. Use when the user wants to verify, exercise, or set up test data for a backend or frontend change against a live environment: "test my change end-to-end", "verify this works in dev", "create a test workplace / worker / shift", "get a shift through to paid / invoiced", "prove the API change works". Defaults to the `development` AWS environment, API-first (cbh CLI tokens + curl). The skill knows enough to run the core happy-path flow (workplace → worker → shift → clock in/out → pay → invoice) autonomously; for anything else, it orients around the codebase and asks the user for missing directories.
- **cognito-user-analysis**: Use when looking up Cognito user details by sub UUID, finding duplicate accounts sharing a phone number or email, analyzing which duplicates to keep vs. delete, or fixing orphaned UNCONFIRMED signups. Symptoms include 403 Forbidden on login, multiple accounts for the same phone number, and backend sync issues.
- **datadog-investigate**: Investigate production issues by querying Datadog logs, metrics, and APM traces, then correlating findings with the codebase. Use this skill whenever the user mentions production errors, Datadog, observability, log investigation, latency spikes, error rate increases, 500s, trace IDs, monitor alerts, or wants to debug any service issue in a deployed environment.
- **interview-feature**: Use when clarifying requirements for a feature ticket. Iteratively researches and interviews the user until the problem is well understood, then produces a structured problem brief. Dispatched by write-feature-ticket when context is insufficient.
- **investigate-ticket**: Use when investigating a bug, incident, or issue before implementation. Researches the codebase, queries Datadog, and presents structured findings with handoff options. Also use when asked to "look into" or "investigate" something.
- **local-package**: Use Clipboard's internal CLI to link and unlink @clipboard-health packages across repositories for local development. Use when testing local package changes, linking @clipboard-health packages between repos, or using the cbh CLI local-package command.
- **seed-data**: Trigger seed data generation for test environments via GitHub Actions. Use when asked to seed, create test data, or set up HCPs/facilities/shifts.
- **write-bug-ticket**: Use when creating a Linear bug report ticket from conversation context, investigation findings, or user-provided evidence. Focuses on structuring and writing, not investigating.
- **write-feature-ticket**: Use when creating a Linear feature request ticket from conversation context, a brief description, or code/PR analysis. Interviews the user for clarity when context is insufficient.