expect

$ npx mdskill add yonatangross/orchestkit/expect

Analyzes git diffs to generate and execute targeted browser tests for UI changes, ensuring code modifications don't break user-facing experiences.

  • Helps developers test UI changes, verify PRs before merge, and run regression checks on modified components.
  • Integrates with git for diff analysis and depends on the agent-browser skill for test execution.
  • Decides test scope by reading git diffs and mapping changes to affected pages via a route map.
  • Presents results through pass/fail reporting after executing the generated test plan in the browser.

SKILL.md

.github/skills/expect
---
name: expect
license: MIT
compatibility: "Claude Code 2.1.84+. Requires agent-browser skill."
description: "Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser. Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience."
argument-hint: "[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]"
context: fork
version: 1.0.0
author: OrchestKit
tags: [testing, browser, e2e, diff-aware, regression, visual, accessibility, ai-testing]
user-invocable: true
allowed-tools: [AskUserQuestion, Bash, Read, Write, Edit, Grep, Glob, Agent, TaskCreate, TaskUpdate, TaskList, ToolSearch, WebFetch]
skills: [testing-e2e, chain-patterns, memory]
complexity: high
persuasion-type: guidance
effort: high
model: sonnet
metadata:
  category: testing
  milestone: M99
triggers:
  keywords: [expect, "test my changes", "browser test", "diff test", "test what I changed", "test the UI", "visual regression", "check my changes"]
  examples:
    - "test my changes before I push"
    - "expect — run browser tests on what I changed"
    - "test the login flow after my auth refactor"
    - "run visual regression on the dashboard"
  anti-triggers: [cover, "unit test", "generate tests", verify, implement, "npm test"]
paths: [".expect/**", "**/*.test.{ts,tsx}", "playwright.config.*"]
invocation_hooks:
  - "command -v agent-browser >/dev/null 2>&1 || echo 'Warning: agent-browser not installed — run npm install -g @anthropic-ai/agent-browser'"
---

# Expect — Diff-Aware AI Browser Testing

Analyze git changes, generate targeted test plans, and execute them via AI-driven browser automation.

> **Note:** If `disableSkillShellExecution` is enabled (CC 2.1.91), the agent-browser install check won't run. Verify it's installed: `npx agent-browser --version`.

```bash
/ork:expect                              # Auto-detect changes, test affected pages
/ork:expect -m "test the checkout flow"  # Specific instruction
/ork:expect --flow login                 # Replay a saved test flow
/ork:expect --target branch              # Test all changes on current branch vs main
/ork:expect -y                           # Skip plan review, run immediately
```

**Core principle:** Only test what changed. Git diff drives scope — no wasted cycles on unaffected pages.


## Argument Resolution

```python
ARGS = "[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]"

# Parse from full argument string
import re
raw = ""  # Full argument string from CC

INSTRUCTION = None
TARGET = "unstaged"  # Default: test unstaged changes
FLOW = None
SKIP_REVIEW = False

# Extract -m "instruction"
m_match = re.search(r'-m\s+["\']([^"\']+)["\']|-m\s+(\S+)', raw)
if m_match:
    INSTRUCTION = m_match.group(1) or m_match.group(2)

# Extract --target
t_match = re.search(r'--target\s+(unstaged|branch|commit)', raw)
if t_match:
    TARGET = t_match.group(1)

# Extract --flow
f_match = re.search(r'--flow\s+(\S+)', raw)
if f_match:
    FLOW = f_match.group(1)

# Extract -y
if '-y' in raw.split():
    SKIP_REVIEW = True
```
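The regex approach above can misread quoted instructions that contain an apostrophe (for example `-m "don't break login"`). As an alternative, here is a minimal sketch using the standard library's `shlex` and `argparse`. The function name `parse_expect_args` and the attribute names are illustrative assumptions, not part of the skill.

```python
import argparse
import shlex


def parse_expect_args(raw: str) -> argparse.Namespace:
    """Parse the /ork:expect argument string with shell-style quoting.

    Hypothetical helper: the name and attribute names are illustrative only.
    """
    parser = argparse.ArgumentParser(prog="expect", add_help=False)
    parser.add_argument("-m", dest="instruction", default=None)
    parser.add_argument(
        "--target", choices=["unstaged", "branch", "commit"], default="unstaged"
    )
    parser.add_argument("--flow", default=None)
    parser.add_argument("-y", dest="skip_review", action="store_true")
    # shlex.split honours quotes, so a quoted instruction stays a single token
    return parser.parse_args(shlex.split(raw))


args = parse_expect_args('-m "test the checkout flow" --target branch -y')
print(args.instruction, args.target, args.flow, args.skip_review)
# prints: test the checkout flow branch None True
```

Note that `argparse` exits with an error on an unknown flag or an invalid `--target` value, whereas the regex version silently falls back to the defaults.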


## STEP 0: MCP Probe + Prerequisite Check

```python
ToolSearch(query="select:mcp__memory__search_nodes")

# Verify agent-browser is available
Bash("command -v agent-browser || npx agent-browser --version")
# If missing: "Install agent-browser: npm i -g @anthropic-ai/agent-browser"
```


## CRITICAL: Task Management

```python
# 1. Create main task IMMEDIATELY
TaskCreate(
  subject="Expect: test changed code",
  description="Diff-aware browser testing pipeline",
  activeForm="Running diff-aware browser tests"
)

# 2. Create subtasks for each pipeline phase
TaskCreate(subject="Check fingerprint (skip if unchanged)", activeForm="Checking fingerprint")  # id=2
TaskCreate(subject="Scan git diff and classify changes", activeForm="Scanning diff")            # id=3
TaskCreate(subject="Map changes to routes/URLs", activeForm="Mapping routes")                   # id=4
TaskCreate(subject="Generate AI test plan", activeForm="Generating test plan")                   # id=5
TaskCreate(subject="Execute tests via agent-browser", activeForm="Executing browser tests")     # id=6
TaskCreate(subject="Compile test report", activeForm="Compiling report")                        # id=7

# 3. Set dependencies for sequential phases
TaskUpdate(taskId="3", addBlockedBy=["2"])  # Diff scan needs fingerprint check
TaskUpdate(taskId="4", addBlockedBy=["3"])  # Route map needs diff results
TaskUpdate(taskId="5", addBlockedBy=["4"])  # Test plan needs route map
TaskUpdate(taskId="6", addBlockedBy=["5"])  # Execution needs test plan
TaskUpdate(taskId="7", addBlockedBy=["6"])  # Report needs execution results

# 4. Before starting each task, verify it's unblocked
TaskList()  # Confirm task 2 has an empty blockedBy (TaskGet is not in allowed-tools)

# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress")  # When starting
TaskUpdate(taskId="2", status="completed")    # When done — repeat for each subtask
```


## Pipeline Overview

```
Fingerprint Check → Diff Scan → Route Map → Test Plan → Execute → Report
```

| Phase | What | Output | Reference |
|-------|------|--------|-----------|
| **1. Fingerprint** | SHA-256 hash of changed files | Skip if unchanged since last run | `references/fingerprint.md` |
| **2. Diff Scan** | Parse git diff, classify changes | ChangesFor data (files, components, routes) | `references/diff-scanner.md` |
| **3. Route Map** | Map changed files to affected pages/URLs | Scoped page list | `references/route-map.md` |
| **4. Test Plan** | Generate AI test plan from diff + route map | Markdown test plan with steps | `references/test-plan.md` |
| **5. Execute** | Run test plan via agent-browser | Pass/fail per step, screenshots | `references/execution.md` |
| **6. Report** | Aggregate results, artifacts, exit code | Structured report + artifacts | `references/report.md` |


## Phase 1: Fingerprint Check

Check if the current changes have already been tested:

```python
Read(".expect/fingerprints.json")  # Previous run hashes
# Compare SHA-256 of changed files against stored fingerprints
# If match: "No changes since last test run. Use --force to re-run."
# If no match or --force: continue to Phase 2
```

Load: `Read("${CLAUDE_SKILL_DIR}/references/fingerprint.md")`
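The gating check could be sketched roughly as follows. The layout of `.expect/fingerprints.json` is defined in `references/fingerprint.md` and is not shown here, so the `"last_run"` key and both function names are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths: list[str]) -> str:
    """Return one SHA-256 hex digest over the names and contents of the files."""
    digest = hashlib.sha256()
    for path in sorted(paths):  # sort so the order of git's output does not matter
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        file = Path(path)
        # A deleted file is still a change: hash a marker instead of its contents
        digest.update(file.read_bytes() if file.is_file() else b"<deleted>")
        digest.update(b"\0")
    return digest.hexdigest()


def already_tested(paths: list[str], store: str = ".expect/fingerprints.json") -> bool:
    """True when the stored fingerprint equals the current one."""
    store_path = Path(store)
    if not store_path.is_file():
        return False  # no previous run recorded
    saved = json.loads(store_path.read_text())
    # "last_run" is a hypothetical key; the real schema may differ
    return saved.get("last_run") == fingerprint(paths)
```

Hashing file names together with contents means a rename counts as a change even when the bytes are identical.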


## Phase 2: Diff Scan

Analyze git changes based on `--target`:

```python
if TARGET == "unstaged":
    diff = Bash("git diff")
    files = Bash("git diff --name-only")
elif TARGET == "branch":
    diff = Bash("git diff main...HEAD")
    files = Bash("git diff main...HEAD --name-only")
elif TARGET == "commit":
    diff = Bash("git diff HEAD~1 HEAD")  # Last commit only; excludes uncommitted work
    files = Bash("git diff HEAD~1 HEAD --name-only")
```

Classify each affected file into one of three levels:
1. **Direct** — the file itself changed
2. **Imported** — a file that imports the changed file
3. **Routed** — the page/route that renders the changed component

Load: `Read("${CLAUDE_SKILL_DIR}/references/diff-scanner.md")`
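One way to picture the three levels is a breadth-first walk over a reverse import graph. This is a sketch under stated assumptions, not the logic in `references/diff-scanner.md`: the `importers` mapping and `route_files` set are hypothetical inputs that would have to be built separately, and the walk follows imports transitively.

```python
from collections import deque


def classify(
    changed: set[str],
    importers: dict[str, set[str]],
    route_files: set[str],
) -> dict[str, str]:
    """Label every affected file as 'direct', 'imported', or 'routed'.

    importers maps a file to the files that import it (hypothetical input).
    route_files is the set of files that define a page or route.
    """
    levels = {path: "direct" for path in changed}
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent in levels:
                continue  # already labelled, either changed directly or reached earlier
            levels[parent] = "routed" if parent in route_files else "imported"
            queue.append(parent)
    return levels


# File names below are made up for the example
graph = {
    "src/components/Chart.tsx": {"src/components/Panel.tsx"},
    "src/components/Panel.tsx": {"src/app/dashboard/page.tsx"},
}
levels = classify({"src/components/Chart.tsx"}, graph, {"src/app/dashboard/page.tsx"})
```

Here `Chart.tsx` is labelled `direct`, `Panel.tsx` is `imported`, and the dashboard page is `routed`.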


## Phase 3: Route Map

Map changed files to testable URLs using `.expect/config.yaml`:

```yaml
# .expect/config.yaml
base_url: http://localhost:3000
route_map:
  "src/components/Header.tsx": ["/", "/about", "/pricing"]
  "src/app/auth/**": ["/login", "/signup", "/forgot-password"]
  "src/app/dashboard/**": ["/dashboard"]
```

If no route map exists, infer routes from Next.js App Router or Pages Router file conventions.

Load: `Read("${CLAUDE_SKILL_DIR}/references/route-map.md")`
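A rough sketch of the lookup, using the route map from the YAML above written as a Python dict. The function name `affected_urls` is an assumption, and the matching rules here (standard-library `fnmatchcase`) may differ from what `references/route-map.md` specifies.

```python
from fnmatch import fnmatchcase

ROUTE_MAP = {
    "src/components/Header.tsx": ["/", "/about", "/pricing"],
    "src/app/auth/**": ["/login", "/signup", "/forgot-password"],
    "src/app/dashboard/**": ["/dashboard"],
}


def affected_urls(changed_files, route_map=ROUTE_MAP, base_url="http://localhost:3000"):
    """Return deduplicated absolute URLs for every pattern a changed file matches."""
    urls = []
    for path in changed_files:
        for pattern, routes in route_map.items():
            # In fnmatch, "*" also matches "/", so "**" acts like a recursive glob here
            if fnmatchcase(path, pattern):
                for route in routes:
                    url = base_url.rstrip("/") + route
                    if url not in urls:
                        urls.append(url)
    return urls


print(affected_urls(["src/app/auth/login/page.tsx", "README.md"]))
# prints: ['http://localhost:3000/login', 'http://localhost:3000/signup', 'http://localhost:3000/forgot-password']
```

Files that match no pattern, such as `README.md` above, contribute no URLs, which keeps the test scope limited to the diff.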


## Phase 4: Test Plan Generation

Build an AI test plan scoped to the diff, using the scope strategy for the current target:

```python
scope_strategy = get_scope_strategy(TARGET)  # See references/scope-strategy.md

prompt = f"""
{scope_strategy}

Changes: {diff_summary}
Affected pages: {affected_urls}
Instruction: {INSTRUCTION or "Test that the changes work correctly"}

Generate a test plan with:
1. Page-level checks (loads, no console errors, correct content)
2. Interaction tests (forms, buttons, navigation affected by the diff)
3. Visual regression (compare ARIA snapshots if saved)
4. Accessibility (axe-core scan on affected pages)
"""
```

If `--flow` is specified, load the saved flow from `.expect/flows/{slug}.yaml` instead of generating a plan.

Unless `-y` is set, present the plan to the user via `AskUserQuestion` for review before executing.

Load: `Read("${CLAUDE_SKILL_DIR}/references/test-plan.md")`


## Phase 5: Execution

Run the test plan via `agent-browser`:

```python
Agent(
  subagent_type="expect-agent",
  prompt=f"""Execute this test plan:
  {test_plan}

  For each step:
  1. Navigate to the URL
  2. Execute the test action
  3. Take a screenshot on failure
  4. Report PASS/FAIL with evidence
  """,
  run_in_background=True,
  model="sonnet",
  max_turns=50
)
```

Load: `Read("${CLAUDE_SKILL_DIR}/references/execution.md")`


## Phase 6: Report

```
/ork:expect Report
═══════════════════════════════════════
Target: unstaged (3 files changed)
Pages tested: 4
Duration: 45s

Results:
  ✓ /login — form renders, submit works
  ✓ /signup — validation triggers on empty fields
  ✗ /dashboard — chart component crashes (TypeError)
  ✓ /settings — preferences save correctly

3 passed, 1 failed

Artifacts:
  .expect/reports/2026-03-26T16-30-00.json
  .expect/screenshots/dashboard-error.png
```

Load: `Read("${CLAUDE_SKILL_DIR}/references/report.md")`
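The summary line and exit code could be derived from per-page results along these lines. The `StepResult` type, the `summarize` name, and the exit-code convention (0 when everything passes, 1 otherwise) are assumptions; the actual report schema is described in `references/report.md`.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """One tested page (hypothetical shape, for illustration only)."""
    url: str
    passed: bool
    note: str


def summarize(results: list[StepResult]) -> tuple[str, int]:
    """Return the summary line and a process exit code."""
    passed = sum(1 for r in results if r.passed)
    failed = len(results) - passed
    # Assumed convention: a non-zero exit code signals at least one failure
    exit_code = 0 if failed == 0 else 1
    return f"{passed} passed, {failed} failed", exit_code


results = [
    StepResult("/login", True, "form renders, submit works"),
    StepResult("/signup", True, "validation triggers on empty fields"),
    StepResult("/dashboard", False, "chart component crashes (TypeError)"),
    StepResult("/settings", True, "preferences save correctly"),
]
print(summarize(results))
# prints: ('3 passed, 1 failed', 1)
```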


## Saved Flows

Reusable test sequences stored in `.expect/flows/`:

```yaml
# .expect/flows/login.yaml
name: Login Flow
steps:
  - navigate: /login
  - fill: { selector: "#email", value: "test@example.com" }
  - fill: { selector: "#password", value: "password123" }
  - click: button[type="submit"]
  - assert: { url: "/dashboard" }
  - assert: { text: "Welcome back" }
```

Run with: `/ork:expect --flow login`


## When NOT to Use

- **Unit tests** — use `/ork:cover` instead
- **API-only changes** — no browser UI to test
- **Generated files** — skip build artifacts, lock files
- **Docs-only changes** — unless you want to verify docs site rendering


## Related Skills

- `agent-browser` — Browser automation engine (required dependency)
- `ork:cover` — Test suite generation (unit/integration/e2e)
- `ork:verify` — Grade existing test quality
- `testing-e2e` — Playwright patterns and best practices


## References

Load on demand with `Read("${CLAUDE_SKILL_DIR}/references/<file>")`:

| File | Content |
|------|---------|
| `fingerprint.md` | SHA-256 gating logic |
| `diff-scanner.md` | Git diff parsing + 3-level classification |
| `route-map.md` | File-to-URL mapping conventions |
| `test-plan.md` | AI test plan generation prompt templates |
| `execution.md` | agent-browser orchestration patterns |
| `report.md` | Report format + artifact storage |
| `config-schema.md` | .expect/config.yaml full schema |
| `aria-diffing.md` | ARIA snapshot comparison for semantic diffing |
| `scope-strategy.md` | Test depth strategy per target mode |
| `saved-flows.md` | Markdown+YAML flow format, adaptive replay |
| `rrweb-recording.md` | rrweb DOM replay integration |
| `human-review.md` | AskUserQuestion plan review gate |
| `ci-integration.md` | GitHub Actions workflow + pre-push hooks |
| `research.md` | millionco/expect architecture analysis |


**Version:** 1.0.0 (March 2026) — Initial scaffold, M99 milestone
