ai-test

$npx mdskill add arcasilesgroup/ai-engineering/ai-test

Enforce TDD and generate tests across multiple programming languages.

  • Executes test-first development cycles for new features.
  • Analyzes coverage gaps and missing edge cases in codebases.
  • Determines test strategy based on project language and requirements.
  • Delivers executable test files with immediate execution results.
SKILL.md
.github/skills/ai-testView on GitHub ↗
---
name: ai-test
description: Writes tests, enforces TDD (RED-GREEN-REFACTOR), analyzes coverage gaps, defines test strategy across Python, TypeScript, .NET, Rust, Go. Trigger for 'add tests for', 'write a test', 'I need 80 percent coverage', 'plan my test approach', 'TDD this'. Not for failing tests where the fix is unclear; use /ai-debug instead. Not for AI reliability over time; use /ai-reliability-eval instead.
effort: mid
argument-hint: "plan|run|gap|tdd [target]"
mode: agent
model_tier: sonnet
mirror_family: copilot-skills
generated_by: ai-eng sync
canonical_source: .claude/skills/ai-test/SKILL.md
edit_policy: generated-do-not-edit
---



# Test

## Purpose

TDD enforcement and testing skill. Tests are executable specifications -- they define what the system does before the system does it. Maximum confidence per minute of developer time.

## When to Use

- `tdd`: driving new features test-first (RED-GREEN-REFACTOR)
- `run`: writing and executing tests for existing code
- `gap`: analyzing coverage gaps and missing edge cases
- `plan`: designing test strategy before writing tests

## Process

Step 0 (load contexts): read `.ai-engineering/manifest.yml` `providers.stacks`; load `.ai-engineering/overrides/<stack>/conventions.md` for each stack and `.ai-engineering/overrides/_shared/conventions.md`; load `.ai-engineering/team/*.md` for team conventions.

### Mode: tdd (RED-GREEN-REFACTOR)

For TDD mode, follow `handlers/tdd.md` for the full RED-GREEN-REFACTOR flow.

### Mode: run

1. Detect test framework from project files
2. Follow existing conventions (directory structure, naming, fixtures)
3. Write tests using AAA pattern with descriptive names
4. Run with stack-appropriate command
5. Report results: pass/fail count, coverage delta

### Mode: gap

1. Run coverage tool with branch coverage enabled
2. Identify untested critical paths (business logic > glue code)
3. Check for missing edge cases: null, empty, boundary, error paths
4. Produce gap report with prioritized recommendations

### Mode: plan

1. Map the testing surface (modules, public APIs, critical paths)
2. Assign test categories: unit, integration, e2e
3. Define coverage targets per module
4. Identify infrastructure needs (test containers, fixtures, fakes)

## Stack Commands

| Stack | Runner | Coverage | Async |
|-------|--------|----------|-------|
| Python | `uv run pytest` | `pytest-cov` (branch=true) | `asyncio_mode = "auto"` |
| TypeScript | `vitest` or `jest` | `c8` / `istanbul` | `async/await` |
| .NET | `dotnet test` + xUnit | `coverlet` | `async Task` |
| Rust | `cargo test` | `cargo tarpaulin` | `#[tokio::test]` |
| Go | `go test ./...` | `go test -cover` | goroutine tests |

## Testing Rules

**Fakes over mocks**. Mocks test implementation details. Fakes implement the same interface.

Mocks are acceptable ONLY for:
1. Verifying something was NOT called
2. Simulating transient errors for retry logic
3. Third-party libraries (but wrap in your own adapter first)

**AAA pattern** (non-negotiable):

```python
# Arrange -- set up inputs and dependencies
# Act -- call the function under test
# Assert -- verify the outcome
```

**Name pattern**: `test_<unit>_<scenario>_<expected_outcome>`
- Good: `test_parse_email_rejects_missing_at_symbol`
- Bad: `test_parse_email`, `test_1`, `test_it_works`

## Anti-Patterns (Reject These)

| Anti-Pattern | Why It Fails |
|-------------|-------------|
| Testing the mock | Proves the mock works, not the code |
| No-op test (assert True) | Tests nothing, inflates coverage |
| Testing implementation | Breaks on refactor, proves nothing about behavior |
| Huge test setup | Design is too coupled -- simplify the interface |
| sleep() for sync | Flaky -- use events, barriers, wait_for |
| Exact float comparison | Flaky -- use approx/closeTo |

## Iron Law

If tests are wrong, escalate to the user. NEVER weaken, skip, or modify tests to make implementation easier -- tests are the contract; bending them hides bugs. "Tests are wrong" means the requirement changed -- not that passing them is hard.

## Common Mistakes

- Writing tests after implementation (tests-after prove what IS, not what SHOULD be)
- Testing private methods (test the public API)
- 100% coverage with meaningless assertions
- Skipping edge cases (null, empty, boundary, concurrent access)
- Not running ALL tests after changes

## Handlers

| Handler | File | Activation |
|---------|------|-----------|
| E2E Testing | `handlers/e2e.md` | Activated when `*.spec.ts`, `playwright.config.ts`, or `e2e/` directory detected |
| TDD Mode | `handlers/tdd.md` | Activated when `mode=tdd` |

## Examples

### Example 1 — TDD a new feature

User: "I'm building a JWT validator. Walk me through TDD."

```
/ai-test tdd jwt-validator
```

RED: writes failing tests for valid token, expired token, malformed signature. Confirms FAIL for the expected reason. GREEN: hands off to `ai-build` for minimal implementation. REFACTOR: stays green.

### Example 2 — coverage gap analysis

User: "where am I light on tests?"

```
/ai-test gap
```

Runs the stack-specific coverage tool, ranks files by coverage delta, suggests the highest-leverage test to add next.

## Integration

Called by: `/ai-build` (build tasks), `/ai-build` (TDD mode), user directly. Calls: stack-specific test runners. See also: `/ai-debug`, `/ai-verify`, `/ai-reliability-eval`.

$ARGUMENTS
More from arcasilesgroup/ai-engineering