skill-comply
$ npx mdskill add affaan-m/ECC/skill-comply
Verify agent compliance against rules and skills automatically.
- Measures adherence to coding standards and agent definitions.
- Integrates with Claude CLI and stream-json for trace capture.
- Classifies tool calls using LLM logic rather than regex.
- Delivers self-contained reports with behavioral timelines.
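As a rough illustration of the trace-capture step, the sketch below pulls tool calls out of newline-delimited JSON events. It is not the skill's actual parser. The event shape (an `assistant` event whose `message.content` list holds `tool_use` blocks with `name` and `input`) is an assumption about the stream-json output, and `iter_tool_calls` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_tool_calls(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"name", "input"} for each tool call found in a JSON-lines trace.

    Assumed event shape (not confirmed by the skill's docs):
    {"type": "assistant", "message": {"content": [{"type": "tool_use", ...}]}}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a complete JSON object.
            continue
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        content = (event.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                yield {"name": block.get("name"), "input": block.get("input", {})}
```

The resulting list of calls, in order, is the kind of timeline that a later classification step could label against spec steps.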
SKILL.md
.github/skills/skill-comply
---
name: skill-comply
description: Visualize whether skills, rules, and agent definitions are actually followed. Auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines
origin: ECC
tools: Read, Bash
---

# skill-comply: Automated Compliance Measurement

Measures whether coding agents actually follow skills, rules, or agent definitions by:

1. Auto-generating expected behavioral sequences (specs) from any .md file
2. Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
3. Running `claude -p` and capturing tool call traces via stream-json
4. Classifying tool calls against spec steps using LLM (not regex)
5. Checking temporal ordering deterministically
6. Generating self-contained reports with spec, prompts, and timelines

## Supported Targets

- **Skills** (`skills/*/SKILL.md`): Workflow skills like search-first, TDD guides
- **Rules** (`rules/common/*.md`): Mandatory rules like testing.md, security.md, git-workflow.md
- **Agent definitions** (`agents/*.md`): Whether an agent gets invoked when expected (internal workflow verification not yet supported)

## When to Activate

- User runs `/skill-comply <path>`
- User asks "is this rule actually being followed?"
- After adding new rules/skills, to verify agent compliance
- Periodically as part of quality maintenance

## Usage

```bash
# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
```

## Key Concept: Prompt Independence

Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it.

## Report Contents

Reports are self-contained and include:

1. Expected behavioral sequence (auto-generated spec)
2. Scenario prompts (what was asked at each strictness level)
3. Compliance scores per scenario
4. Tool call timelines with LLM classification labels

### Advanced (optional)

For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. This is informational; the main value is the compliance visibility itself.
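The deterministic ordering check and the per-scenario compliance score can be pictured with a minimal sketch. It assumes each tool call in the timeline has already been labeled with a spec step ID, or `None` when it matches no step. The function name `check_ordering`, the step IDs, and the scoring rule (fraction of spec steps observed in order) are all hypothetical and may differ from what skill-comply actually computes.

```python
from typing import Optional, Sequence


def check_ordering(spec_steps: Sequence[str],
                   labels: Sequence[Optional[str]]) -> dict:
    """Greedily match spec steps, in order, against a labeled timeline.

    spec_steps: expected step IDs in the order the spec requires.
    labels:     one label per tool call (a step ID, or None if unclassified).
    """
    satisfied = []
    cursor = 0  # first timeline position still available for matching
    for step in spec_steps:
        # Look for this step at or after the cursor; earlier hits are out of order.
        for index in range(cursor, len(labels)):
            if labels[index] == step:
                satisfied.append(step)
                cursor = index + 1
                break
    # Hypothetical scoring rule: share of spec steps found in the expected order.
    rate = len(satisfied) / len(spec_steps) if spec_steps else 1.0
    return {"satisfied": satisfied, "compliance_rate": rate}


if __name__ == "__main__":
    spec = ["write_test", "run_test_fail", "implement", "run_test_pass"]
    # The agent implemented first, then wrote and ran the test.
    timeline = ["implement", None, "write_test", "run_test_fail", "run_test_pass"]
    print(check_ordering(spec, timeline))
```

In this example the `implement` call happens before the test is written, so it does not count toward the ordered sequence and three of the four steps are satisfied. Because the check is a plain scan over labels, the same labeled timeline always yields the same score, with the LLM involved only in producing the labels.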
More from affaan-m/ECC
- accessibility: Design, implement, and audit inclusive digital products using WCAG 2.2 Level AA
- agent-architecture-audit: Full-stack diagnostic for agent and LLM applications. Audits the 12-layer agent stack for wrapper regression, memory pollution, tool discipline failures, hidden repair loops, and rendering corruption. Produces severity-ranked findings with code-first fixes. Essential for developers building agent applications, autonomous loops, or any LLM-powered feature.
- agent-eval: Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
- agent-harness-construction: Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.
- agent-introspection-debugging: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
- agent-payment-x402: Add x402 payment execution to AI agents with per-task budgets, spending controls, and non-custodial wallets. Supports Base through agentwallet-sdk and X Layer through OKX Payments / OKX Agent Payments Protocol.
- agent-sort: Build an evidence-backed ECC install plan for a specific repo by sorting skills, commands, rules, hooks, and extras into DAILY vs LIBRARY buckets using parallel repo-aware review passes. Use when ECC should be trimmed to what a project actually needs instead of loading the full bundle.
- agentic-engineering: Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
- agentic-os: Build persistent multi-agent operating systems on Claude Code. Covers kernel architecture, specialist agents, slash commands, file-based memory, scheduled automation, and state management without external databases.
- ai-first-engineering: Engineering operating model for teams where AI agents generate a large share of implementation output.