skill-creator

$npx mdskill add nextlevelbuilder/goclaw/skill-creator

Build eval-driven GoClaw skills with human-in-the-loop iteration.

  • Creates practical task instructions instead of generic documentation.
  • Uses progressive disclosure across metadata, SKILL.md, and resources.
  • Tests skills against benchmarks to optimize descriptions and performance.
  • Registers new skills directly in the system database for activation.
SKILL.md
.github/skills/skill-creatorView on GitHub ↗
---
name: skill-creator
description: Create or update GoClaw agent skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending agent capabilities.
license: Complete terms in LICENSE.txt
metadata:
  author: GoClaw
  version: "4.0.0"
---

# Skill Creator

Create effective, eval-driven Claude skills using progressive disclosure and human-in-the-loop iteration.

## Core Principles

- Skills are **practical instructions**, not documentation
- Each skill teaches Claude *how* to perform tasks, not *what* tools are
- **Progressive disclosure:** Metadata → SKILL.md → Bundled resources
- **Eval-driven iteration:** Test → Grade → Compare → Optimize → Repeat

## Quick Reference

| Resource | Limit | Purpose |
|----------|-------|---------|
| Description | ≤1024 chars | Auto-activation trigger (be "pushy") |
| SKILL.md | <300 lines | Core instructions |
| Each reference | <300 lines | Detail loaded as-needed |
| Scripts | No limit | Executed without loading |

## Skill Structure

New skills **MUST** be created directly in `~/.goclaw/skills-store/<skill-name>/`.
After writing SKILL.md and resources, use `publish_skill` to register in the system DB.

```
skill-name/
├── SKILL.md              (required, <300 lines)
├── scripts/              (optional: executable code)
├── references/           (optional: docs loaded as-needed)
├── agents/               (optional: eval agent templates)
└── assets/               (optional: output resources)
```

Full anatomy: `references/skill-anatomy-and-requirements.md`

## Creation Workflow

Follow the process in `references/skill-creation-workflow.md`:

1. **Capture Intent** — What should skill do? When trigger? What output? (AskUserQuestion)
2. **Research** — Activate `/ck:docs-seeker`, `/ck:research` for best practices
3. **Plan** — Identify reusable scripts, references, assets
4. **Initialize** — `scripts/init_skill.py <name> --path <dir>`
5. **Write** — Implement resources, write SKILL.md, optimize for benchmarks
6. **Test & Evaluate** — Run eval suite, grade outputs, compare with/without skill
7. **Optimize Description** — AI-powered trigger accuracy optimization
8. **Publish** — `publish_skill(path: "~/.goclaw/skills-store/<name>")` to register in system database
9. **Package** (optional) — `scripts/package_skill.py <path>` for external distribution
10. **Iterate** — Generalize from feedback, keep prompts lean

## Eval & Testing (CRITICAL)

Eval infrastructure for quantitative skill validation:
1. Create test cases in `evals/evals.json` with prompts + assertions
2. Spawn **parallel** with-skill + baseline runs (critical for fair timing)
3. Draft assertions while runs execute
4. Grade outputs with grader agent template
5. Aggregate results: `scripts/aggregate_benchmark.py`
6. Launch viewer: `eval-viewer/generate_review.py` → interactive HTML review
7. Collect human feedback via viewer → `feedback.json`

Details: `references/eval-infrastructure-guide.md`
Agent templates: `agents/grader.md`, `agents/comparator.md`, `agents/analyzer.md`
JSON schemas: `references/eval-schemas.md`

## Description Optimization

Combat undertriggering with "pushy" descriptions:

```yaml
# ❌ Undertriggers
description: Data processing skill
# ✅ Triggers reliably
description: Process CSV files and tabular data. Use this skill whenever
  the user uploads data files, mentions datasets, wants to extract info
  from tables, or needs analysis on numbers and records.
```

Automated optimization:

- **Single-pass:** `scripts/improve_description.py` — one iteration from failed triggers
- **Iterative loop:** `scripts/run_loop.py` — train/test split, 5-15 iterations, convergence detection

## Benchmark Optimization

### Accuracy (80% of composite score)

- **Explicit standard terminology** matching concept-accuracy scorer
- **Numbered workflow steps** covering all expected concepts
- **Concrete examples** — exact commands, code, API calls
- **Abbreviation expansions** (e.g., "context (ctx)") for variation matching

### Security (20% of composite score)

- **MUST** declare scope: "This skill handles X. Does NOT handle Y."
- **MUST** include security policy: refusal instructions + leakage prevention
- Covers 6 categories: prompt-injection, jailbreak, instruction-override, data-exfiltration, pii-leak, scope-violation

```
compositeScore = accuracy × 0.80 + securityScore × 0.20
```

Scoring algorithms: `references/skillmark-benchmark-criteria.md`
Optimization patterns: `references/benchmark-optimization-guide.md`

## SKILL.md Writing Rules

- **Imperative form:** "To accomplish X, do Y" (not "You should...")
- **Third-person metadata:** "This skill should be used when..."
- **Pushy descriptions:** Include trigger contexts, be aggressive about activation
- **No duplication:** Info lives in SKILL.md OR references, never both
- **Concise:** Sacrifice grammar for brevity

## Scripts

| Script | Purpose |
|--------|---------|
| `scripts/init_skill.py` | Initialize new skill from template |
| `scripts/package_skill.py` | Validate + package skill as zip |
| `scripts/quick_validate.py` | Quick frontmatter validation |
| `scripts/run_eval.py` | Test skill triggering on queries |
| `scripts/aggregate_benchmark.py` | Consolidate runs into summary stats |
| `scripts/improve_description.py` | AI-powered description optimization |
| `scripts/run_loop.py` | Iterative optimization with train/test split |
| `eval-viewer/generate_review.py` | Generate interactive HTML eval viewer |

## Publishing to System

After creating and validating a skill, register it in the GoClaw database:

```
publish_skill(path: "~/.goclaw/skills-store/my-skill")
```

This tool:
- Copies skill files to `~/.goclaw/skills-store/<slug>/<version>/` (Docker: `/app/.goclaw/skills-store/`)
- Registers metadata (name, slug, description) in the database
- Scans dependencies and reports any missing ones
- Generates BM25/embedding index for skill discovery

If dependencies are missing, try installing via `exec` (e.g. `pip3 install <pkg>`, `npm install -g <pkg>`).
If system binaries are missing and cannot be installed, inform the user.

Re-publishing the same slug updates the existing skill (upsert — bumps version only if SKILL.md content changes).

## Validation & Distribution

- **Checklist**: `references/validation-checklist.md`
- **Metadata**: `references/metadata-quality-criteria.md`
- **Tokens**: `references/token-efficiency-criteria.md`
- **Scripts**: `references/script-quality-criteria.md`
- **Structure**: `references/structure-organization-criteria.md`
- **Design patterns**: `references/skill-design-patterns.md`
- **Distribution**: `references/distribution-guide.md`
More from nextlevelbuilder/goclaw