curate

Name: curate
Author: boshu2/agentops
$npx mdskill add boshu2/agentops/curate
Mine agent data for insights without modifying code
Extracts skill diffs, bd updates, and rare wiki entries from agent data
Analyzes transcripts, .agents, bd, and git repositories
Applies lean startup principles and wiki knowledge surfacing
Produces research markdown, bd notes, and synthesis outputs
SKILL.md
.github/skills/curateView on GitHub ↗
---
name: curate
description: Mine transcripts, .agents, bd, and git for skill diffs, bd updates, or
  rare wiki entries.
practices:
- wiki-knowledge-surface
- lean-startup
hexagonal_role: supporting
consumes: []
produces:
- .agents/research/*.md
context_rel: []
skill_api_version: 1
user-invocable: true
context:
  window: fork
  intent:
    mode: task
  sections:
    exclude:
    - TASK
  intel_scope: full
metadata:
  tier: knowledge
  dependencies: []
output_contract: .agents/research/*.md (synthesis), bd notes, skill diffs (rare);
  never code mutations
---

# /curate — Canonical Miner Skill

> **Role:** miner. Input = trinity slice (transcripts, `.agents/`, `bd`, `git`). Output = skill diffs (proposed), bd updates, rare wiki entries. **Never mutates code.**

> **Status (2026-05-08):** introduced ADDITIVE in Phase 1 (m6v5.D.1 / soc-78s2v). Existing miners (dream, harvest, forge, compile, retro, post-mortem-mining, flywheel, trace, provenance, defrag) stay until Phase 3 shim conversion (m6v5.D.3). Fix-C smoke (`soc-wb2aa`) gates Phase 3.

## Modes (≤8 per Fix-F mode-flag budget)

| Mode | Purpose | Replaces (post-Phase 3) |
|---|---|---|
| `--mode=dream` | Overnight bounded INGEST→REDUCE→MEASURE on `.agents/` | `/dream` |
| `--mode=harvest` | Cross-rig promotion + post-mortem mining + flywheel rollup | `/harvest`, `/post-mortem` (mining half), `/flywheel` |
| `--mode=forge` | Per-session transcript mining (SessionEnd cadence) | `/forge` |
| `--mode=compile` | Mine→Grow→Defrag→Lint corpus pipeline | `/compile` |
| `--mode=retro` | Single-session learning capture | `/retro` |
| `--mode=defrag` | Knowledge defragmentation (overnight) | `compile-session-defrag.sh` hook |
| `--mode=watch` | In-session drift / loop detection (15-min cadence) | `research-loop-detector.sh` hook |
| `--mode=provenance` | Decision-trace + artifact-provenance walk | `/provenance`, `/trace` |

**Mode-budget assertion:** 8 modes. Adding a 9th requires demoting an existing one OR refusing the addition (per Fix-F § continuous CI gate).

**Anti-goals (hard constraints from architecture):**

- NEVER mutates source code.
- NEVER invokes `/rpi` or any code-mutating flow.
- NEVER performs git operations (no commits, branches, push, rebase, checkout).
- NEVER creates symlinks anywhere.

## Quick Start

```bash
/curate --mode=harvest                  # cross-rig promotion sweep
/curate --mode=forge                    # mine the most recent session's transcript
/curate --mode=dream --duration=8h      # overnight bounded run
/curate --mode=compile                  # rebuild .agents/ corpus
/curate --mode=retro                    # capture this session's learning
/curate --mode=defrag                   # knowledge defragmentation
/curate --mode=watch                    # in-session drift check
/curate --mode=provenance --bead=soc-X  # walk decision trace for a bead
```

## Execution

### Step 1: Resolve mode + scope

Parse `--mode`. Each mode has its own scope semantics:

| Mode | Reads | Writes | Cadence (typical loop binding) |
|---|---|---|---|
| dream | `.agents/` corpus | `.agents/overnight/<run-id>/` summary + per-iteration JSON | overnight (1×/24h) |
| harvest | `.agents/` across rigs (`~/.agents/learnings/`) | `~/.agents/learnings/` (promotion), `.agents/harvest/latest.json` | daily (1×/24h) |
| forge | session transcripts (`~/.claude/projects/<session>/*.jsonl`) | `.agents/learnings/`, `.agents/patterns/` | per-session (SessionEnd) or 30m loop |
| compile | `.agents/` corpus | `wiki/INDEX.generated.md`, `.agents/compile/<date>.md` | weekly |
| retro | this session's transcript + recent diffs | `.agents/retro/index.jsonl` (append) | per-session (manual) |
| defrag | `.agents/` corpus | `.agents/defrag/<date>.md` (cleanup report) | overnight (1×/24h) |
| watch | last 100 lines of current session transcript | `.agents/watch/<date>.md` (advisory) | in-session 15m loop |
| provenance | `bd` graph + `git log` + `.agents/` for given anchor | `.agents/provenance/<anchor>.md` | on-demand |

### Step 2: Acquire lock (when applicable)

For `--mode=dream`: acquire `.agents/overnight/run.lock` exclusively. If held, exit with "another curator holds the lock" message; do NOT block.

For `--mode=harvest`: check the dream lock. If held, defer harvest to next loop fire.

For `--mode=forge`: per-session lock at `.agents/forge/session-<id>.lock`. Auto-released at end of run.

Other modes: no lock; concurrent invocation is safe.

### Step 3: Run the mode-specific body

Each mode delegates to a body section in this skill (see § per-mode bodies below for outline; full content moves to `references/<mode>.md` at Phase 2 startup).

### Step 4: Produce artifacts

Output is one of (priority order, per architecture knowledge-flywheel rule):

1. **Skill diffs** — proposed changes to existing skill bodies, written to `.agents/skill-diffs/<date>-<skill>.diff`. Operator approves before applying. NEVER writes to `skills/` directly.
2. **bd updates** — `bd note`, `bd remember`, or new `bd create` for surfaced issues. Direct, no approval queue.
3. **Wiki entries** — `.agents/research/`, `.agents/learnings/`, `~/.agents/learnings/` (rare; only when knowledge is generally reusable).

### Step 5: Append to LOG.md

Every mode appends one line to `.agents/LOG.md`:

```
2026-MM-DDTHH:MM:SSZ [curate --mode=<mode>] <run-id> — <short-summary> — outputs: <artifact-paths>
```

### Step 6: Report

1. Mode + scope
2. Output path(s)
3. Surfaced bd issues (if any)
4. Loop continuation hint (next-fire cadence per architecture catalog)

## Per-mode bodies (outline)

### --mode=dream

Overnight INGEST → REDUCE → MEASURE iterations until halt:
- INGEST: scan `.agents/` for new artifacts since last run
- REDUCE: dedupe + score + cluster
- MEASURE: compute knowledge-corpus health metrics
- Halt when: wall-clock budget exhausted, plateau (K sub-epsilon deltas), regression beyond per-metric floor, metadata integrity failure
- Knowledge-only; never code mutation
- Output: `.agents/overnight/<run-id>/summary.{json,md}` + per-iteration archive

Detailed body remains inline until Phase 2 extraction.

### --mode=harvest

Cross-rig promotion sweep:
- Walk `.agents/` across all rigs (paths from `~/.agents/rigs.yaml` or fleet config)
- Extract learnings/patterns/research artifacts
- Dedupe by content hash
- Promote high-confidence items to `~/.agents/learnings/` global hub
- Roll up flywheel-health metrics as byproduct
- Output: `.agents/harvest/latest.json` + promoted files in global hub

Detailed body remains inline until Phase 2 extraction.

### --mode=forge

Per-session transcript mining:
- Locate latest transcript (`~/.claude/projects/<project>/<session>/*.jsonl`)
- Extract knowledge candidates (decisions, patterns, anti-patterns, bug fixes)
- Validate candidates against finding-registry contract
- Queue to `.agents/knowledge/pending/` for curator review
- Output: pending markdown files; bd notes for high-confidence findings

Detailed body remains inline until Phase 2 extraction.

### --mode=compile

Corpus pipeline:
- Mine: extract candidate knowledge from `.agents/`
- Grow: merge into existing taxonomy
- Defrag: collapse redundant entries
- Lint: check for orphans, contradictions, staleness
- Output: `wiki/INDEX.generated.md` (rebuilt), `.agents/compile/<date>.md` (lint report)

Detailed body remains inline until Phase 2 extraction.

### --mode=retro

Single-session learning capture:
- Read last N turns + diff summary
- Identify one durable insight (or none — exit clean)
- Append to `.agents/retro/index.jsonl`
- Optional: surface to bd as note

Detailed body remains inline until Phase 2 extraction.

### --mode=defrag

Knowledge defragmentation:
- Find duplicate entries across `.agents/` (content-hash + semantic-similarity)
- Find broken backlinks
- Find stale entries (last_cited > TTL, low hit_count)
- Output: `.agents/defrag/<date>.md` with proposed retirements (NEVER auto-deletes; operator approves)

Detailed body remains inline until Phase 2 extraction.

### --mode=watch

In-session drift detection:
- Read last 100 transcript turns
- Detect: research loops without code change, repeated grep-without-read, oscillating decisions
- Write advisory to `.agents/watch/<date>.md`; surface high-severity to bd note
- Cheap; designed for 15-min cadence

Detailed body remains inline until Phase 2 extraction.

### --mode=provenance

Decision-trace walk:
- Given anchor (bead ID, file path, or decision marker)
- Walk: bd graph (parents, blockers, refs) + git log (commits touching anchor) + `.agents/` mentions
- Output: `.agents/provenance/<anchor>.md` with chronological trace

Detailed body remains inline until Phase 2 extraction.

## Constraints (one-role-per-skill)

- **One role: miner.** Output never mutates code; always lands as proposed diffs, bd updates, or wiki entries.
- **No new modes** without dropping/merging an existing one (Fix-F mode-budget cap = 8).
- **Lock contract** — dream mode is exclusive; harvest defers when dream is
  running; other modes are safe-concurrent.

## See Also

- `skills/rpi/SKILL.md` — orchestrator
- `skills/validate/SKILL.md` — validator role (paired canonical skill)
- `.agents/research/2026-05-08-fix-F-mode-flag-budget.md` — mode-cull rationale

## Reference Documents

- [references/curate.feature](references/curate.feature) — Executable spec: resolve mode + scope, acquire lock when writing shared state, mine into synthesis + bd notes (soc-qk4b)