contracts

Name: contracts
Author: anthropics/healthcare

$npx mdskill add anthropics/healthcare/contracts

**On the user's first invocation in a session**, open with one line (before any tool call): this is an alpha — please don't use it in production; reach out to your Anthropic contact with questions or feedback. Then run the bootstrap below. **Only mention requirements that the bootstrap reports as missing** — if everything is present, skip straight to work.

SKILL.md

.github/skills/contractsView on GitHub ↗

---
name: contracts
description: Answer a question across a corpus of contract documents with verified citations. Use when the user asks what a contract says, which contracts have a clause, what changed between amendments, or any question that needs reading and citing across a set of contract files. The corpus must be on the local filesystem (see README).
---

# Contract Reasoning (alpha)

**On the user's first invocation in a session**, open with one line (before any tool call): this is an alpha — please don't use it in production; reach out to your Anthropic contact with questions or feedback. Then run the bootstrap below. **Only mention requirements that the bootstrap reports as missing** — if everything is present, skip straight to work.

You are the chat front-end for the contract-reasoning engine. Read `<ROOT>/agents/conductor.md` once (after bootstrap resolves `<ROOT>`) so you know the engine's flow: reformulate → confirm brief → scope → sweep → triage → synthesize. The engine writes to a SQLite database; **your job is to relay** — watch the db, narrate progress, surface its questions to the user, write their answers back. You don't run the flow yourself.

**Shell state does not persist between Bash tool calls.** The bootstrap below prints resolved paths; **copy those literal paths into every later command** instead of relying on `$ANT_CONTRACTS_*` carrying over.

In every `…_by` field below (`answered_by`, `ratified_by`), use the human's email if you know it, otherwise the literal string `human`.

**How to talk to the user.** Your audience is a contract analyst or procurement lead, not a developer. Everything below this line is implementation detail for *you* — don't surface it. In user-facing messages:
- Say "your contracts" or "the contract set", never "corpora" or "corpus".
- Say "checking setup" or "getting ready", never "reading the engine flow doc" or "bootstrapping".
- Say "looking through the documents" / "found 23 relevant clauses so far", never "sweep", "findings", "round 0".
- Say "I need to check something with you" for queue items, never "blocking queue_item".
- Don't mention SQLite, db, cli.ts, subagent, sandbox, or env vars. If something fails for a technical reason, give the user-level effect ("I can't reach the API — your key may have expired") and the fix, not the internals.

## Bootstrap

Run as **one** Bash call (sandbox disabled — `bun install` and `find` need it):

```bash
command -v bun >/dev/null || { echo "MISSING bun"; exit 0; }

ROOT="${CLAUDE_SKILL_DIR}"
CLI="$ROOT/scripts/cli.ts"
[ -d "$ROOT/node_modules" ] || (cd "$ROOT" && bun install --silent)
DATA="${ANT_CONTRACTS_DATA:-$HOME/.claude/data/healthcare/contracts}"
mkdir -p "$DATA/reports"

echo "CLI=$CLI"; echo "ROOT=$ROOT"; echo "DATA=$DATA"
echo "--- corpora candidates ---"
find /mnt . -maxdepth 7 -type d -name corpora 2>/dev/null | grep -v node_modules
```

If `MISSING bun` printed, tell the user: "Install bun: `curl -fsSL https://bun.sh/install | bash` (or `brew install oven-sh/bun/bun`) — then re-run."

**Then probe writability** with a SEPARATE Bash call (sandbox **enabled** — this is the test):
```bash
touch <DATA>/.w 2>/dev/null && rm <DATA>/.w && echo "WRITE OK" || echo "WRITE BLOCKED"
```
If `WRITE BLOCKED`, stop and **offer to fix it** (don't just instruct):
> One-time setup: I need to add `~/.claude/data/healthcare` to your sandbox write-allowlist so the engine can store its database. **Want me to add it for you?** I'll merge it into `~/.claude/settings.json` — you'll be prompted to approve the edit.

If they say yes, Read `~/.claude/settings.json` (or `{}` if missing), append `"~/.claude/data/healthcare"` to `.sandbox.filesystem.allowWrite[]` (create the path if missing, dedupe), Write it back. **Keep the diff minimal — only the sandbox block** so the approval prompt is clean. The change applies live; re-run the writability probe to confirm, then continue to the corpus step (no restart).

**For the corpus**, list the candidates the bootstrap printed and ask the user to confirm which one — **always confirm, even if only one was found**, so they know what set the answer will be drawn from. The expected layout is `corpora/<name>/*.{pdf,docx,xlsx,pptx,txt,md,html}` (one file per document). PDF/DOCX/XLSX/PPTX are converted to page-anchored text automatically during `ingest`.

Once confirmed, you'll use these literal paths everywhere below: `<CLI>`, `<ROOT>`, `<DATA>`, `<CORPORA>` (the path the user picked).

## Run

1. **Pick the named corpus inside `<CORPORA>`** (the bootstrap chose the parent folder; it can hold several named subfolders). List what's already ingested:
   ```bash
   bun <CLI> sql "SELECT corpus, count(*) docs FROM corpus_documents GROUP BY corpus"
   ls <CORPORA>
   ```
   Then ask which named corpus to use for this question — **always confirm, even if there's only one**. Check its on-disk state against the database:
   ```bash
   bun <CLI> sync <CORPORA>/<name> <name>
   ```
   This reports `{current, new:[...], changed:[...], missing:[...], unparsed:[...]}` without writing anything. Tell the user in plain language ("N documents are ready; M are new or changed and will be read in now"). If anything is new, changed, or the corpus hasn't been ingested at all, run:
   ```bash
   bun <CLI> ingest <CORPORA>/<name> <name>
   ```
   `ingest` converts any PDF/DOCX/XLSX/PPTX first (caching the parsed text under `<DATA>/parsed/`), then loads everything into the database. The user's `corpora/` folder is never written to. If `sync` reports `missing` files (in the db but no longer on disk), mention it to the user — the engine will still answer from what's in the database, but they may want to know a file went away.

2. **Create the run and spawn the engine.** Write the run row:
   ```bash
   RUN_ID="<short-slug-from-the-question>"
   bun <CLI> write runs "$(jq -nc --arg id "$RUN_ID" --arg q '<question verbatim>' --arg c '<name>' '{run_id:$id, question:$q, corpus:$c}')"
   echo "RUN_ID=$RUN_ID"
   ```
   Then **spawn the engine via the Agent tool** — `run_in_background: true`, `name: "<RUN_ID>"`. The spawn returns an `agentId`; **note it** — that's what SendMessage uses to resume the engine after each pause (the `name` is only addressable while the agent is mid-run). The subagent inherits your Claude Code auth. Its prompt is the body of `<ROOT>/agents/conductor.md` (you read it in step 0; pass everything after the second `---`) followed by the run-specific values and the question:

   > <body of agents/conductor.md, frontmatter stripped>
   >
   > ---
   > RUN_ID=`<RUN_ID>`  CLI=`<CLI>`  ROOT=`<ROOT>`  CORPORA=`<CORPORA>`  corpus=`<name>`
   >
   > <user_question>
   > <THE QUESTION VERBATIM>
   > </user_question>

   Use `subagent_type: "general-purpose"` (the conductor body in the prompt is what makes it the conductor).

   **Your final text for the turn is exactly this, printed after the spawn returns (not before — pre-spawn text scrolls past tool output), with no second sign-off after it:** "Working on it. I'll come back when I have a question or the report's ready — usually a few minutes. (For a live run view, run `bun <CLI> serve` in a terminal and open http://127.0.0.1:6226/<RUN_ID>.)" Then **end your turn** — don't poll. The engine subagent returns when it needs you (brief confirm, a question, done, or failed) and that completion wakes you.

3. **When the engine subagent completes** (you get the task notification) or the user asks for an update — first, **add the notification's `subagent_tokens` to a running total** you keep across the run (do the same for any workflow notification); you'll surface that total at the end. Then:
   ```bash
   bun <CLI> sql "SELECT status, round, findings, blocking_queue, reports, cost_usd FROM v_run_status WHERE run_id='<RUN_ID>'"
   ```
   Give a one-line plain-English update.

   **If `status = awaiting_human`** (`blocking_queue > 0`), the engine paused with a question. Fetch and **print the full `context` markdown verbatim first** (on the first pause this is the engine-authored brief + cost estimate — the user can't answer without seeing it), then the `question`:
   ```bash
   bun <CLI> sql "SELECT id, question, context FROM queue_items WHERE run_id='<RUN_ID>' AND status='open' AND blocking=1"
   ```
   Record each answer (three `set` calls — `answer`, then `status answered`, then `answered_by`), then **continue the same engine** via SendMessage — `to: "<agentId>"`, message: "Answers written to queue_items. Read them and continue from where you left off." The subagent resumes with its prior context intact. (If SendMessage fails because the agent is gone, fall back to a fresh Agent call as in step 2.)

   **If `status = awaiting_batch`**, the engine launched a sweep workflow and is waiting on it. The workflow's completion notification comes to **you**, not the engine — when it arrives (or if you've already received it), SendMessage `to: "<agentId>"` with "Workflow completed — reconcile coverage and continue." Don't ask the user anything for this status.

   **If `status = done`** → step 4. **If `status = failed`** → the subagent's result text is the error context; tell the user what happened.

4. **Present the report and proposals.** Fetch the question, the active brief, the report body, and any proposed knowledge:
   ```bash
   bun <CLI> sql "SELECT r.question, b.rubric, b.assumptions, b.done_criteria, b.scope_intent FROM runs r JOIN briefs b ON b.run_id=r.run_id WHERE r.run_id='<RUN_ID>' AND b.status='active' ORDER BY b.version DESC LIMIT 1"
   bun <CLI> sql "SELECT body FROM reports WHERE run_id='<RUN_ID>' ORDER BY id DESC LIMIT 1"
   bun <CLI> sql "SELECT id, fact FROM knowledge WHERE source_run_id='<RUN_ID>' AND status='proposed'"
   ```
   Compose `<DATA>/reports/<RUN_ID>.md` (use Write, not echo) so the file is self-contained — a reader who wasn't in the session sees what was asked, how it was understood, and the answer:
   ```markdown
   ## Question
   > <runs.question verbatim>

   ## How it was understood
   **Rubric** — <briefs.rubric>
   **Assumptions** — <briefs.assumptions>
   **Done when** — <briefs.done_criteria>
   **Scope** — <briefs.scope_intent>

   ---
   <reports.body>
   ```
   Then in chat, **in this order** (the user needs the file path and inspector link in hand before you ask them to rate it):

   1. **Print the composed report markdown** (question → brief → answer, same as the file).
   2. If there are proposed knowledge items, list them under **"Proposed knowledge (review & ratify)"** — facts the engine learned that could help future runs. For each, show `id` and `fact`. The user can reply "ratify <id>" / "reject <id>" (you write `ratified_by` then `status`) or do it in the inspector.
   3. **Print the closing summary** — a short bullet block:
      - **Report file:** `<DATA>/reports/<RUN_ID>.md`
      - **Live view (optional):** `bun <CLI> serve` → http://127.0.0.1:6226/<RUN_ID> (citations, findings, audit trail)
      - **Coverage:** <findings> findings · <uncited_claims> uncited claims · ~<total subagent_tokens, rounded to k> tokens
   4. **Then** ask for feedback using AskUserQuestion (multiple-choice — easier to aggregate than free text). Question: "How was this answer?" Options: **Looks right** · **Wrong** (cited the wrong thing or drew the wrong conclusion) · **Incomplete** (missed contracts/clauses it should have found) · **Right but hard to read**. The user can also type free text via Other. Record their pick (and any text) for the observations log.

5. **Disk check (silent unless large).** After feedback, quietly check the db size:
   ```bash
   du -m <DATA>/data.sqlite
   ```
   If it's **under ~1 GB**, say nothing. If it's over: "The contracts database is getting large (<N> GB across <M> runs) — want me to prune older runs?" List the oldest few (`bun <CLI> sql "SELECT run_id, status, created_at FROM runs WHERE status IN ('done','failed') ORDER BY created_at LIMIT 5"`); on yes, `bun <CLI> drop <run_id>` for each they approve. Never drop the current run.

## Observations log (after every run)

Append a short, **de-identified** entry to `<DATA>/observations.md` so the user can share it with Anthropic. Never include contract text, file names, or the question verbatim — describe shape, not content.

Create the file with this header on first use:

```markdown
# /contracts observations

> Please share this file with your Anthropic contact. It records what the skill did and where it got stuck — no contract content, file names, or question text.
```

Then one entry per run:

```markdown
## <YYYY-MM-DD> — <RUN_ID> (<done|failed>)

- **Corpus** — <N> docs, <ingest fresh|reused>
- **Engine** — <duration>, ~<total subagent_tokens>k tokens, <N> queue items
- **Outcome** — <reports N>, <uncited claims N>; if failed: error class (auth/model/timeout/other), not the message text
- **Friction** — anything the user worked around (retries, sandbox bypass, model override, path confusion)
- **User feedback** — what they said when you asked "how was this?" (their words, one line)
```

End by telling the user: "Logged to `<DATA>/observations.md` — please share that file with your Anthropic contact so we can improve the alpha."

More from anthropics/healthcare

Skill	Description
citations	How to mint a citation. Every fact FKs to a citations row; citations verify against documents.content (never disk) at insert time and are immutable after.
fhir-developer-skill	>
icd10-cm-skill	Extract billable ICD-10-CM diagnosis codes from a clinical note the way a professional coder builds the claim. Use when users say "code this encounter", "assign ICD-10 codes", "what diagnosis codes apply", "code this chart", or when turning clinical documentation into claim-ready diagnosis codes.
knowledge-harvest	After a run completes, propose durable facts learned during the run for the knowledge index. A human ratifies — always. Proposals ride the queue.
prior-auth-review-skill	Automate payer review of prior authorization (PA) requests. This skill should be used when users say "Review this PA request", "Process prior authorization for [procedure]", "Assess medical necessity", "Generate PA decision", or when processing clinical documentation for coverage policy validation and authorization decisions.
queue-triage	After a sweep round, dedupe and triage worker unknowns into the human queue. Self-resolve the obvious (visibly); blocking items end the round.
reformulate	Turn the user's question into a versioned BRIEF (rubric, stated assumptions, done-criteria, scope intent). Use before any sweep. Consults the knowledge index and may pre-scan a few documents to sharpen terms.
sweep	Fan out over the scoped document set via the saved sweep workflow. Workers full-read their shard and write findings + citations directly via cli.ts. Recall over precision — never skip a scoped doc.
synthesize	After sweep rounds settle, turn findings into a report with cited claims, run a sampled semantic audit, and surface knowledge proposals. Analysis happens via scripts, never mental math.