build-repo-context

$ npx mdskill add UKGovernmentBEIS/inspect_evals/build-repo-context

Extracts institutional knowledge from GitHub history into a shared context document.

  • Helps agents understand repo conventions and avoid common mistakes.
  • Integrates with GitHub CLI to fetch PRs, issues, and review comments.
  • Decides scope by checking existing document headers for date and PR ranges.
  • Outputs an updated markdown file containing distilled knowledge and patterns.
SKILL.md
.github/skills/build-repo-context
---
name: build-repo-context
description: Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain agent_artefacts/repo_context/REPO_CONTEXT.md. Trigger only on specific request.
---

# Build Repo Context

Crawl GitHub history (PRs, issues, review comments) and distill institutional knowledge into `agent_artefacts/repo_context/REPO_CONTEXT.md`. This document helps worker agents understand repo conventions, common mistakes, and known tech debt before making changes.

## Workflow

### 1. Setup

1. Create `agent_artefacts/repo_context/` if it doesn't exist
2. Read existing `agent_artefacts/repo_context/REPO_CONTEXT.md` if present (will be updated, not replaced)

### 2. Identify What's New

Use the header of `REPO_CONTEXT.md` to determine what to process. The header contains the last-updated date and PR range (e.g., `PRs processed: #965-#1050`).

- **First run** (no `REPO_CONTEXT.md`): Fetch the 50 most recent merged PRs plus all open issues (subject to the 100-issue cap in Bounding Rules)
- **Incremental runs**: Fetch PRs merged since the last-updated date that are not already covered by the PR range in the header, plus issues updated since that date
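
The first-run versus incremental decision can be sketched as a small header parser. This is only an illustration: the `PRs processed: #965-#1050` shape comes from the example above, while the `Last updated: YYYY-MM-DD` line, the sample date, and the function name `parse_header` are assumptions, since this skill does not fix the exact header layout.

```python
import re
from datetime import date

# Assumed header lines (only the "PRs processed" shape appears in this skill):
#   Last updated: 2025-01-31
#   PRs processed: #965-#1050
DATE_RE = re.compile(r"Last updated:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)
RANGE_RE = re.compile(r"PRs processed:\s*#(\d+)\s*-\s*#(\d+)")


def parse_header(text: str):
    """Return (last_updated, lowest_pr, highest_pr), or None for a first run."""
    date_match = DATE_RE.search(text)
    range_match = RANGE_RE.search(text)
    if not (date_match and range_match):
        return None  # no usable header: treat as a first run
    last_updated = date.fromisoformat(date_match.group(1))
    low, high = (int(group) for group in range_match.groups())
    return last_updated, low, high


# Hypothetical header text, for illustration only.
header = "Last updated: 2025-01-31\nPRs processed: #965-#1050\n"
print(parse_header(header))
```

A `None` result maps to the first-run branch; otherwise the date would feed the `merged:>` search and the range marks which PR numbers are already covered.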

Use the `gh` CLI to list candidates:

```bash
# First run: recent merged PRs
gh pr list --state merged --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

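# Incremental: PRs merged since last crawl (replace YYYY-MM-DD with the header's last-updated date)
gh pr list --state merged --search "merged:>YYYY-MM-DD" --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

# Open issues (first run)
gh issue list --state open --limit 100 --json number,title,labels,createdAt,updatedAt

# Incremental: open issues updated since last crawl
gh issue list --state open --search "updated:>YYYY-MM-DD" --limit 100 --json number,title,labels,createdAt,updatedAt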
```

### 3. Triage

Make a fast pass over PR titles and metadata. **Skip** these categories (they rarely contain design insights):

- Dependency bumps (titles matching `bump`, `update dependencies`, `renovate`, `dependabot`)
- Changelog-only updates (titles matching `changelog`, `scriv`)
- Bot-generated PRs with no review comments
- PRs with fewer than 5 lines changed and no review comments

**Prioritize** PRs that have:

- Review comments (especially multiple rounds — that's where design discussion lives)
- Changes touching shared utilities (`src/inspect_evals/utils/`, `CONTRIBUTING.md`, `BEST_PRACTICES.md`, `AGENTS.md`)

**Cap at 50 PRs per run** to keep execution time reasonable.
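
A minimal sketch of this triage pass, assuming PR records shaped like the `gh pr list --json` output above and a separately gathered map of review-comment counts (the listed JSON fields do not include one). The names `should_skip`, `triage`, and `comment_counts` are hypothetical. The bot-author rule is left out because the listed fields carry no author information, and ranking uses comment count only, not the shared-utilities rule.

```python
import re

# Title keywords from the skip list above; substring matching is deliberately crude.
SKIP_TITLE_RE = re.compile(
    r"bump|update dependencies|renovate|dependabot|changelog|scriv",
    re.IGNORECASE,
)
MAX_PRS_PER_RUN = 50


def should_skip(pr: dict, review_comments: int) -> bool:
    """Apply the title and size skip rules to one PR record."""
    if SKIP_TITLE_RE.search(pr["title"]):
        return True
    lines_changed = pr["additions"] + pr["deletions"]
    return review_comments == 0 and lines_changed < 5


def triage(prs: list[dict], comment_counts: dict[int, int]) -> list[dict]:
    """Drop skippable PRs, put the most-reviewed first, and cap the list."""
    kept = [
        pr for pr in prs
        if not should_skip(pr, comment_counts.get(pr["number"], 0))
    ]
    kept.sort(key=lambda pr: comment_counts.get(pr["number"], 0), reverse=True)
    return kept[:MAX_PRS_PER_RUN]


# Made-up records for illustration only.
prs = [
    {"number": 1, "title": "Bump ruff from 0.1 to 0.2", "additions": 1, "deletions": 1},
    {"number": 2, "title": "Fix typo", "additions": 1, "deletions": 1},
    {"number": 3, "title": "Refactor scorer utils", "additions": 120, "deletions": 40},
    {"number": 4, "title": "Add new eval", "additions": 300, "deletions": 0},
]
counts = {3: 2, 4: 7}
selected = [pr["number"] for pr in triage(prs, counts)]
print(selected)
```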

### 4. Extract

For each selected PR, fetch:

```bash
# Replace <N> with the PR number; gh fills {owner} and {repo} from the current repository
# PR body and metadata
gh pr view <N> --json body,title,labels,files,reviewDecision,comments,reviews

# Review comments (inline code review feedback)
gh api repos/{owner}/{repo}/pulls/<N>/comments --paginate

# Issue comments (general discussion)
gh api repos/{owner}/{repo}/issues/<N>/comments --paginate
```

For open issues, fetch the body and comments the same way (e.g., `gh issue view <N> --json body,title,labels,comments`).

**Link traversal**: If a comment references another PR/issue (e.g., "see #123" or "fixed in #456"), follow the reference and crawl it too, recursing up to 3 hops from the starting PR/issue. To prevent loops, do not revisit a PR/issue that is already in the chain.
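
The traversal rule can be sketched as a bounded breadth-first walk. This is an illustration under assumptions: `fetch_text` is a hypothetical callable that returns the combined body and comment text for a PR/issue number (in practice it would wrap the `gh` calls above), and references are detected with a plain `#N` pattern. The sketch uses a single visited set, which is slightly stricter than "already in the chain".

```python
import re
from collections import deque
from typing import Callable

REF_RE = re.compile(r"#(\d+)")
MAX_HOPS = 3


def crawl_links(start: int, fetch_text: Callable[[int], str]) -> dict[int, int]:
    """Breadth-first walk over #N references; returns {number: hops from start}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == MAX_HOPS:
            continue  # hop budget spent: do not follow references any further
        for ref in REF_RE.findall(fetch_text(current)):
            number = int(ref)
            if number not in hops:  # already visited: skip to avoid loops
                hops[number] = hops[current] + 1
                queue.append(number)
    return hops


# Toy comment texts keyed by made-up PR/issue numbers.
texts = {
    1: "see #2",
    2: "fixed in #3, relates to #1",
    3: "see #4",
    4: "see #5",
    5: "end of chain",
}
reached = crawl_links(1, lambda number: texts.get(number, ""))
print(reached)
```

In the toy data, #5 is four hops from #1 and is therefore never fetched, and the back-reference from #2 to #1 is ignored.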

### 5. Distill

This is the core intellectual work. For each PR/issue, extract **actionable insights** in these categories:

- **Design decisions**: What architectural choice was made and why? What alternatives were rejected?
- **Reviewer corrections**: What mistakes did reviewers catch? These reveal common pitfalls.
- **Established conventions**: What patterns were deliberately chosen that future contributors should follow?
- **Tech debt acknowledged**: What shortcuts were taken intentionally? What should NOT be "fixed" without discussion?
- **Common agent mistakes**: If review comments mention agent-generated code issues, capture the pattern.

**Quality requirements for each insight**:

- Must cite source PR/issue number (e.g., "Per PR #973...")
- Must be actionable ("Do X" / "Don't do Y"), not descriptive ("PR #123 added X")
- Must add nuance beyond what CONTRIBUTING.md and BEST_PRACTICES.md already state
- Must be relevant to future contributors, not just historically interesting
- Must be broadly applicable beyond a single issue or evaluation. If the context is excessively narrow, leave it out.
- Must reflect team convention, not a single maintainer's code style or proposal. If in doubt, leave it out.

**Skip**:

- Bot comments (dependabot, renovate, CI status checks)
- Feature announcements without design implications
- Trivial PRs (typo fixes, version bumps) unless they reveal a convention
- Duplicate insights already captured in REPO_CONTEXT.md

### 6. Merge Into REPO_CONTEXT.md

Integrate new insights into the existing document structure. **Do not just append** — place each insight in the appropriate section and deduplicate:

- If a new insight updates or supersedes an existing one, replace it
- If a section is getting too long, distill further (combine related insights)
- Update the header metadata (last updated date, PR watermark)
- Keep total document size between 500 and 1000 lines (distill aggressively if over)

**Each insight appears in exactly one section** — do not repeat the same rule across multiple sections with different framing (see step 7).

### 7. Deduplicate & Consolidate

After merging, review the full document for **cross-section duplication**. This is critical — incremental runs naturally introduce duplication because the same convention surfaces in multiple PR reviews (e.g., "use `@pytest.mark.docker`" might appear as a reviewer correction, an established convention, AND a testing recipe).

**Process**:

1. For each insight, search the entire document for overlapping content. Look for insights that cover the same topic even if phrased differently.
2. Keep each insight in **exactly one location** — the most specific section that fits. Prefer this priority:
   - "Rules & Conventions" for mandatory practices ("always do X", "never do Y")
   - "Testing Recipes" for detailed how-to patterns (mock setup, test structure)
   - "Known Tech Debt" for acknowledged issues that should not be fixed without discussion
   - "CI/Tooling" for build/CI/tooling specifics
   - "Open Issues" for bugs and design direction
3. Remove the duplicate occurrences, keeping the most complete/specific version.
4. Combine related insights that are split across bullets into a single, richer bullet.

**Common duplication patterns to watch for**:

- The same pytest marker rule appearing in both "Rules" and "Testing Recipes"
- Reviewer corrections that duplicate established conventions (merge into the convention)
- Agent mistakes that are just the inverse of an established convention (keep only the convention)
- API usage patterns appearing in both rules and recipes (keep the rule brief, detail in recipes)
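
One way to spot cross-section duplication is to list which top-level sections mention a given key term. The sketch below assumes that `REPO_CONTEXT.md` uses `## ` headings for its sections, which this skill does not state; the helper name `sections_mentioning` and the toy document with made-up PR numbers are illustrative only.

```python
import re


def sections_mentioning(document: str, term: str) -> list[str]:
    """Return the titles of `## ` sections whose body contains `term`."""
    hits = []
    title = "(preamble)"
    found = False
    for line in document.splitlines():
        heading = re.match(r"##\s+(.*)", line)
        if heading:
            if found:
                hits.append(title)
            title, found = heading.group(1).strip(), False
        elif term in line:
            found = True
    if found:
        hits.append(title)
    return hits


# Toy document; the PR numbers are placeholders, not real references.
doc = """## Rules & Conventions
- Use `@pytest.mark.docker` for Docker tests (PR #1).
## Testing Recipes
- Mark Docker tests with `@pytest.mark.docker` (PR #2).
- Mock `get_model` in unit tests (PR #3).
"""
print(sections_mentioning(doc, "@pytest.mark"))
print(sections_mentioning(doc, "get_model"))
```

A term that comes back with more than one section is a candidate for consolidation. Exact-string search will miss duplicates that are phrased differently, so it complements a manual read of the document and does not replace it.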

## Bounding Rules

| Rule                         | Limit                                       |
| ---------------------------- | ------------------------------------------- |
| First run scope              | Most recent 50 merged PRs + all open issues |
| Incremental run scope        | New items since last crawl                  |
| Max PRs per run              | 50                                          |
| Link traversal depth         | 3 hops                                      |
| Target REPO_CONTEXT.md size  | 500-1000 lines                              |
| Max issues per run           | 100                                         |

## Insight Quality Guidelines

These are critical — the value of REPO_CONTEXT.md depends on insight quality:

1. **Every insight must cite its source** PR or issue number. It is acceptable to cite multiple sources for the same insight.
2. **Insights must be actionable**: "Do X" / "Don't do Y", not "PR #123 added X"
3. **Don't duplicate existing docs**: Only add nuance that CONTRIBUTING.md and BEST_PRACTICES.md miss
4. **Skip noise**: Bot comments, feature announcements without design implications, trivial PRs
5. **Focus on**: Reviewer corrections, design trade-offs, rejected alternatives, acknowledged tech debt, common agent mistakes
6. **Be specific**: "Use `hf_dataset()` wrapper instead of raw `load_dataset()` for HuggingFace datasets (PR #842)" is better than "Use the right dataset loading function"
7. **Date-stamp volatile insights**: If an insight might become stale (e.g., "Currently X is broken"), include the date so agents can verify

## Expected Output

After running this workflow:

```text
agent_artefacts/repo_context/
└── REPO_CONTEXT.md     # Distilled institutional knowledge (committed)
```

## Verification Checklist

After each run, verify:

1. `REPO_CONTEXT.md` exists and has well-structured content
2. Insights cite source PR/issue numbers
3. Insights are actionable, not merely descriptive
4. **No duplicate insights across sections** — search for key terms (e.g., `sample ID`, `get_model`, `@pytest.mark`) and confirm each appears in exactly one place
5. Document stays under ~1000 lines
6. Header metadata (date, PR range) is updated
7. Incremental runs don't reprocess already-crawled PRs
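
Two of these checks (size and citations) lend themselves to a mechanical pass. The sketch below is a partial aid under assumptions, not a replacement for the checklist: it treats every line starting with `- ` as one insight and any `#N` token as a citation, and the name `check_context` is hypothetical.

```python
import re

MAX_LINES = 1000
CITATION_RE = re.compile(r"#\d+")


def check_context(document: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    lines = document.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"document has {len(lines)} lines (limit {MAX_LINES})")
    for lineno, line in enumerate(lines, start=1):
        # Assumption: each insight is a single top-level "- " bullet.
        if line.startswith("- ") and not CITATION_RE.search(line):
            problems.append(f"line {lineno}: bullet has no PR/issue citation")
    return problems


# Toy input with a made-up PR number.
sample = "## Rules\n- Do X (PR #1)\n- Do Y\n"
print(check_context(sample))
```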