prepare-release
`npx mdskill add UKGovernmentBEIS/inspect_evals/prepare-release`

Cut new inspect_evals releases with automated branch creation.
- Handles version bumps and changelog collection for releases.
- Requires an authenticated `gh` CLI and push access to the repository.
- Checks for changelog fragments before proceeding with the release.
- Executes git commands to create branches and open PRs.

SKILL.md (`.github/skills/prepare-release`)
---
name: prepare-release
description: Prepare a new release of inspect_evals by creating a release branch, collecting changelog fragments, and opening a PR. Use when user asks to cut/prepare/create a new release or version bump.
---
# Prepare a Release
This workflow prepares a new release of `inspect_evals`. It determines the new version, creates a release branch, collects changelog fragments into `CHANGELOG.md`, and opens a PR. After the PR is merged, the `release-on-merge.yml` GitHub Actions workflow tags the merge commit and creates a GitHub release.
Reference: `PACKAGE_VERSIONING.md`
## Prerequisites
- The `gh` CLI must be authenticated
- You must have push access to the repository
- There must be changelog fragments in `changelog.d/` (files other than `.gitkeep`, `TEMPLATE.md`, and `README.*`)
## Phase 1 — Prepare the release branch
1. **Ensure `main` is up to date**:
```bash
git fetch origin
git checkout main
git merge --ff-only origin/main
```
If the merge fails, stop and inform the user that their local `main` has diverged from `origin/main`.
2. **Check for changelog fragments**:
List files in `changelog.d/` excluding `.gitkeep`, `TEMPLATE.md`, and `README.*`. If there are no fragments, stop and tell the user there is nothing to release.
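The fragment check can be sketched as a small shell function. This is a hypothetical helper, not part of the repository. It assumes fragments sit directly in `changelog.d/` rather than in subdirectories. Note that `-maxdepth` is supported by GNU and BSD `find` but is not POSIX.

```bash
# Hypothetical helper (not part of inspect_evals): list changelog fragments,
# skipping the files in changelog.d/ that are not fragments.
list_changelog_fragments() {
  local dir="${1:-changelog.d}"
  # Only regular files directly inside the directory are considered.
  find "$dir" -maxdepth 1 -type f \
    ! -name '.gitkeep' \
    ! -name 'TEMPLATE.md' \
    ! -name 'README.*' \
    | sort
}
```

If `list_changelog_fragments` prints nothing, there is nothing to release, for example: `[ -n "$(list_changelog_fragments)" ] || echo "Nothing to release" >&2`.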
3. **Determine the new version**:
a. Find the current version from the latest `v*` git tag:
```bash
git tag --sort=-v:refname --list 'v*' | head -1
```
b. Ask the user what kind of bump this is, presenting the semver table from `PACKAGE_VERSIONING.md`:
| Component | When to bump | Examples |
| --------- | ------------ | -------- |
| **Major** | Breaking changes | Removing an eval, API changes, scorer output format changes |
| **Minor** | New features | Adding new evals, new task parameters, new utilities |
| **Patch** | Bug fixes | Eval fixes, scorer fixes, dataset loading fixes |
c. Compute the new version string without the `v` prefix (e.g. `0.4.0`, `0.3.107`) and confirm it with the user.
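Step 3c is plain semver arithmetic. The hypothetical helper below sketches it under the assumption that the latest tag has the form `vMAJOR.MINOR.PATCH` with no pre-release or local suffix. It strips the `v` and prints the new version without it, matching the examples above.

```bash
# Hypothetical helper (not part of inspect_evals): compute the next version.
# Assumes a plain vMAJOR.MINOR.PATCH tag; anything else is rejected.
bump_version() {
  local current="${1#v}" kind="$2" major minor patch rest
  case "$current" in
    *[!0-9.]*) echo "unsupported version: $1" >&2; return 1 ;;
    [0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "unsupported version: $1" >&2; return 1 ;;
  esac
  # Split MAJOR.MINOR.PATCH with POSIX parameter expansion.
  major="${current%%.*}"
  rest="${current#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$kind" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "${major}.$((minor + 1)).0" ;;
    patch) echo "${major}.${minor}.$((patch + 1))" ;;
    *) echo "unknown bump kind: $kind (use major, minor or patch)" >&2; return 1 ;;
  esac
}
```

Combined with step 3a, this might be used as `bump_version "$(git tag --sort=-v:refname --list 'v*' | head -1)" minor`. The `v` prefix is added back later, in the PR title and the tag.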
4. **Create the release branch**:
The branch name is `release-YYYY-MM-DD` using today's date.
```bash
git checkout -b release-YYYY-MM-DD
```
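A minimal way to produce the `release-YYYY-MM-DD` name is to let `date` format it. This sketch assumes "today" means the local date of the machine running the workflow. Pass `-u` to `date` if UTC is preferred.

```bash
# Build the release branch name for today's date, e.g. release-2025-01-31.
release_branch_name() {
  date +release-%Y-%m-%d
}

# Usage (creates and switches to the branch):
#   git checkout -b "$(release_branch_name)"
```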
## Phase 2 — Collect the changelog
1. **Run scriv collect**:
```bash
uv run scriv collect --version <NEW_VERSION>
```
This removes the individual fragment files from `changelog.d/` and prepends a new section to `CHANGELOG.md`.
2. **Present the changelog diff to the user for review**:
```bash
git diff CHANGELOG.md
```
Tell the user: *"Please review the collected changelog above. You can edit `CHANGELOG.md` directly — let me know when you are satisfied, or tell me what changes to make."*
**Do not proceed until the user confirms the changelog is ready.**
3. **Stage and commit**:
```bash
git add CHANGELOG.md changelog.d/
git commit -m "Prepare release v<NEW_VERSION>"
```
## Phase 3 — Push and open a PR
1. **Push the branch**:
```bash
git push -u origin release-YYYY-MM-DD
```
2. **Open a draft PR**:
<!-- markdownlint-disable-next-line no-space-in-code -->
Extract the new version's section from `CHANGELOG.md` (everything from the version heading up to but not including the next `## ` heading). Use this as the PR body:
```bash
gh pr create --draft \
--title "Release v<NEW_VERSION>" \
--body "<EXTRACTED_CHANGELOG_SECTION>"
```
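The extraction rule can be sketched with `awk`. This hypothetical helper assumes that each version heading is a level-two `##` line containing the version string, and that the new version's section is the first such match (scriv prepends it). It prints from that heading up to, but not including, the next `##` heading.

```bash
# Hypothetical helper (not part of inspect_evals): print the CHANGELOG.md
# section for one version, stopping at the next level-two heading.
extract_changelog_section() {
  local version="$1"
  local file="${2:-CHANGELOG.md}"
  awk -v ver="$version" '
    /^## / {
      # Reaching another level-two heading ends the section being printed.
      if (in_section) exit
      i = index($0, ver)
      # Require that the version is not followed by another digit, so that
      # "0.3.10" does not match a "0.3.107" heading.
      if (i > 0 && substr($0, i + length(ver), 1) !~ /[0-9]/) in_section = 1
    }
    in_section { print }
  ' "$file"
}
```

Piping the section into `gh` with `--body-file -` (read the body from standard input) also avoids shell expansion of backticks and `$` in the changelog, which can happen if the text is pasted inside the double-quoted `--body` argument: `extract_changelog_section <NEW_VERSION> | gh pr create --draft --title "Release v<NEW_VERSION>" --body-file -`.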
**Important**: The PR title **must** start with `Release v` (e.g. `Release v0.4.0`) — this is how the `release-on-merge.yml` workflow identifies release PRs.
Give the user the PR URL and tell them the PR is a draft; they should mark it ready for review when appropriate.
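To catch a mistyped title before opening the PR, the prefix rule can be checked locally. This is a hypothetical sketch that mirrors only the rule stated above (the title must start with `Release v`); it does not reproduce the actual matching logic inside `release-on-merge.yml`.

```bash
# Hypothetical helper: succeed only if the PR title starts with "Release v".
is_release_pr_title() {
  case "$1" in
    "Release v"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_release_pr_title "Release v0.4.0"` succeeds, while `is_release_pr_title "Release 0.4.0"` fails because the `v` is missing.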
## Phase 4 — After the PR is merged (automated)
Once the release PR is merged into `main`, the `.github/workflows/release-on-merge.yml` GitHub Actions workflow automatically:
1. Tags the merge commit with `v<NEW_VERSION>`
2. Creates a GitHub release with the changelog section as the body
3. Builds and publishes the package to PyPI
4. Notifies Slack
No manual action is required. The package version is derived from the new tag by `setuptools_scm`.
## Notes
- The package version is derived entirely from git tags via `setuptools_scm` — there is no version file to edit.
- If something goes wrong mid-workflow, the release branch can be deleted and the process restarted.
- The `v` prefix on tags is required (e.g. `v0.3.107`, not `0.3.107`).
- The `weekly-release.yml` workflow can also automate Phases 1–3 on a schedule or via manual dispatch, creating the release branch and draft PR for you.
More from UKGovernmentBEIS/inspect_evals
- `build-repo-context`: Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain `agent_artefacts/repo_context/REPO_CONTEXT.md`. Trigger only on specific request.
- `check-trajectories-workflow`: Use Inspect Scout to analyze agent trajectories from evaluation log files. Runs default and custom scanners to detect external failures, formatting issues, reward hacking, and ethical refusals. Use when user asks to check/analyze agent trajectories. Trigger when the user asks you to run the "Check Agent Trajectories" workflow.
- `ci-maintenance-workflow`: CI and GitHub Actions maintenance workflows (fix a failing test from a CI URL, fix a failing smoke test, add `@pytest.mark.slow` markers to slow tests, or review a PR against agent-checkable standards). Use when user asks to fix a failing test, fix a smoke test, mark slow tests, or review a PR. Trigger when the user asks you to run the "Write a PR For A Failing Test", "Fix A Failing Smoke Test", "Mark Slow Tests", or "Review PR According to Agent-Checkable Standards" workflow.
- `code-quality-fix-all`: Fix code quality issues identified in a code quality review stored in `agent_artefacts/code_quality/<topic>/`. Systematically addresses issues found by the code-quality-review-all skill for ANY code quality topic, with validation and testing at each step. Use when user asks to fix issues from a code quality review, or asks to fix issues from `agent_artefacts/code_quality/<topic>`.
- `code-quality-review-all`: Review all evaluations in the repository against a single code quality standard. Checks ALL evals against ONE standard for periodic quality reviews. Use when user asks to review/audit/check all evaluations for a specific topic or standard. Do NOT use for reviewing a single eval (use eval-quality-workflow instead) or for test coverage (use ensure-test-coverage instead).
- `create-eval`: Redirect to the inspect-evals-template for creating new evaluations. New evals are no longer created in this repository; they live in standalone repos. Use when user asks to create/implement/build a new evaluation.
- `ensure-test-coverage`: Ensure test coverage for a single evaluation, both reviewing existing tests and creating missing ones. Analyzes testable components, checks tests against repository conventions, reports coverage gaps, and creates or improves tests. Use when user asks to check/review/create/add/ensure tests for an eval. Use whenever you are asked to review an evaluation that contains tests, or whenever you need to write a suite of tests. Do NOT use for fixing a specific failing CI test (use ci-maintenance-workflow instead).
- `eval-quality-workflow`: Fix or review a single evaluation against all `EVALUATION_CHECKLIST.md` standards. Use "fix" mode to refactor an eval into compliance, or "review" mode to assess compliance without making changes. Use when user asks to fix, review, or check an evaluation's quality. Trigger when the user asks you to run the "Fix An Evaluation" or "Review An Evaluation" workflow. Do NOT use for reviewing ALL evals against a single code quality standard (use code-quality-review-all instead).
- `eval-report-workflow`: Create an evaluation report for a README by selecting models, estimating costs, running evaluations, and formatting results tables. Use when user asks to make/create/generate an evaluation report. Trigger when the user asks you to run the "Make An Evaluation Report" workflow.
- `eval-validity-review`: Review a single evaluation's validity: whether its claims hold up, whether its name is accurate, whether samples can be both succeeded and failed at, and whether scoring measures ground truth. Use when user asks to check validity of an eval, or as part of the Master Checklist workflow. Do NOT use for code quality or test coverage (use eval-quality-workflow or ensure-test-coverage instead).