prepare-submission-workflow

$npx mdskill add UKGovernmentBEIS/inspect_evals/prepare-submission-workflow

Verify repo requirements and generate evaluation registration YAML.

  • Validates upstream repositories for pyproject.toml and task decorators.
  • Fetches repository files via WebFetch to confirm dependency declarations.
  • Determines readiness by checking for missing project tables or decorators.
  • Outputs a YAML file at register/<eval_name>/eval.yaml for submission.
SKILL.md
.github/skills/prepare-submission-workflowView on GitHub ↗
---
name: prepare-submission-workflow
description: Prepare an evaluation for PR submission as an entry to the register. Use when user asks to prepare an eval for submission or finalize a PR. Trigger when the user asks you to run the "Prepare Evaluation For Submission" workflow.
---

# Prepare Eval For Submission

Since May 2026, new evaluations are submitted as **entries to the register** — the evaluation code lives in your own upstream repository, and you add a pointer to it here. Code is no longer added directly to `src/inspect_evals/`. If the user appears to be submitting evaluation code into the repo, direct them to [`register/README.md`](../../../register/README.md) for the full process.

## Workflow Steps

To prepare an evaluation for submission as a pull request:

### 1. Verify upstream repo requirements

The upstream repo must:

- Have a `pyproject.toml` with a `[project]` table so it can be installed via `uv sync`
- Declare `inspect_ai` as a dependency
- Define each task with the `@task` decorator from `inspect_ai`

Ask the user whether their upstream repo meets these requirements. Offer to check for them — if they provide the GitHub repository URL, fetch the repo's `pyproject.toml` and task files (e.g. via `WebFetch` on the raw GitHub URLs) to verify the requirements are met. If any requirement is not met, tell the user what needs to be fixed upstream before they can register.

### 2. Gather information and create `register/<eval_name>/eval.yaml`

Skip this step if `register/<eval_name>/eval.yaml` already exists.

Use [`register/example_eval.yaml`](../../../register/example_eval.yaml) as the template — it documents every field. Don't ask the user field-by-field; instead, derive what you can from the upstream repo first, then ask one batched question for what's missing.

**Hints on what to derive from the upstream repo (don't ask):**

- `source.repository_url` — from step 1.
- `source.repository_commit` — fetch the latest commit SHA on the default branch (must be a 40-char SHA, not a tag or branch).
- `tasks[].name` and `tasks[].task_path` — locate every `@task`-decorated function in the repo and record the function name and file path.
- `title` — from the upstream README heading or `pyproject.toml` `[project].name`.
- `description` — draft from the upstream README; keep to one short paragraph since the generated README links back upstream.
- `source.maintainers` — defaults to the repo owner; only override if the repo is org-owned and the real maintainers are individuals.
- `tags` — propose based on the eval's domain (e.g. `Coding`, `games`, `tools`). The upstream repo name is added automatically, so don't include it.

Use [`register/example_eval.yaml`](../../../register/example_eval.yaml) to determine what additional questions are needed.

Show the user the drafted YAML for confirmation before writing the file. Do **not** set `id` — it is auto-injected from the directory name.

### 3. Run validation

```bash
make check
```

This validates the `eval.yaml` and auto-generates a `README.md` next to it. The README is fully generated from `eval.yaml` — do not edit it by hand.

Because the generated page defers to upstream for details, make sure the upstream repo's README covers the dataset, scorer, task parameters, and how the eval was validated.

### 4. Create a changelog fragment

```bash
uv run scriv create
```

### 5. Open a PR

Use the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md). The reviewer will ping anyone listed under `source.maintainers` for acknowledgement before merging.
More from UKGovernmentBEIS/inspect_evals