generate-asset-actions

$ npx mdskill add UKGovernmentBEIS/inspect_evals/generate-asset-actions

Regenerate asset action tiers from asset inventory.

  • Updates asset policies based on host reliability and maintenance status.
  • Depends on ASSETS.yaml and the asset manifest scripts under tools/.
  • Classifies targets by comparing upstream history and domain trust.
  • Outputs updated YAML and markdown summaries for audit tracking.
.github/skills/generate-asset-actions/SKILL.md
---
name: generate-asset-actions
description: Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.
---

# Generate Asset Actions

Regenerate `internal/audits/asset-actions.yaml` and `internal/audits/audit-summary.md` from `ASSETS.yaml`.

If `ASSETS.yaml` may be stale, regenerate it first with `uv run python tools/generate_asset_manifest.py`.

Run `uv run python tools/summarise_asset_manifest.py` to get aggregate counts (by type, by state, totals). Use these numbers when populating `audit-summary.md`.

## Classification

Read `ASSETS.yaml`. For each asset, determine target stage first, then priority. Process both `state: floating` assets AND `state: pinned` assets that match known-unstable sources (since their target is `controlled`, they are not yet at their target stage).

### Target stages (per ADR-0007)

The target stage depends on **host reliability**, not asset type:

- **`controlled`** (Stage 2) — any asset whose upstream has broken before, whose maintainer is unresponsive or whose project is deprecated, OR whose host is unreliable (personal repos, Google Drive, `.edu` domains, university servers, any host without version control). This applies equally to `git_clone`, `direct_url`, and `huggingface` assets.
- **`pinned`** (Stage 1) — assets on reliable, version-controlled hosts (GitHub, HuggingFace, well-known CDNs) with no history of breakage.
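A minimal sketch of the target-stage decision, assuming each asset's `source` is a URL. The host sets and the `unstable_upstream` flag are hypothetical: a hostname cannot reveal a personal repo, an unresponsive maintainer, or past breakage, so that judgement is passed in. Unrecognised hosts default to `controlled`, reading "any host without version control" conservatively.

```python
from urllib.parse import urlparse

# Hypothetical host lists for illustration; extend them to match ADR-0007
# (for example, add the well-known CDNs you trust).
RELIABLE_HOSTS = {"github.com", "huggingface.co"}
UNRELIABLE_HOSTS = {"drive.google.com"}


def host_of(source: str) -> str:
    """Return the lowercase hostname of a URL, or "" if there is none."""
    return (urlparse(source).hostname or "").lower()


def target_stage(source: str, unstable_upstream: bool = False) -> str:
    """Choose `pinned` or `controlled` from host reliability, not asset type.

    `unstable_upstream` (hypothetical) stands in for facts the hostname cannot
    show: upstream broke before, the maintainer is unresponsive, the project
    is deprecated, or the repo is a personal one.
    """
    if unstable_upstream:
        return "controlled"
    host = host_of(source)
    if host in UNRELIABLE_HOSTS or host.endswith(".edu"):
        return "controlled"
    if host in RELIABLE_HOSTS:
        return "pinned"
    # Assumption: an unrecognised host is treated as lacking version control.
    return "controlled"
```

Defaulting unknown hosts to `controlled` errs toward the stricter stage; a reliable CDN only reaches `pinned` once it is added to the allow-list.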

Per ADR-0007: "Anything hosted on a less reliable domain (personal websites, Google Drive, university servers, or any host without version control) should skip straight to Stage 2."

### Priority tiers

1. **Urgent** — floating refs on reliable hosts that match neither the High nor the Medium criteria below. Target is `pinned`.
2. **High** — matches a known-unstable source (see registry below). Target is `controlled`.
3. **Medium** — unreliable host (`drive.google.com`, `.edu` domains, personal repos/websites) not already in the known-unstable registry. Target is `controlled`.

For assets with `state: pinned` and a `{SHA}` placeholder but no checksum, classify as **Low** (target: `pinned` with checksum).

Omit assets already at their target stage.
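The tier rules can be sketched as one function that checks High first, then Medium, and falls back to Urgent, following the selection rule in Classification (floating assets, plus pinned assets matching a known-unstable source). This is an illustration, not the skill's implementation: `known_unstable` and `unreliable_host` are assumed to be computed beforehand, the lowercase tier strings are hypothetical, and the `{SHA}` placeholder is assumed to appear in the asset's `source`.

```python
from typing import Optional


def priority(asset: dict, known_unstable: bool, unreliable_host: bool) -> Optional[str]:
    """Return 'high', 'medium', 'urgent', 'low', or None (omit the asset).

    `asset` is assumed to carry `state` and, optionally, `source` and `checksum`.
    """
    state = asset.get("state")
    if state == "floating":
        if known_unstable:
            return "high"    # target: controlled
        if unreliable_host:
            return "medium"  # target: controlled
        return "urgent"      # reliable host, target: pinned
    if state == "pinned":
        if known_unstable:
            return "high"    # pinned, but its target is controlled
        if "{SHA}" in asset.get("source", "") and not asset.get("checksum"):
            return "low"     # target: pinned with checksum
    return None              # already at its target stage
```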

Every entry needs: `eval`, `source`, `type`, `state`, `target`, `action`, `reason`.
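A small check for the required fields, useful before writing `asset-actions.yaml`. The example entry is illustrative only: its `type` and `action` wording are hypothetical, while the reason comes from the registry below.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("eval", "source", "type", "state", "target", "action", "reason")


def missing_fields(entry: dict) -> list[str]:
    """Return required fields that are absent or empty in one entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Illustrative entry; `type` and the `action` text are hypothetical.
example = {
    "eval": "osworld",
    "source": "xlang-ai/OSWorld",
    "type": "git_clone",
    "state": "floating",
    "target": "controlled",
    "action": "Mirror the repository to a controlled location",
    "reason": "Known-unstable source: files removed (PR #958)",
}
```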

## Known-Unstable Sources

Update this list when new instability is discovered.

| Source                           | Eval       | Incident                              |
| -------------------------------- | ---------- | ------------------------------------- |
| `xlang-ai/OSWorld`               | osworld    | Files removed (PR #958)               |
| `openai/evals`                   | makemesay  | Deprecated upstream                   |
| `corebench.cs.princeton.edu`     | core_bench | University server, no versioning      |
| `epatey/fonts`                   | osworld    | Personal repo                         |
| `ShishirPatil/gorilla`           | bfcl       | Data format issues (PR #954)          |
| `yunx-z/MLRC-Bench`              | mlrc_bench | Broken task                           |
| `LRudL/sad`                      | sad        | Upstream bugs (issues #7, #8)         |
| `meg-tong/sycophancy-eval`       | sycophancy | Invalid JSON/NaN, workaround in code  |
| `josancamon/paperbench`          | paperbench | Paper ID mismatch (HF discussion #2)  |
| `sentientfutures/moru-benchmark` | moru       | Exact duplicate rows                  |
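One way to match assets against this registry, assuming sources are URLs or repo slugs that contain the pattern text. The boundary rule is an assumption chosen so that `LRudL/sad` does not also flag a differently named repo such as `LRudL/sad-extra`; keep the dictionary in sync with the table.

```python
import re
from typing import Optional

# Source patterns and incidents copied from the registry table above.
KNOWN_UNSTABLE = {
    "xlang-ai/OSWorld": "Files removed (PR #958)",
    "openai/evals": "Deprecated upstream",
    "corebench.cs.princeton.edu": "University server, no versioning",
    "epatey/fonts": "Personal repo",
    "ShishirPatil/gorilla": "Data format issues (PR #954)",
    "yunx-z/MLRC-Bench": "Broken task",
    "LRudL/sad": "Upstream bugs (issues #7, #8)",
    "meg-tong/sycophancy-eval": "Invalid JSON/NaN, workaround in code",
    "josancamon/paperbench": "Paper ID mismatch (HF discussion #2)",
    "sentientfutures/moru-benchmark": "Exact duplicate rows",
}


def unstable_incident(source: str) -> Optional[str]:
    """Return the incident for the first registry pattern found in `source`.

    The pattern must not be glued to neighbouring name characters, so
    `LRudL/sad` matches `.../LRudL/sad.git` but not `.../LRudL/sad-extra`.
    Matching is case-insensitive.
    """
    for pattern, incident in KNOWN_UNSTABLE.items():
        regex = r"(?<![\w.-])" + re.escape(pattern) + r"(?![\w-])"
        if re.search(regex, source, flags=re.IGNORECASE):
            return incident
    return None
```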

## Verification

1. `asset-actions.yaml` parses as valid YAML
2. Every floating asset in ASSETS.yaml appears in urgent, high, or medium
3. `floating_assets + needing_checksums + no_action_needed == total_external_assets`
4. Numbers in `audit-summary.md` match output of `summarise_asset_manifest.py`
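Checks 2 and 3 can be scripted once both files are loaded (check 1 is whatever YAML parser you load with, for example PyYAML's `yaml.safe_load`, succeeding). This sketch assumes `asset-actions.yaml` maps tier names such as `urgent` to lists of entries, that assets are identified by `(eval, source)`, and that the script's counts use the names from check 3; adjust if the real shapes differ.

```python
from __future__ import annotations

TIERS_FOR_FLOATING = ("urgent", "high", "medium")
COUNT_KEYS = ("floating_assets", "needing_checksums", "no_action_needed")


def verify(assets: list[dict], actions: dict, counts: dict) -> list[str]:
    """Run checks 2 and 3; return problem descriptions (empty means pass).

    Assumed shapes: `assets` is the list of ASSETS.yaml entries, `actions`
    maps tier names to lists of entries, and `counts` holds the totals
    reported by summarise_asset_manifest.py under the names used in check 3.
    """
    problems = []
    classified = {
        (entry["eval"], entry["source"])
        for tier in TIERS_FOR_FLOATING
        for entry in actions.get(tier, [])
    }
    # Check 2: every floating asset lands in urgent, high, or medium.
    for asset in assets:
        key = (asset["eval"], asset["source"])
        if asset.get("state") == "floating" and key not in classified:
            problems.append(f"floating asset missing from tiers: {key}")

    # Check 3: the three buckets add up to the total.
    subtotal = sum(counts[k] for k in COUNT_KEYS)
    if subtotal != counts["total_external_assets"]:
        problems.append(
            f"count mismatch: {subtotal} != {counts['total_external_assets']}"
        )
    return problems
```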