generate-asset-actions

Name: generate-asset-actions
Author: UKGovernmentBEIS/inspect_evals

$ npx mdskill add UKGovernmentBEIS/inspect_evals/generate-asset-actions

Regenerate asset action tiers from asset inventory.

Updates asset policies based on host reliability and maintenance status.
Depends on ASSETS.yaml and internal audit manifest tools.
Classifies targets by comparing upstream history and domain trust.
Outputs updated YAML and markdown summaries for audit tracking.

SKILL.md

.github/skills/generate-asset-actions

---
name: generate-asset-actions
description: Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.
---

# Generate Asset Actions

Regenerate `internal/audits/asset-actions.yaml` and `internal/audits/audit-summary.md` from `ASSETS.yaml`.

If `ASSETS.yaml` may be stale, run `uv run python tools/generate_asset_manifest.py` first.

Run `uv run python tools/summarise_asset_manifest.py` to get aggregate counts (by type, by state, totals). Use these numbers when populating `audit-summary.md`.

## Classification

Read `ASSETS.yaml`. For each asset, determine target stage first, then priority. Process both `state: floating` assets AND `state: pinned` assets that match known-unstable sources (since their target is `controlled`, they are not yet at their target stage).

### Target stages (per ADR-0007)

The target stage depends on **host reliability**, not asset type:

- **`controlled`** (Stage 2): any asset whose upstream has broken before, whose maintainer is unresponsive or whose project is deprecated, or whose host is unreliable (personal repos, Google Drive, `.edu` domains, university servers, any host without version control). This applies to `git_clone`, `direct_url`, and `huggingface` alike.
- **`pinned`** (Stage 1): assets on reliable, version-controlled hosts (GitHub, HuggingFace, well-known CDNs) with no history of breakage.

Per ADR-0007: "Anything hosted on a less reliable domain (personal websites, Google Drive, university servers, or any host without version control) should skip straight to Stage 2."
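The host-reliability rule can be sketched as a small helper. This is a minimal sketch, not the repository's implementation: the function names and the pattern tuples are hypothetical, and personal repos or websites cannot be detected from a URL alone, so they still need a manual check or a registry entry.

```python
from urllib.parse import urlparse

# Hypothetical patterns taken from the examples in the text; extend as needed.
UNRELIABLE_HOSTS = ("drive.google.com",)
UNRELIABLE_SUFFIXES = (".edu",)


def is_unreliable_host(source: str) -> bool:
    """Return True if the source points at a host the text calls unreliable."""
    # urlparse only fills in the hostname when a scheme is present.
    url = source if "://" in source else f"https://{source}"
    host = (urlparse(url).hostname or "").lower()
    return host in UNRELIABLE_HOSTS or host.endswith(UNRELIABLE_SUFFIXES)


def target_stage(source: str, known_unstable: bool = False) -> str:
    """Pick the target stage from breakage history and host reliability."""
    if known_unstable or is_unreliable_host(source):
        return "controlled"  # Stage 2
    return "pinned"  # Stage 1
```

Passing `known_unstable=True` covers the "upstream has broken before" case, which cannot be inferred from the hostname.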

### Priority tiers

1. **Urgent**: matches a known-unstable source (see registry below). Target is `controlled`.
2. **High**: unreliable host (`drive.google.com`, `.edu` domains, personal repos/websites) not already in the known-unstable registry. Target is `controlled`.
3. **Medium**: all other floating refs on reliable hosts. Target is `pinned`.

For assets with `state: pinned` and a `{SHA}` placeholder but no checksum, classify as **Low** (target: `pinned` with checksum).

Omit assets already at their target stage.
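The rules above imply an order of checks: a registry match wins, then an unreliable host, then any remaining floating ref, then a pinned asset missing its checksum. A minimal sketch of that precedence follows. The function name and the rule labels it returns are hypothetical; map each label to the tier named in the list above. The text does not say how to treat a pinned asset on an unreliable host that is not in the registry, so this sketch leaves it to the checksum rule.

```python
from typing import Optional, Tuple


def classify(state: str, in_registry: bool, unreliable_host: bool,
             has_checksum: bool) -> Optional[Tuple[str, str]]:
    """Return (rule, target) for an asset, or None if it should be omitted."""
    if in_registry:
        # Known-unstable sources target `controlled`, whether floating or pinned.
        return None if state == "controlled" else ("known_unstable", "controlled")
    if state == "floating":
        if unreliable_host:
            return ("unreliable_host", "controlled")
        return ("reliable_floating", "pinned")
    if state == "pinned" and not has_checksum:
        return ("needs_checksum", "pinned")  # the Low case
    return None  # already at its target stage
```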

Every entry needs: `eval`, `source`, `type`, `state`, `target`, `action`, `reason`.
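A quick way to catch incomplete entries before writing the file is to check each one against the required field list. This is a sketch under the assumption that entries are plain dictionaries; the helper name and the sample values are illustrative only.

```python
REQUIRED_FIELDS = ("eval", "source", "type", "state", "target", "action", "reason")


def missing_fields(entry: dict) -> list:
    """Return the required fields that are absent or empty in an entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


# Illustrative entry; the values are placeholders, not real inventory data.
example = {
    "eval": "example_eval",
    "source": "https://example.com/data.csv",
    "type": "direct_url",
    "state": "floating",
    "target": "controlled",
    "action": "Mirror to a controlled location",
}
print(missing_fields(example))  # ['reason']
```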

## Known-Unstable Sources

Update this list when new instability is discovered.

| Source                           | Eval       | Incident                              |
| -------------------------------- | ---------- | ------------------------------------- |
| `xlang-ai/OSWorld`               | osworld    | Files removed (PR #958)               |
| `openai/evals`                   | makemesay  | Deprecated upstream                   |
| `corebench.cs.princeton.edu`     | core_bench | University server, no versioning      |
| `epatey/fonts`                   | osworld    | Personal repo                         |
| `ShishirPatil/gorilla`           | bfcl       | Data format issues (PR #954)          |
| `yunx-z/MLRC-Bench`              | mlrc_bench | Broken task                           |
| `LRudL/sad`                      | sad        | Upstream bugs (issues #7, #8)         |
| `meg-tong/sycophancy-eval`       | sycophancy | Invalid JSON/NaN, workaround in code  |
| `josancamon/paperbench`          | paperbench | Paper ID mismatch (HF discussion #2)  |
| `sentientfutures/moru-benchmark` | moru       | Exact duplicate rows                  |
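For scripted classification, the table above can be mirrored as a lookup. The sketch below assumes a case-insensitive substring match between the registry key and the asset's `source` value; the matching rule and the function name are assumptions, not something the skill specifies.

```python
from typing import Optional

# Registry keys and incidents copied from the table above.
KNOWN_UNSTABLE = {
    "xlang-ai/OSWorld": "Files removed (PR #958)",
    "openai/evals": "Deprecated upstream",
    "corebench.cs.princeton.edu": "University server, no versioning",
    "epatey/fonts": "Personal repo",
    "ShishirPatil/gorilla": "Data format issues (PR #954)",
    "yunx-z/MLRC-Bench": "Broken task",
    "LRudL/sad": "Upstream bugs (issues #7, #8)",
    "meg-tong/sycophancy-eval": "Invalid JSON/NaN, workaround in code",
    "josancamon/paperbench": "Paper ID mismatch (HF discussion #2)",
    "sentientfutures/moru-benchmark": "Exact duplicate rows",
}


def match_known_unstable(source: str) -> Optional[str]:
    """Return the registry key contained in `source`, or None if none matches."""
    lowered = source.lower()
    for key in KNOWN_UNSTABLE:
        if key.lower() in lowered:
            return key
    return None
```

The returned key can be used to look up the incident text for the entry's `reason` field.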

## Verification

1. `asset-actions.yaml` parses as valid YAML
2. Every floating asset in `ASSETS.yaml` appears in urgent, high, or medium
3. `floating_assets + needing_checksums + no_action_needed == total_external_assets`
4. Numbers in `audit-summary.md` match output of `summarise_asset_manifest.py`
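Checks 2 and 3 can be automated once the files are loaded. The sketch below assumes the data has already been parsed (for example with PyYAML's `yaml.safe_load`, which also covers check 1 by raising on invalid YAML). It further assumes that `asset-actions.yaml` has top-level `urgent`, `high`, and `medium` lists, that each asset is a dictionary with `source` and `state` keys, and that the counts use the names from check 3. All of these layouts are hypothetical; adapt them to the real files.

```python
def verify(actions: dict, assets: list, counts: dict) -> list:
    """Return a list of problems found by checks 2 and 3 (empty if none)."""
    problems = []

    # Check 2: every floating asset must appear in urgent, high, or medium.
    listed = {
        entry["source"]
        for tier in ("urgent", "high", "medium")
        for entry in actions.get(tier) or []
    }
    for asset in assets:
        if asset.get("state") == "floating" and asset["source"] not in listed:
            problems.append(f"floating asset not listed: {asset['source']}")

    # Check 3: the three buckets must add up to the total.
    subtotal = (counts["floating_assets"] + counts["needing_checksums"]
                + counts["no_action_needed"])
    if subtotal != counts["total_external_assets"]:
        problems.append(
            f"counts sum to {subtotal}, expected {counts['total_external_assets']}"
        )
    return problems
```

Check 4 still needs a manual comparison of `audit-summary.md` against the script output.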
