code-quality-review-all

Name: code-quality-review-all
Author: UKGovernmentBEIS/inspect_evals

$npx mdskill add UKGovernmentBEIS/inspect_evals/code-quality-review-all

Audit all evaluations against one code quality standard.

Enforces consistent quality across every evaluation in the repository.
Depends on CONTRIBUTING.md and BEST_PRACTICES.md for standard definitions.
Generates topic-specific documentation and structured review results.
Outputs detailed summaries and markdown reports for quality assurance.

SKILL.md

.github/skills/code-quality-review-allView on GitHub ↗

---
name: code-quality-review-all
description: Review all evaluations in the repository against a single code quality standard. Checks ALL evals against ONE standard for periodic quality reviews. Use when user asks to review/audit/check all evaluations for a specific topic or standard. Do NOT use for reviewing a single eval (use eval-quality-workflow instead) or for test coverage (use ensure-test-coverage instead).
---

# Review All Evaluations

Review all evaluations in the repository against a single code quality standard or topic. This workflow is useful for systematic code quality improvements and ensuring consistency across all evaluations.

## Workflow

### Setup

If not already provided, ask user for the topic from the CONTRIBUTING.md or BEST_PRACTICES.md that this review should be focused on. If not provided, come up with a short topic identifier in a file-safe format (e.g., `pytest_marks`, `import_patterns`, `test_coverage`).

Create or read existing directory structure:

- `<repo root>/agent_artefacts/code_quality/<topic_id>/` - Directory for this review topic
- `<repo root>/agent_artefacts/code_quality/<topic_id>/README.md` - Documentation for this specific topic
- `<repo root>/agent_artefacts/code_quality/<topic_id>/results.json` - Results of the review
- `<repo root>/agent_artefacts/code_quality/<topic_id>/SUMMARY.md` - Summary of the review

### README.md Structure

The `README.md` file should contain topic-specific information:

- **Topic Description**: What this review checks for and why it matters
- **Requirements**: Specific requirements from CONTRIBUTING.md or BEST_PRACTICES.md
- **Detection Strategy**: How to identify issues (patterns to look for, tools to use)
- **Commands**: Specific commands useful for this topic (grep patterns, pytest commands, etc.)
- **Good Examples**: Code snippets showing correct implementation
- **Bad Examples**: Code snippets showing common mistakes and how to fix them
- **Review Date**: When this documentation was created/updated

### results.json Structure

The `results.json` file should follow the template in `assets/results-template.json`. It contains one entry per evaluation in `<repo root>/src/inspect_evals/`, with status and issue details.

**Important**: The `issue_location` field should use paths relative to the repository root with forward slashes (e.g., `tests/foo/test_foo.py:42` or `src/inspect_evals/foo/bar.py:15`, not `C:\Users\...\test_foo.py:42` or `tests\foo\test_foo.py:42`).

### General Guidelines for All Topics

1. **Systematic Approach**: Review all evaluations in a consistent manner. Use scripts or automated tools where possible to ensure completeness.

2. **Clear Issue Reporting**: Each issue should include:
- The specific file and line number where the issue occurs
- A clear description of what's wrong
- A concrete suggestion for how to fix it
- The issue type for categorization

3. **Verification**: After identifying potential issues, verify a sample of them by reading the actual files to ensure accuracy.

4. **Statistics**: Provide summary statistics including:
- Total evaluations reviewed
- Number passing/failing
- Breakdown of issue types
- Most affected evaluations

5. **Prioritization**: Identify which issues are most critical or affect the most evaluations to help guide remediation efforts.

6. **Reusability**: Write scripts and documentation that can be rerun easily as the codebase evolves. Include any helper scripts in the topic directory.

7. **False Positives**: Be aware that automated detection may produce false positives. When possible, include logic to reduce these or document known limitations.

### Workflow Steps

1. Create the directory structure: `agent_artefacts/code_quality/<topic_id>/`
2. **Handle existing results.json** (if re-running review):
- Read existing results.json
- Note issues that have "fix_status" field (were attempted to be fixed)
- After scanning current code state, compare with existing results
- **Remove entries for issues that no longer exist** (they were successfully fixed)
- Keep entries for issues that still exist, preserving "fix_status" if present
- Add new entries for newly discovered issues
3. Use autolint to scan all evaluations: `uv run python tools/run_autolint.py --all-evals --check`. Parse its output to identify structural issues across evals. For topic-specific checks beyond autolint's scope, write targeted grep/AST scripts in the topic directory.
4. Organize findings by evaluation name
5. Write topic-specific documentation to `README.md`
6. Write results to `results.json` with relative paths:
- Include all currently detected issues
- **IMPORTANT**: Preserve "fix_status" field for issues that still exist
- Remove issues that are no longer detected in the code
7. Create a `SUMMARY.md` file in the topic directory with:
- Overview of findings
- Key statistics
- Most affected evaluations
- Recommendations for remediation
- Impact analysis
8. If you created helper scripts, save them in the topic directory for future use
9. Inform the user that the review is complete and where to find the results

## Skill Responsibilities

**This skill (code-quality-review-all) owns results.json** and has full control:

- Add new issues when detected
- Update existing issues if location/description changes
- **Remove issues that no longer exist** in the codebase
- Preserve "fix_status" field when updating issues
- Update evaluation status (pass/fail) based on current findings

**The code-quality-fix-all skill** has limited control:

- Can ONLY add/update the "fix_status" field on existing issues
- Cannot remove entries from results.json
- Relies on this skill to verify fixes and remove resolved issues

This separation ensures:

- Clear ownership of results.json
- Fix skill focuses on fixing, not determining what's fixed
- Review skill has authoritative view of current code state

## Expected Output

After running this workflow, you should have:

```text
agent_artefacts/code_quality/<topic_id>/
├── README.md # Topic-specific documentation
├── results.json # Detailed results for all evaluations
├── SUMMARY.md # Executive summary
└── <helper_scripts> # Optional: automated checker scripts
```

More from UKGovernmentBEIS/inspect_evals