code-quality-fix-all

Name: code-quality-fix-all
Author: UKGovernmentBEIS/inspect_evals
$ npx mdskill add UKGovernmentBEIS/inspect_evals/code-quality-fix-all
Fix code quality issues from reviews with validation and testing.
Resolves problems found in code quality review artifacts.
Depends on code-quality-review-all skill and agent_artefacts.
Selects targets via user questions and complexity filters.
Delivers corrected code with step-by-step validation checks.
SKILL.md
.github/skills/code-quality-fix-all
---
name: code-quality-fix-all
description: Fix code quality issues identified in a code quality review stored in agent_artefacts/code_quality/<topic>/. Systematically addresses issues found by the code-quality-review-all skill for ANY code quality topic, with validation and testing at each step. Use when user asks to fix issues from a code quality review, or asks to fix issues from agent_artefacts/code_quality/<topic>.
---

# Code Quality Fix All

Fix code quality issues identified in a code quality review. This skill systematically addresses issues found by the `code-quality-review-all` skill for ANY code quality topic, with validation and testing at each step.

## Expected Arguments

When invoked, this skill expects the path to a code quality topic as an argument (e.g., `agent_artefacts/code_quality/private_api_imports`).

If no path is provided, the skill asks the user for it. The topic directory contains the following files:

- README.md - contains description of the issue and examples of how to fix it
- results.json - contains list of all identified issues
- SUMMARY.md - contains summary of the identified issues
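
As a minimal Python sketch of reading a topic directory, the helper below checks that all three files are present and loads the issue list. The function name `load_topic` is hypothetical. It also assumes that results.json holds a top-level JSON list of issue objects, and the real file may wrap that list differently.

```python
import json
from pathlib import Path

# The three files this skill expects inside a topic directory.
REQUIRED_FILES = ("README.md", "results.json", "SUMMARY.md")


def load_topic(topic_dir: str) -> tuple[str, list[dict]]:
    """Return (README text, list of issues) for a code quality topic.

    Assumes results.json is a top-level JSON list of issue objects.
    """
    root = Path(topic_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    readme = (root / "README.md").read_text(encoding="utf-8")
    issues = json.loads((root / "results.json").read_text(encoding="utf-8"))
    if not isinstance(issues, list):
        raise ValueError("expected results.json to contain a JSON list")
    return readme, issues
```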

**Filters and options** are chosen interactively after the skill starts, using the `AskUserQuestion` tool, unless they are already specified in the arguments:

- Which issue types to target (all, or specific types)
- Fix complexity level (easy only, medium and below, or all)
- Which evaluations to fix (all, specific ones, or evaluations with a small number of issues)
- Maximum number of issues to fix in this run
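
One way to picture these filters is as a small Python dataclass. This is a hypothetical sketch, and `FixFilters` is not part of the skill. The issue keys `evaluation`, `issue_type` and `complexity` are assumed names. The complexity value is the label assigned during Phase 1, which may not exist as a field in results.json.

```python
from __future__ import annotations

from dataclasses import dataclass

# Complexity levels from Phase 1, ordered from easiest to hardest.
LEVELS = ("easy", "medium", "hard")


@dataclass
class FixFilters:
    """User-chosen filters; None means "all" for that option."""

    issue_types: set[str] | None = None
    evaluations: set[str] | None = None
    # "easy" = easy only, "medium" = medium and below, "hard" = all
    max_complexity: str = "hard"
    max_issues: int | None = None

    def select(self, issues: list[dict]) -> list[dict]:
        """Return in-scope issues in their original order, capped at max_issues."""
        limit = LEVELS.index(self.max_complexity)
        selected = [
            issue
            for issue in issues
            if (self.issue_types is None or issue["issue_type"] in self.issue_types)
            and (self.evaluations is None or issue["evaluation"] in self.evaluations)
            and LEVELS.index(issue["complexity"]) <= limit
        ]
        return selected if self.max_issues is None else selected[: self.max_issues]
```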

## Workflow

### Phase 1: Understanding the Topic and Planning

1. **Read topic documentation**
   - Read the README.md to understand:
     - What code quality issue this topic addresses
     - Why it matters (stability, maintainability, etc.)
     - How to detect the issue
     - How to fix the issue (fix patterns, examples)
   - Read results.json to get all identified issues
   - Identify which issues are in scope based on arguments

2. **Analyze and categorize issues**
   - Analyze fix complexity based on:
     - issue_description
     - suggested_fix from results.json
     - Fix examples in README.md
   - Classify as:
     - **Easy**: Single-line changes, clear fix pattern in README
     - **Medium**: Multi-line changes, well-documented fix approach
     - **Hard**: No clear fix pattern, requires research or copying code
   - Group issues by evaluation and issue type
   - Generate statistics for presenting to user

3. **Ask user for filtering preferences**
   - Use `AskUserQuestion` tool to ask:
     - Which evaluations to fix? (all / specific ones / most affected)
     - Which issue types to target? (all / specific types)
     - Fix complexity level? (easy only / easy+medium / all)
     - Max issues per run? (all / limit to specific number)
   - Apply filters based on user responses
   - Present the filtered plan with:
     - Number of issues to fix
     - Breakdown by evaluation and issue type
     - Complexity distribution
   - Ask for final confirmation to proceed

4. **Validate understanding of fixes**
   - For each unique issue type in scope:
     - Check if README.md documents how to fix it
     - Look for "Good Examples" and "Bad Examples" sections
     - Check "suggested_fix" field in results.json
   - If fix approach is unclear for any issue type:
     - Research the correct approach
     - Update `<topic>/README.md` with findings
     - Ask user for guidance if still uncertain
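
The grouping and statistics from step 2 could be computed along the following lines before presenting options to the user. This is an illustrative sketch. `summarize` is a hypothetical helper, and it assumes each issue dict carries `evaluation`, `issue_type` and `complexity` keys.

```python
from collections import Counter, defaultdict


def summarize(issues: list[dict]) -> dict:
    """Group issues by evaluation and issue type, and count them for the user."""
    # groups[evaluation][issue_type] -> list of issues
    groups: dict = defaultdict(lambda: defaultdict(list))
    for issue in issues:
        groups[issue["evaluation"]][issue["issue_type"]].append(issue)
    per_evaluation = Counter(issue["evaluation"] for issue in issues)
    return {
        "total": len(issues),
        # Most affected evaluations first, as (evaluation, count) pairs.
        "most_affected": per_evaluation.most_common(),
        "per_issue_type": Counter(issue["issue_type"] for issue in issues),
        "complexity": Counter(issue["complexity"] for issue in issues),
        "groups": groups,
    }
```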

### Phase 2: Pre-Fix Validation

For each issue to be fixed:

1. **Read and understand context**
   - Read the entire file containing the issue (not just the line)
   - Understand how the problematic code is used
   - Look for related issues in the same file
   - Check for patterns that might affect the fix (e.g., multiple occurrences)
   - Identify any cascading changes needed (related imports, type hints, etc.)

2. **Validate the suggested fix**
   - Review the "suggested_fix" from results.json
   - Check against fix patterns in README.md
   - Verify the fix won't break functionality
   - For complex fixes:
     - Check if dependencies/alternatives actually exist
     - Validate that replacement code follows same patterns
     - Consider edge cases

3. **Estimate change scope**
   - Count how many lines will change for this fix
   - Identify if cascading changes are needed
   - Determine if multiple files need updating
   - **If changes exceed 100 lines for a single issue:**
     - Alert user with:
       - Issue details
       - Why the change is large
       - What will change
     - Get explicit approval before proceeding
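
To estimate change scope, you can diff a drafted fix against the current file. The sketch below uses Python's `difflib.SequenceMatcher` and counts a modified line once rather than as one removal plus one addition. The helper names are hypothetical. The 100-line threshold is the one this skill uses to decide when approval is required.

```python
import difflib

# Fixes larger than this need explicit user approval (per this skill).
APPROVAL_THRESHOLD = 100


def changed_line_count(before: str, after: str) -> int:
    """Approximate number of changed lines; a modified line counts once."""
    matcher = difflib.SequenceMatcher(
        a=before.splitlines(), b=after.splitlines(), autojunk=False
    )
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # replace/delete/insert: count the larger side of the hunk
            total += max(i2 - i1, j2 - j1)
    return total


def needs_approval(before: str, after: str) -> bool:
    return changed_line_count(before, after) > APPROVAL_THRESHOLD
```

A change spread across several files would sum `changed_line_count` over each file before comparing against the threshold.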

### Phase 3: Applying Fixes

1. **Apply fixes systematically**
   - Create a new branch for the fixes, named like `agent/<short_description_of_issue>`
   - Process one evaluation at a time
   - Within each evaluation, group by issue type
   - For each fix:
     - Use the Edit tool to apply the change
     - Follow the suggested_fix guidance
     - Apply fix patterns from README.md
     - Handle related issues in the same file together
     - Add comments if the fix requires them (e.g., attribution for copied code)
   - Track what was fixed

2. **Verify changes compile/parse**
   - After fixing each file, validate:
     - File is syntactically valid (Python can parse it)
     - No obvious import errors introduced
     - Code follows repository patterns
   - If validation fails:
     - Investigate the issue
     - Attempt to fix the validation error
     - Roll back the change if the error cannot be resolved

3. **Track progress**
   - Maintain list of:
     - Issues successfully fixed (file, line, issue type)
     - Issues that couldn't be fixed (with reasons)
     - Evaluations that have been modified
     - Files that were changed
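
A minimal way to check that an edited Python file still parses, and to roll it back otherwise, is to combine `ast.parse` with a snapshot of the file taken before the edit. `check_or_rollback` is a hypothetical helper. `ast.parse` only catches syntax errors, so import and type problems are left to the linters in Phase 4.

```python
import ast
from pathlib import Path


def check_or_rollback(path: Path, snapshot: str) -> bool:
    """Return True if the edited file parses; otherwise restore `snapshot`.

    `snapshot` is the file's content captured before the edit was applied.
    """
    try:
        ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    except SyntaxError as err:
        # Restore the pre-edit content so the file is never left broken.
        path.write_text(snapshot, encoding="utf-8")
        print(f"Rolled back {path}: line {err.lineno}: {err.msg}")
        return False
    return True
```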

### Phase 4: Testing and Validation

1. **Run linting**
   - Run repository's linter on modified files (ruff, flake8, mypy, etc.)
   - Check for:
     - Import errors
     - Type checking errors
     - Style violations introduced
   - Fix any linting issues that result from changes
   - If linting issues can't be fixed, document them

2. **Run unit tests**
   - Identify test files for each modified evaluation
   - Run unit tests for affected evaluations using pytest:

     **Basic test commands:**

     ```bash
     # Install relevant packages in the event of import failure
     uv sync --extra test

     # Run tests for a specific evaluation
     uv run pytest tests/<evaluation_name>/

     # Run a specific test file
     uv run pytest tests/test_file.py

     # Run a specific test
     uv run pytest tests/test_file.py::TestClass::test_method

     # Run slow tests (excluded by default)
     uv run pytest --runslow tests/

     # Skip dataset download tests
     uv run pytest -m 'not dataset_download' tests/

     # Run only slow tests
     uv run pytest -m slow tests/
     ```

     **Test markers to be aware of:**
     - `@pytest.mark.slow` - Tests taking >10 seconds
     - `@pytest.mark.dataset_download` - Tests that download datasets
     - `@pytest.mark.docker` - Tests using Docker
     - `@pytest.mark.huggingface` - HuggingFace-related tests

     - Focus on tests for the specific evaluation
     - Look for test failures or errors
   - **IMPORTANT**: Do NOT run full evaluations (they take too long) unless user explicitly requests it

3. **Handle test failures**
   - For each test failure:
     - Read test output carefully
     - Determine if failure is caused by the fix
     - Check if it's a pre-existing failure
   - If caused by fix:
     - Try to adjust the fix to make tests pass
     - If cannot be resolved, rollback the change
     - Document the issue for user review
   - If pre-existing:
     - Note it but don't block on it
     - Inform user
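
To separate failures caused by a fix from pre-existing ones, one option is to run the same tests on the base commit and on the fix branch. Pass pytest's `--junitxml` option in both runs, then compare the sets of failing tests. The helpers below are a hypothetical sketch. They identify each test as `classname::name` from the XML report, which approximates pytest node IDs but does not match them exactly.

```python
import xml.etree.ElementTree as ET


def failing_tests(report_xml: str) -> set[str]:
    """Collect failed or errored test cases from a pytest JUnit XML report."""
    root = ET.fromstring(report_xml)
    return {
        f"{case.get('classname')}::{case.get('name')}"
        for case in root.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    }


def triage(before_xml: str, after_xml: str) -> dict[str, set[str]]:
    """Compare failures on the base commit (before) and the fix branch (after)."""
    before, after = failing_tests(before_xml), failing_tests(after_xml)
    return {
        "introduced": after - before,  # likely caused by the fix
        "pre_existing": after & before,  # note it, but don't block on it
        "resolved": before - after,
    }
```

For example, run `uv run pytest tests/<evaluation_name>/ --junitxml=after.xml` on the fix branch and produce a matching `before.xml` on the base commit. Then pass both files' contents to `triage`.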

### Phase 5: Re-Review and Handle Remaining Issues

1. **Update results.json with fix status**
   - For each issue that was fixed, add `"fix_status"` field after `"suggested_fix"`:

     ```json
     {
      ...
       "suggested_fix": "...",
       "fix_status": "fixed - please review"
     }
     ```

   - For issues that couldn't be fixed, add explanation:

     ```json
     "fix_status": "not fixed - reason: ..."
     ```

   - **IMPORTANT**: Do NOT remove any entries from results.json - only add/update "fix_status"
   - The code-quality-review-all skill owns results.json and is responsible for removing entries

2. **Re-run code quality review**
   - **IMPORTANT**: Use the Task tool to spawn a subagent that runs the code-quality-review-all skill
   - Pass the same topic path
   - This will update results.json with current state
   - Compare results before and after to identify:
     - Issues that are now resolved (no longer appear)
     - New issues that may have been introduced
     - Issues that still remain despite fix attempts

3. **Fix remaining issues if in scope**
   - For each new or remaining in-scope issue:
     - Investigate why previous fix didn't work
     - Attempt alternative fix approach
     - Update "fix_status" with attempt results
   - Repeat this process until no more in-scope issues can be fixed

4. **Update topic's README.md**
   - Add any knowledge that you have discovered that will be useful in detecting or fixing topic-related issues in the future
   - Do not remove examples of bad code or patterns that were fixed - they remain useful when reviewing and fixing other evaluations later.

5. **Update SUMMARY.md**
   - Add a "Recent Fixes" section with:
     - Date of fix run
     - Number of issues fixed
     - Which evaluations were updated
   - Keep historical data (don't remove past information)
   - Update recommendations to reflect remaining work

6. **Run markdown linters**
   - Use `uv run pre-commit run markdownlint-fix` to fix markdown linting issues
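
The bookkeeping in steps 1 and 2 can be sketched as pure functions over the parsed issue list. One function adds `fix_status` right after `suggested_fix` without dropping any entry. Another compares a saved copy of results.json with the re-reviewed version. The identity key `(file, line, issue_type)` is an assumption about the results.json fields. Line numbers move after edits, so a looser key may match issues more reliably.

```python
def issue_key(issue: dict) -> tuple:
    """Hypothetical identity for an issue; adjust to the real results.json fields."""
    return (issue.get("file"), issue.get("line"), issue.get("issue_type"))


def with_fix_status(issue: dict, status: str) -> dict:
    """Copy `issue`, placing fix_status right after suggested_fix."""
    updated: dict = {}
    for key, value in issue.items():
        if key == "fix_status":
            continue  # an old status is replaced below
        updated[key] = value
        if key == "suggested_fix":
            updated["fix_status"] = status
    updated.setdefault("fix_status", status)  # no suggested_fix: append at the end
    return updated


def update_results(issues: list[dict], statuses: dict[tuple, str]) -> list[dict]:
    """Set fix_status on matching issues; every entry is kept, in order."""
    return [
        with_fix_status(issue, statuses[issue_key(issue)])
        if issue_key(issue) in statuses
        else issue
        for issue in issues
    ]


def compare_reviews(before: list[dict], after: list[dict]) -> dict[str, set[tuple]]:
    """Compare a saved copy of results.json with the re-reviewed version."""
    old = {issue_key(issue) for issue in before}
    new = {issue_key(issue) for issue in after}
    return {"resolved": old - new, "introduced": new - old, "remaining": old & new}
```

Copy results.json before spawning the review subagent, because the review rewrites the file in place.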

### Phase 6: Create PR Description and Present Results

1. **Create/Update PR description (cumulative)**
   - Read existing `PR_DESCRIPTION.md` if it exists (from previous runs)
   - **Cumulative tracking**: PR description represents ALL changes from branch base, not just this run
   - If PR_DESCRIPTION.md exists:
     - Parse existing content to extract previous runs' data
     - Append information from this run
     - Update cumulative statistics
   - If PR_DESCRIPTION.md doesn't exist (first run):
     - Create new file
   - Format for GitHub/GitLab pull request with:
     - **Summary**: Brief overview of the code quality topic and total fixes (2-3 sentences)
     - **Overall Changes** (cumulative from all runs):
       - Total issues fixed across all runs by type
       - Total evaluations affected
       - Total files modified
     - **Fix Sessions**: List each run session with:
       - Date/time of run
       - Issues fixed in that session
       - Complexity level targeted (easy/medium/all)
     - **Fixed Issues** (cumulative): Table or list with all file paths and issue types from all runs
     - **Testing** (from latest run):
       - Which tests were run
       - Pass/fail status
       - Any test issues encountered
     - **Remaining Issues** (current state):
       - Count of issues still open
       - Brief note on what remains
     - **Review Notes** (cumulative):
       - Any complications or special considerations from any run
       - Areas that need extra attention during review
   - Use proper markdown formatting for PR readability
   - **IMPORTANT**: Do NOT commit PR_DESCRIPTION.md - it's only for creating the PR
   - Example structure:

     ```markdown
     ## Summary
     Fix private API imports code quality issues across evaluations.

     ## Overall Changes
     - Total issues fixed: 25
     - Evaluations affected: 8

     ## Fix Sessions

     ### Session 1: 2026-01-18 10:30 (Easy issues)
     - Fixed 10 easy issues
     - Targeted: Easy complexity, All evaluations

     ### Session 2: 2026-01-18 14:15 (Medium issues)
     - Fixed 15 medium issues
     - Targeted: Medium complexity, Specific evaluations

     ## Fixed Issues
     [Table of all fixed issues from all sessions]

     ## Testing
     [Latest test results]

     ## Remaining Issues
     6 issues remain (4 hard, 2 require investigation)

     ## Review Notes
     - Session 1: All tests passed
     - Session 2: One test required adjustment in fortress/scorer.py
     ```

2. **Present results to user**
   - Show high-level summary for **this run**:
     - X issues fixed in this session
     - Y issues remain
     - Z tests passed
   - Show **cumulative progress** from PR_DESCRIPTION.md:
     - Total issues fixed across all runs
     - Number of fix sessions completed
   - Show before/after statistics from SUMMARY.md
   - List modified files from this run
   - Display content of PR_DESCRIPTION.md for user review
   - **Do NOT automatically commit** - let user review changes

3. **Offer next steps**
   - **Create commit and PR**: Offer to:
     - Commit all changes (source files, results.json, SUMMARY.md)
     - Create pull request with description from PR_DESCRIPTION.md
     - Note: PR_DESCRIPTION.md itself is NOT committed (it's just for PR description)
     - The PR description includes **cumulative changes from all fix sessions** on this branch
   - **Run more fixes**: If issues remain, suggest running skill again with different filters
     - Running again will **append** to PR_DESCRIPTION.md, creating cumulative tracking
     - This allows iterative fixing: easy issues first, then medium, then hard
   - **Manual review needed**: List any issues that require manual attention
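
Appending a session to an existing PR_DESCRIPTION.md could look like the sketch below. It assumes the layout of the example structure above: a `## Fix Sessions` section holding `### Session N: ...` headings. `add_session` is a hypothetical helper. Cumulative totals in the other sections still need to be recomputed separately.

```python
import re

# Matches headings such as "### Session 2: 2026-01-18 14:15 (Medium issues)".
SESSION_HEADING = re.compile(r"^### Session (\d+):", re.MULTILINE)


def add_session(text: str, when: str, label: str, bullets: list[str]) -> str:
    """Append a numbered session block at the end of the '## Fix Sessions' section."""
    numbers = [int(n) for n in SESSION_HEADING.findall(text)]
    number = max(numbers, default=0) + 1
    block = f"### Session {number}: {when} ({label})\n"
    block += "".join(f"- {bullet}\n" for bullet in bullets)
    if "## Fix Sessions" not in text:
        # First run: create the section at the end of the document.
        return text.rstrip("\n") + "\n\n## Fix Sessions\n\n" + block
    start = text.index("## Fix Sessions")
    next_section = text.find("\n## ", start + 1)  # next level-2 heading, if any
    if next_section == -1:
        return text.rstrip("\n") + "\n\n" + block
    return text[:next_section].rstrip("\n") + "\n\n" + block + text[next_section:]
```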

## Important Guidelines

### Safety First

- **Never batch all fixes blindly** - Validate each fix type before applying en masse
- **Always read before editing** - Understand context before changing code
- **Verify fixes don't break functionality** - Run tests incrementally
- **Be conservative** - Skip fixes you're uncertain about rather than risk breaking code
- **Get approval for large changes** - Alert user when fixes exceed 100 lines
- **Have a rollback strategy** - Be able to revert fixes that cause problems

### Context is Critical

- **Understand the quality issue** - Read README.md thoroughly
- **Understand why code was written that way** - There might be good reasons
- **Look for patterns** - Similar issues often need similar fixes
- **Check related code** - Fixes might require updating nearby code
- **Read existing comments** - Developers might have documented why they used certain patterns

### Validation at Every Step

- **Verify fix patterns from README** - Don't guess how to fix
- **Check suggested_fix in results.json** - Use provided guidance
- **Validate changes compile** - Ensure code parses after changes
- **Run linters** - Catch style and import issues
- **Run tests** - Detect regressions immediately
- **Re-run review** - Verify fixes actually resolve issues

### Communication

- **Show plan before executing** - Let user see what will be fixed
- **Alert for large changes** - Get approval for fixes >100 lines
- **Report uncertainties** - Flag issues where fix approach is unclear
- **Show progress** - Keep user informed during fixes
- **Explain failures** - Document why certain issues couldn't be fixed
- **Provide detailed reports** - Record what was fixed, what was skipped, and why in results.json, SUMMARY.md, and PR_DESCRIPTION.md

### What NOT to Do

- **Don't assume you know how to fix** - Always consult README.md and results.json
- **Don't remove entries from results.json** - Only add/update "fix_status" field
- **Don't replace PR_DESCRIPTION.md** - Append to it to maintain cumulative history across runs
- **Don't run full evaluations** - Only run unit tests (evaluations are slow)
- **Don't commit automatically** - Let user review changes first
- **Don't commit PR_DESCRIPTION.md** - It's only for creating the PR
- **Don't fix issues you can't validate** - Skip rather than risk breaking
- **Don't ignore test failures** - Investigate or rollback
- **Don't make unrelated changes** - Only fix the specific quality issues
- **Don't assume all issues of same type are identical** - Context matters