ablation-planner
$ npx mdskill add wanshuiyin/Auto-claude-code-research-in-sleep/ablation-planner
---
name: ablation-planner
description: Use when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements.
argument-hint: [method-description-or-claim]
allowed-tools: Bash(*), Read, Grep, Glob, Write, Edit, mcp__codex__codex, mcp__codex__codex-reply
---
# Ablation Planner
Systematically design ablation studies that answer the questions reviewers will ask. Codex leads the design (reviewer perspective); CC (Claude Code) reviews feasibility and implements.
## Context: $ARGUMENTS
## When to Use
- Main results pass `/result-to-claim` with claim_supported = yes or partial
- User explicitly requests ablation planning
- `/auto-review-loop` reviewer identifies missing ablations
## Workflow
### Step 1: Prepare Context
CC reads available project files to build the full picture:
- Method description and components (from docs/research_contract.md or project CLAUDE.md)
- Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
- Confirmed and intended claims (from result-to-claim output or project notes)
- Available compute resources (from CLAUDE.md server config, if present)
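The context-gathering step above can be sketched as a small helper that takes the first existing file for each kind of context. This is a minimal sketch, not part of the skill: the `gather_context` function and the `"method"` / `"results"` labels are hypothetical, and it assumes a project-level CLAUDE.md sits at the project root. W&B and result-to-claim output are left out because their locations are not specified here.

```python
from pathlib import Path

# Candidate files per context type, in the order listed above.
# The dictionary keys are hypothetical labels used only in this sketch.
CONTEXT_SOURCES = {
    "method": ["docs/research_contract.md", "CLAUDE.md"],
    "results": ["EXPERIMENT_LOG.md", "EXPERIMENT_TRACKER.md"],
}


def gather_context(project_root, sources=CONTEXT_SOURCES):
    """Return {label: text}, using the first existing file for each label."""
    root = Path(project_root)
    context = {}
    for label, candidates in sources.items():
        for rel_path in candidates:
            path = root / rel_path
            if path.is_file():
                context[label] = path.read_text(encoding="utf-8")
                break
        else:
            # No candidate exists: record the gap instead of guessing.
            context[label] = None
    return context
```

A `None` entry signals that the corresponding section of the Codex prompt has to be filled from another source, such as project notes.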
### Step 2: Codex Designs Ablations
```
mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:
    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable/replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (e.g., "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but won't add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate
```
### Step 3: Parse Ablation Plan
Normalize the Codex response into a structured format:
```markdown
## Ablation Plan
### Component Ablations (highest priority)
| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |
### Hyperparameter Sensitivity
| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|---------------|---------------|----------|
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |
### Design Choice Comparisons
| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
| 4 | joint vs separate matching | whether joint adds value | 4 |
### Coverage Assessment
[What reviewer questions these ablations answer]
### Unnecessary Ablations
[Experiments that seem useful but won't add insight — skip these]
### Run Order
[Optimized for maximum early information]
### Estimated Compute
[Total GPU-hours]
```
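One way to do this normalization is to hold each ablation in a small record and render the Component Ablations table from it. The sketch below is illustrative only: the `Ablation` dataclass and `render_component_table` are hypothetical names, with fields that mirror the four per-ablation keys requested in the Step 2 prompt.

```python
from dataclasses import dataclass


@dataclass
class Ablation:
    name: str
    what_it_tests: str
    expected_if_component_matters: str
    priority: int  # 1 (must-run) to 5 (nice-to-have)


def render_component_table(ablations):
    """Render ablations as a Markdown table, sorted by priority."""
    lines = [
        "| # | Name | What It Tests | Expected If Matters | Priority |",
        "|---|------|---------------|---------------------|----------|",
    ]
    ordered = sorted(ablations, key=lambda a: a.priority)
    for i, a in enumerate(ordered, start=1):
        lines.append(
            f"| {i} | {a.name} | {a.what_it_tests} | "
            f"{a.expected_if_component_matters} | {a.priority} |"
        )
    return "\n".join(lines)
```

Keeping the plan as structured records first, and rendering Markdown second, makes it easier to reuse the same data for the feasibility check and run ordering.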
### Step 4: CC Reviews Feasibility
Before running anything, CC checks:
- Compute budget: can we afford all ablations with available GPUs?
- Code changes: which ablations need code modifications vs config-only changes?
- Dependencies: which ablations can run in parallel?
- Cuts: if budget is tight, propose removing lower-priority ablations and ask Codex to confirm
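The budget check and proposed cuts can be sketched as a strict-priority split: keep ablations in priority order until the next one would exceed the budget, and propose everything after that point as a cut. This is an assumption about how to operationalize the check, not the skill's prescribed method. The `propose_cuts` function and the `"gpu_hours"` key are hypothetical, and the result is a proposal for Codex to confirm, not a final decision.

```python
def propose_cuts(ablations, budget_gpu_hours):
    """Split ablations into (keep, cut) name lists under a GPU-hour budget.

    Each ablation is a dict with hypothetical keys "name", "priority"
    (1 = must-run) and "gpu_hours". Once one ablation does not fit, it and
    all lower-priority ones are proposed as cuts, so a cheap low-priority
    run never displaces a more important one.
    """
    keep, cut, used = [], [], 0.0
    for item in sorted(ablations, key=lambda a: a["priority"]):
        if not cut and used + item["gpu_hours"] <= budget_gpu_hours:
            keep.append(item["name"])
            used += item["gpu_hours"]
        else:
            cut.append(item["name"])
    return keep, cut


# Illustrative numbers only; real estimates come from the ablation plan.
plan = [
    {"name": "remove module X", "priority": 1, "gpu_hours": 8},
    {"name": "lambda sweep", "priority": 3, "gpu_hours": 12},
    {"name": "joint vs separate matching", "priority": 4, "gpu_hours": 6},
]
keep, cut = propose_cuts(plan, budget_gpu_hours=16)
```

With these made-up estimates, only the priority-1 ablation fits, and the other two are returned as proposed cuts to send back to Codex.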
### Step 5: Implement and Run
1. Create configs/scripts for each ablation (config-only changes first)
2. Smoke test each ablation before full run
3. Run in suggested order, using descriptive names (e.g., `ablation-no-module-X`)
4. Track results in EXPERIMENT_LOG.md
5. After all ablations complete → update findings.md with insights
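Two of the steps above lend themselves to small helpers: deriving a descriptive run name from the ablation name, and preparing config-only ablations before those that need code changes. The sketch below is a hedged illustration. `run_name`, `implementation_order`, and the `"needs_code_change"` key are hypothetical, and the naming scheme is only one possible convention.

```python
import re


def run_name(ablation_name):
    """Turn a free-text ablation name into a descriptive run name."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ablation_name.strip()).strip("-")
    return f"ablation-{slug}"


def implementation_order(ablations):
    """Order ablations for setup: config-only first, then by priority.

    Each ablation is a dict with hypothetical keys "name",
    "needs_code_change" (bool) and "priority" (1 = must-run).
    """
    return sorted(
        ablations, key=lambda a: (a["needs_code_change"], a["priority"])
    )
```

This ordering applies to creating configs and scripts; the actual runs still follow the suggested order from the ablation plan.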
## Rules
- **Codex leads the design. CC does not pre-filter or bias the ablation list** before Codex sees it. Codex thinks like a reviewer; CC thinks like an engineer.
- Every ablation must have a clear `what_it_tests` and `expected_if_component_matters`. No "just try it" experiments.
- Config-only ablations take priority over those needing code changes (faster, less error-prone).
- If total compute exceeds budget, CC proposes cuts and asks Codex to re-prioritize — don't silently drop ablations.
- Component ablations (remove/replace) take priority over hyperparameter sweeps.
- Do not generate ablations for components identical to the baseline (no-op ablations).
- Record all ablation results in EXPERIMENT_LOG.md, including negative results: a component removal that has no effect is itself an important finding.
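Two of these rules can be checked mechanically before anything runs: every ablation needs a non-empty `what_it_tests` and `expected_if_component_matters`, and an ablation whose configuration equals the baseline is a no-op. The following is a minimal sketch under assumptions. `validate_ablation`, the `"config"` key, and `baseline_config` are hypothetical names, and comparing config dicts is only one way to detect a no-op.

```python
REQUIRED_FIELDS = ("what_it_tests", "expected_if_component_matters")


def validate_ablation(ablation, baseline_config):
    """Return a list of rule violations for one ablation (empty if none).

    `ablation` is a dict; its "config" key and `baseline_config` are
    hypothetical names used only for this sketch.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat missing, None, and whitespace-only values as absent.
        if not str(ablation.get(field) or "").strip():
            problems.append(f"missing {field}")
    if ablation.get("config") == baseline_config:
        problems.append("no-op: config identical to baseline")
    return problems
```

An ablation that returns a non-empty list is better sent back to Codex for revision than silently dropped.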