experiment-design-pipeline
$
npx mdskill add GRIND-Lab-Core/night_owl_research_agent/experiment-design-pipelineRefine and concretize: **$ARGUMENTS**
SKILL.md
.github/skills/experiment-design-pipelineView on GitHub ↗
--- name: experiment-design-pipeline description: 'Run an end-to-end workflow that chains the skills `refine-research` and `experiment-design`. Use when the user wants a one-shot pipeline from vague research direction to focused final proposal plus detailed experiment roadmap, or asks to build a pipeline, do it end-to-end, or generate both the method and experiment plan together.' argument-hint: [topic-or-scope] allowed-tools: Bash(*), Read, Write, Edit, Grep, Glob, WebSearch, WebFetch, Agent --- # Experiment Design Pipeline: End-to-End Method and Experiment Planning Refine and concretize: **$ARGUMENTS** ## Overview Use this skill when the user does not want to stop at a refined method. The goal is to produce a coherent package that includes: - a problem-anchored, elegant final proposal - the review history explaining why the method is focused - a detailed experiment roadmap tied to the paper's claims - a compact pipeline summary that says what to run next This skill composes two existing workflows: 1. `refine-research` for method refinement 2. `experiment-design` for claim-driven validation planning For stage-specific detail, read these sibling skills only when needed: - `../refine-research/SKILL.md` - `../experiment-design/SKILL.md` ## Core Rule Do not plan a large experiment suite on top of an unstable method. First stabilize the thesis. Then turn the stable thesis into experiments. ## Default Outputs Refinement stage (written by `refine-research`, under `output/refine-logs/`): - `output/refine-logs/FINAL_PROPOSAL.md` - `output/refine-logs/REFINE_REPORT.md` - `output/refine-logs/SCORE_HISTORY.md` - `output/refine-logs/REFINE_STATE.json` and per-round files Experiment stage (written by `experiment-design`, under `output/`): - `output/EXPERIMENT_PLAN.md` - `output/EXPERIMENT_TRACKER.md` Pipeline integration (written by this skill): - `output/EXP_PIPELINE_SUMMARY.md` Do not move the subskill outputs into a different directory; downstream skills (`deploy-experiment`, `result-to-claim`, etc.) expect these canonical paths. ## Workflow ### Phase 0: Triage the Starting Point At the very start, read the most relevant existing files if they exist (use `Read`; skip silently if missing). This mirrors the file-reading step at the start of each subskill so the pipeline never re-does stable work: - `CLAUDE.md` — project dashboard, control flags, canonical output paths - `handoff.json` — if present, check `pipeline.stage` and `recovery.resume_skill` - `output/LIT_REVIEW_REPORT.md` — consolidated literature review - `output/IDEA_REPORT.md` — ranked idea candidates - `output/refine-logs/FINAL_PROPOSAL.md` — prior refined proposal, if any - `output/refine-logs/REFINE_REPORT.md` — prior refinement history - `output/refine-logs/REFINE_STATE.json` — checkpoint for an in-progress refinement - `output/EXPERIMENT_PLAN.md` — prior experiment plan, if any From these plus `$ARGUMENTS`, extract the problem, rough approach, constraints, resources, and target venue. Then decide: - If `FINAL_PROPOSAL.md` is missing, stale, or materially different from the current request, run the full `refine-research` stage. - If a `REFINE_STATE.json` exists with `status: in_progress`, resume `refine-research` from that checkpoint rather than restarting. - If the proposal is already strong and aligned, reuse it and jump to experiment planning. - If in doubt, prefer re-running `refine-research` rather than planning experiments for the wrong method. ### Phase 1: Method Refinement Stage Run the `refine-research` workflow and keep its four guiding principles intact: - do not lose the original problem — freeze the Problem Anchor - prefer the smallest adequate mechanism - one paper, one dominant contribution (plus at most one supporting contribution) - modern leverage (LLM / VLM / Diffusion / RL / distillation / inference-time scaling) is a prior, not a decoration Respect the subskill's constants: `MAX_ROUNDS = 5`, `SCORE_THRESHOLD = 9`, `MAX_PRIMARY_CLAIMS = 2`, `MAX_CORE_EXPERIMENTS = 3`, `MAX_NEW_TRAINABLE_COMPONENTS = 2`. Do not override these from the pipeline unless the user asks. Exit this stage only when these are explicit in `FINAL_PROPOSAL.md`: - the final method thesis - the dominant contribution - the complexity intentionally rejected - the key claims and must-run ablations - the remaining risks, if any - a fully populated **Experiment Design Handoff** section (hard gate inherited from `refine-research`) If the verdict is still `REVISE`, continue into experiment planning only if the remaining weaknesses are clearly documented in `REFINE_REPORT.md`. ### Phase 2: Planning Gate Before the experiment stage, write a short gate check: - What is the final method thesis? - What is the dominant contribution? - What complexity was intentionally rejected? - Which reviewer concerns still matter for validation? - Is a frontier primitive central, optional, or absent? If these answers are not crisp, tighten the final proposal first. ### Phase 3: Experiment Planning Stage Run the `experiment-design` workflow grounded in the refinement outputs (which that skill also reads in its own Phase 0): - `output/refine-logs/FINAL_PROPOSAL.md` - `output/refine-logs/REFINE_REPORT.md` - `output/LIT_REVIEW_REPORT.md` and `output/IDEA_REPORT.md` if present Respect the subskill's constants: `MAX_PRIMARY_CLAIMS = 2`, `MAX_CORE_BLOCKS = 5`, `MAX_BASELINE_FAMILIES = 3`, `DEFAULT_SEEDS = 3`. Ensure the experiment plan covers: - the main anchor result - novelty isolation - a simplicity / elegance (deletion) check - a frontier necessity check if a modern primitive is central (skip explicitly otherwise) - a failure analysis or qualitative diagnosis block - run order, compute / data budget, and decision gates - separation of must-run vs nice-to-have The subskill writes `output/EXPERIMENT_PLAN.md` and `output/EXPERIMENT_TRACKER.md`. Do not rewrite those files from this pipeline — only read them to build the summary. ### Phase 4: Integration Summary Write `output/EXP_PIPELINE_SUMMARY.md`: ```markdown # Pipeline Summary **Problem**: [problem] **Final Method Thesis**: [one sentence] **Final Verdict**: [READY / REVISE / RETHINK] **Date**: [today] ## Final Deliverables - Proposal: `output/refine-logs/FINAL_PROPOSAL.md` - Refinement report: `output/refine-logs/REFINE_REPORT.md` - Score history: `output/refine-logs/SCORE_HISTORY.md` - Experiment plan: `output/EXPERIMENT_PLAN.md` - Experiment tracker: `output/EXPERIMENT_TRACKER.md` ## Contribution Snapshot - Dominant contribution: - Optional supporting contribution: - Explicitly rejected complexity: ## Must-Prove Claims - [Claim 1] - [Claim 2] ## First Runs to Launch 1. [Run] 2. [Run] 3. [Run] ## Main Risks - [Risk]: - [Mitigation]: ## Next Action - Proceed to `/deploy-experiment` ``` ### Phase 5: Present a Brief Summary to the User ``` Pipeline complete. Method output: - output/refine-logs/FINAL_PROPOSAL.md - output/refine-logs/REFINE_REPORT.md Experiment output: - output/EXPERIMENT_PLAN.md - output/EXPERIMENT_TRACKER.md Pipeline summary: - output/EXP_PIPELINE_SUMMARY.md Best next step: - /deploy-experiment ``` ## Key Rules - **Large file handling**: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission — just do it silently. - Do not let the experiment plan override the Problem Anchor. - Do not widen the paper story after method refinement unless a missing validation block is truly necessary. - Reuse the same claims across `FINAL_PROPOSAL.md`, `EXPERIMENT_PLAN.md`, and `EXP_PIPELINE_SUMMARY.md`. - Keep the main paper story compact. - If the method is intentionally simple, defend that simplicity in the experiment plan rather than adding new components. - If the method uses a modern LLM / VLM / Diffusion / RL primitive, make its necessity test explicit. - If the method does not need a frontier primitive, say that clearly and avoid forcing one. - Do not relocate subskill outputs. Refinement artifacts belong under `output/refine-logs/`; experiment artifacts belong directly under `output/`. - Prefer the staged skills when the user only needs one stage; use this skill for the integrated flow. ## Composing with Other Skills ``` /experiment-design-pipeline -> one-shot method + experiment planning /refine-research -> method refinement only /experiment-design -> experiment planning only /deploy-experiment -> execution ```
More from GRIND-Lab-Core/night_owl_research_agent
- data-downloadDiscover, evaluate, and download publicly available datasets from the internet. Infers data needs from a research question or task, selects authoritative sources, downloads reproducibly, validates file integrity, and documents provenance. Pauses for user input when authentication, API keys, or major tradeoffs require a decision. Use when user says "download data", "get data", "find a dataset", "I need boundary files", "download census data", or needs any external dataset for analysis.
- deploy-experimentDeploy and run experiments for ML/DL training (local, remote, or Modal GPU) AND spatial data science / GIScience experiments (local, data-driven). Reads from output/refine-logs/EXPERIMENT_PLAN.md and output/refine-logs/FINAL_PROPOSAL.md, writes to output/experiment/. Use when user says "run experiment", "deploy experiment", "execute experiment plan", or needs to launch training / spatial analysis jobs.
- full-pipelineComplete 4-stage end-to-end research pipeline. Orchestrates idea-discovery-pipeline → deploy-experiment → auto-review-loop → generate-report. Reads RESEARCH_PLAN.md (or BRIEF.md as fallback) for context that overrides $ARGUMENTS.
- generate-ideaGenerate and rank research ideas given a broad direction. Use when user input "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
- idea-discovery-pipelineThe full pipeline for idea generation. It generates 8-12 novel research ideas from literature gaps and evaluates each on novelty, feasibility, and domain fit. Orchestrates lit-review → generate-idea → novelty-check → idea-review → experiment-design-pipeline to go from a broad research direction to a validated, pilot-tested idea with a refined proposal and experiment plan. Produces output/IDEA_REPORT.md plus refinement and experiment artifacts.
- lit-reviewRetrieves papers from local folder or ArXiv and Semantic Scholar using domain-aware keyword expansion, builds synthesis matrix, identifies gaps. Calls tools/arxiv_fetch.py and tools/semantic_scholar_fetch.py. Writes to output/paper-cache/ and output/LIT_REVIEW_REPORT.md.
- paper-covertConverts the final Markdown manuscript from `paper-draft` / `paper-review-loop` into a submission package for the target venue — modular LaTeX (one file per section), compiled PDF, and Word `.docx`. Venue is read from `output/PAPER_PLAN.md` (or argument) and routed through a small YAML profile. Does not rewrite prose, score, or invent citations.
- paper-draftTransforms output/PAPER_PLAN.md into a journal-quality Markdown manuscript draft for GIScience, GeoAI, spatial data science, and remote sensing venues (IJGIS, ISPRS JPRS, RSE, TGIS, AAG Annals). Consults referenced literature, experiment, figure, and claim artifacts; supports full drafts, partial drafts, and skeleton drafts depending on readiness. Never fabricates results, metrics, or citations — produces a claim-to-evidence map and coverage-gap report alongside the manuscript.
- paper-figure-generateGenerates publication-quality figures and diagrams from output/PAPER_PLAN.md for GIScience, GeoAI, and remote sensing journals (IJGIS, ISPRS JPRS, RSE, TGIS). Decides per-figure whether to produce reproducible code-generated plots/maps or structured prompts for external image-generation models (nano banana, ChatGPT image). Produces figure files, source scripts, captions, manifest, and prompt artifacts. Never fabricates results — uses only evidence from project files.
- paper-review-loopReviews the manuscript produced by `paper-draft` (in `output/manuscript/`) as a demanding IJGIS / ISPRS JPRS reviewer-editor, cross-checks it against `output/PAPER_PLAN.md` and its evidence artifacts, then revises it into a stronger draft. Produces a reviewed manuscript, a revised manuscript, a structured review report, a prioritized issue log, a revision log, claim-risk notes, journal-fit notes, and next-loop priorities. Supports full, section-scoped, and mode-scoped review (structural / argument / novelty / methods / results-discussion / journal-fit / language / integrated). Safe on partial or skeletal drafts. Never fabricates results, citations, or figures.