full-pipeline
$ npx mdskill add GRIND-Lab-Core/night_owl_research_agent/full-pipeline
End-to-end research pipeline. You are the pipeline conductor — you sequence the four sub-skills, check gates, and persist state across sessions.
SKILL.md
.github/skills/full-pipeline
---
name: full-pipeline
description: Complete 4-stage end-to-end research pipeline. Orchestrates idea-discovery-pipeline → deploy-experiment → auto-review-loop → generate-report. Reads RESEARCH_PLAN.md (or BRIEF.md as fallback) for context that overrides $ARGUMENTS.
tools: all
argument-hint: [research-idea-description]
flags:
  COMPACT_MODE: false # If true, read output/PROJ_NOTES.md instead of full logs to save context
---

# Skill: full-pipeline

End-to-end research pipeline. You are the pipeline conductor — you sequence the four subskills, check gates, and persist state across sessions.

The pipeline orchestrates exactly **four subskills**, in order:

```
idea-discovery-pipeline → deploy-experiment → auto-review-loop → generate-report
```

---

## Constants

- **AUTO_PROCEED = true** — When `true`, Gate 1 auto-selects the top-ranked idea (highest pilot signal + novelty confirmed) and continues to implementation. When `false`, always waits for explicit user confirmation before proceeding. Passed through to `/idea-discovery-pipeline`.
- **ARXIV_DOWNLOAD = false** — When `true`, `/lit-review` (inside `/idea-discovery-pipeline`) downloads the top relevant arXiv PDFs. When `false` (default), only fetches metadata via the arXiv API. Passed through to `/idea-discovery-pipeline`.
- **HUMAN_CHECKPOINT = false** — When `true`, the auto-review loop (Stage 3) pauses after each round's review to let you see the score and provide custom modification instructions before fixes are implemented. When `false` (default), the loop runs fully autonomously. Passed through to `/auto-review-loop`.
- **REVIEWER_DIFFICULTY = medium** — How adversarial the reviewer is. `medium` (default): standard MCP review. `hard`: adds reviewer memory + debate protocol. `nightmare`: GPT reads the repo directly via `codex exec` + memory + debate. Passed through to `/auto-review-loop`.
- **COMPACT_MODE = false** — When `true`, prefer compact summaries (`output/PROJ_NOTES.md`, `output/experiment/EXPERIMENT_RESULT.md`) over full logs (`output/experiment/EXPERIMENT_LOG.md`, `output/AUTO_REVIEW_REPORT.md`) on session recovery. Passed through to `/auto-review-loop` and `/generate-report`.

> Override via argument, e.g., `/full-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true, difficulty: nightmare`.

---

## Startup: Load Context, Check & Resume

Before any subskill is invoked, load the project context. **All discovered files override `$ARGUMENTS`**; `$ARGUMENTS` is only the primary driver when none of the context files exist or carry usable information.

1. Read `CLAUDE.md` — project dashboard, control flags, canonical output paths.
2. Read `handoff.json` if it exists — `pipeline.stage`, `recovery.resume_skill`, `recovery.read_first`.
3. Read `memory/MEMORY.md` for the current pipeline stage and prior token usage.
4. **Load research context (priority order):**
   1. **`RESEARCH_PLAN.md`** in the project root — if present and non-empty, this is the authoritative research brief (problem, method, constraints, success criteria, prior attempts). Treat it as ground truth.
   2. **`BRIEF.md`** in the project root — if `RESEARCH_PLAN.md` is missing, read `BRIEF.md` (the 12-section research brief — see `CLAUDE.md` Key Files). Treat it as the authoritative brief.
   3. **`$ARGUMENTS`** — if neither file exists, or both are empty / contain only template placeholders, fall back to `$ARGUMENTS` as the research direction.
   4. **Both a file and `$ARGUMENTS` present**: the file's details (problem, constraints, success criteria) override; `$ARGUMENTS` may sharpen scope but cannot contradict the brief. If they conflict materially, surface the conflict to the user and ask which one to follow before proceeding.
5. Read `output/REVIEW_STATE.json` if it exists — resume from the saved round (see `/auto-review-loop` resume rules).
6. Read the experiment status files appropriate for `COMPACT_MODE`:
   - `COMPACT_MODE = true` → `output/PROJ_NOTES.md` and `output/experiment/EXPERIMENT_RESULT.md`
   - `COMPACT_MODE = false` → `output/experiment/EXPERIMENT_LOG.md` and `output/experiment/EXPERIMENT_RESULT.md`
7. Display the resolved context (which brief was loaded, current stage, last action) and confirm with the user before proceeding.

> 💡 If neither `RESEARCH_PLAN.md` nor `BRIEF.md` exists and `$ARGUMENTS` is also empty, stop and ask the user for a research direction or to create one of these files (template: `templates/RESEARCH_PLAN_TEMPLATE.md`).

---

## Stage 1 — Idea Discovery (`/idea-discovery-pipeline`)

**Goal**: Identify a tractable, novel research direction and produce an experiment plan.

Invoke the subskill, passing through control flags:

```
/idea-discovery-pipeline "$ARGUMENTS"
```

The subskill internally chains: `/lit-review` → `/generate-idea` → `/novelty-check` → `/idea-review` → `/experiment-design-pipeline` (which itself runs `/refine-research` → `/experiment-design`). It will read `RESEARCH_PLAN.md` directly in its Phase 0 — do not re-paste its contents into the prompt.

**Outputs (canonical paths produced by `/idea-discovery-pipeline`):**

- `output/LIT_REVIEW_REPORT.md` — literature landscape and ranked gaps
- `output/IDEA_REPORT.md` — ranked, novelty-checked, pilot-tested ideas (top-level report)
- `output/NOVELTY_REPORT.md` — deep novelty verdicts
- `output/IDEA_REVIEW_REPORT.md` — external critical review
- `output/refine-logs/FINAL_PROPOSAL.md` + `output/refine-logs/REFINE_REPORT.md` — refined proposal
- `output/EXPERIMENT_PLAN.md` + `output/EXPERIMENT_TRACKER.md` — experiment roadmap
- `output/EXP_PIPELINE_SUMMARY.md` — pipeline summary

**Gate 1**: If `AUTO_PROCEED = false`, stop here and present the top ideas to the user:

```
📋 Idea Discovery Pipeline completed! Top ideas:

1. [Idea 1 title] — Pilot: POSITIVE (+X%), Novelty: 9/10
2. [Idea 2 title] — Pilot: WEAK POSITIVE (+Y%), Novelty: 8.3/10
3. [Idea 3 title] — Pilot: NEGATIVE, eliminated

Recommended: Idea 1. Shall I proceed with deployment?
```

The user may approve, pick a different idea, request changes (re-run the subskill with refined constraints), reject all (collect feedback, re-run), or stop and save state.

If `AUTO_PROCEED = true`, the subskill auto-selects the top idea after presenting results.

After selection, ensure the chosen idea and supporting context are recorded in `RESEARCH_PLAN.md` (template: `templates/RESEARCH_PLAN_TEMPLATE.md`) so downstream stages and future sessions inherit it.

**Path bridge before Stage 2**: `/deploy-experiment` reads `output/refine-logs/EXPERIMENT_PLAN.md` and `output/refine-logs/FINAL_PROPOSAL.md`, but `/idea-discovery-pipeline` writes the plan to `output/EXPERIMENT_PLAN.md`. Before Stage 2, verify that `output/refine-logs/EXPERIMENT_PLAN.md` exists; if it does not, copy `output/EXPERIMENT_PLAN.md` to that location so the deploy skill finds it. Do not relocate the original.

Update `memory/MEMORY.md`.

---

## Stage 2 — Autonomous Experiment Execution (`/deploy-experiment`)

**Goal**: Run all experiments described in the plan and collect results.

Invoke the subskill:

```
/deploy-experiment
```

The subskill will:

1. Read `output/refine-logs/EXPERIMENT_PLAN.md` (authoritative plan) and `output/refine-logs/FINAL_PROPOSAL.md` (context).
2. **Run the mandatory local GPU availability check (Step 0 of `/deploy-experiment`)** before any classification. This applies to every pilot or full experiment in this pipeline. If a local GPU (CUDA or MPS) is detected, all ML/DL runs MUST execute on it; only when `LOCAL_GPU=none` may runs route to remote/Modal/CPU.
3. Classify the experiment into Track A (ML/DL on GPU) or Track B (spatial/GIScience on CPU) — or both for mixed GeoAI.
4. Acquire data via `/data-download` if any required dataset is missing from `data/raw/`.
5. Run pre-flight environment checks (GPU re-confirmed from Step 0 for Track A, package availability for Track B).
6. Launch runs (long-running jobs go to background or remote screen sessions).
7. For Track B, route execution through `/spatial-analysis` per claim.
8. Periodically invoke `/training-check` (Track A with W&B) to monitor for stalls/failures.

**Outputs (canonical paths produced by `/deploy-experiment`):**

- `output/experiment/EXPERIMENT_RESULT.md` — final per-claim results, tables, key numbers
- `output/experiment/EXPERIMENT_LOG.md` — chronological run log (commands, timings, failures)
- `output/experiment/data/` — visualization-ready artifacts (CSV, GeoPackage, predictions, metrics, sidecar metadata)
- `output/experiment/figures/` — quick-look figures
- `output/experiment/scripts/` — runner scripts created during execution

Wait for runs to finish, then confirm `EXPERIMENT_RESULT.md` and `EXPERIMENT_LOG.md` are populated before advancing.

Update `memory/MEMORY.md`.

---

## Stage 3 — Autonomous Review Loop (`/auto-review-loop`)

**Goal**: Iteratively improve work quality through adversarial review until all per-criterion floors are met and the weighted score is acceptable, or the `MAX_ROUNDS = 4` limit is reached.

Invoke the subskill, passing the chosen idea and difficulty:

```
/auto-review-loop "[chosen idea title] — difficulty: $REVIEWER_DIFFICULTY"
```

The subskill reads `output/experiment/EXPERIMENT_RESULT.md`, `output/experiment/EXPERIMENT_LOG.md`, `output/refine-logs/FINAL_PROPOSAL.md`, and prior review state. Each round runs: independent reviewer (Codex MCP / `codex exec` / Claude subagent fallback) → parse → (optional debate in `hard`/`nightmare`) → fix verification → re-evaluate. State is persisted to `output/REVIEW_STATE.json` at every phase boundary.
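
As a rough illustration of persisting state at a phase boundary, the sketch below writes `output/REVIEW_STATE.json` through a temporary file, so an interrupted session cannot leave a truncated state file behind. The helper names `persist_review_state` and `load_review_state` and the write-then-rename approach are assumptions made for illustration; `/auto-review-loop` owns the real schema and write path.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def persist_review_state(state: dict, path: Path) -> None:
    """Hypothetical helper: write the state as JSON via a temp file and rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        # os.replace swaps the new file in, overwriting any previous state.
        os.replace(tmp_name, str(path))
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_review_state(path: Path) -> Optional[dict]:
    """Hypothetical helper: return the saved state, or None if none exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `persist_review_state` once per phase boundary would let a later session call `load_review_state(Path("output/REVIEW_STATE.json"))` and resume from the last recorded round.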

**Outputs (canonical paths produced by `/auto-review-loop`):**

- `output/AUTO_REVIEW_REPORT.md` — cumulative review log (one entry per round)
- `output/review-rounds/round_<N>_raw.md` — verbatim raw reviewer responses
- `output/REVIEW_STATE.json` — per-round scores, open weaknesses, threadId, status
- `output/METHOD_DESCRIPTION.md` — concise final method description (consumed by `/paper-figure-generate` and `/paper-draft`)
- `memory/REVIEWER_MEMORY.md` (`hard` and `nightmare` only) — persistent reviewer memory across rounds
- `output/PROJ_NOTES.md` (when `COMPACT_MODE = true`) — appended one-line discoveries per round

After the loop completes, read the terminal `status` and final score from `output/REVIEW_STATE.json`:

- `status = "completed"` (POSITIVE_THRESHOLD met) → proceed to Stage 4.
- `status = "max_rounds_reached"` → proceed to Stage 4 with the remaining `open_weaknesses` flagged as limitations in the narrative.
- `status = "blocked"` (Phase B.7 circuit breaker, contract violation, or no reviewer backend) → STOP. Surface `output/CONTRACT_VIOLATION.md` (if any) and the blocker reason. Do **not** proceed to `/generate-report` until a human resolves it.

Update `memory/MEMORY.md`.

---

## Stage 4 — Report Writing (`/generate-report`)

**Goal**: Consolidate all pipeline artifacts into a single narrative rich enough to drive the downstream `paper-writing-pipeline` (paper-plan → paper-figure-generate → paper-draft → paper-review-loop → paper-convert).
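
Before `/generate-report` is invoked, the Stage 3 terminal status decides whether Stage 4 may run at all. A minimal sketch of that gate follows, assuming `status` and `open_weaknesses` are top-level keys of `output/REVIEW_STATE.json` and that weaknesses are stored as plain strings; the function name `stage4_decision` is hypothetical.

```python
from typing import List, Tuple


def stage4_decision(state: dict) -> Tuple[bool, List[str]]:
    """Map the Stage 3 terminal status to (proceed, limitations_to_flag)."""
    status = state.get("status")
    if status == "completed":
        # Threshold met: proceed with nothing extra to flag.
        return True, []
    if status == "max_rounds_reached":
        # Proceed, but carry unresolved weaknesses into the narrative
        # as limitations.
        return True, list(state.get("open_weaknesses", []))
    # "blocked" (or any unexpected value): stop and wait for a human.
    return False, []
```

Treating unknown status values the same as `blocked` is a conservative choice for this sketch, in line with the rule that the pipeline should not force forward when Stage 3 has not finished cleanly.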

Invoke the subskill (optionally with a focus hint):

```
/generate-report
```

The subskill reads, at minimum:

- `output/LIT_REVIEW_REPORT.md`
- `output/IDEA_REPORT.md` and `RESEARCH_PLAN.md` (or `BRIEF.md` if that was the active brief)
- `output/refine-logs/FINAL_PROPOSAL.md`
- `output/refine-logs/EXPERIMENT_PLAN.md`
- `output/experiment/EXPERIMENT_RESULT.md` and `output/experiment/EXPERIMENT_LOG.md`
- `output/AUTO_REVIEW_REPORT.md` and `output/REVIEW_STATE.json`
- `output/METHOD_DESCRIPTION.md`
- `output/PROJ_NOTES.md` (when `COMPACT_MODE = true`)
- `data/DATA_MANIFEST.md`, `output/spatial-analysis/`, `output/figures/` (as needed)

…and produces `output/NARRATIVE_REPORT.md` — the single source-of-truth narrative for the paper-writing pipeline.

After `/generate-report` returns, append a short pipeline summary block to `output/NARRATIVE_REPORT.md` (or write it alongside as `output/PIPELINE_SUMMARY.md`):

```markdown
# Research Pipeline Report

**Direction**: $ARGUMENTS (and/or RESEARCH_PLAN.md / BRIEF.md)
**Chosen Idea**: [title]
**Date**: [start] → [end]
**Pipeline**: idea-discovery-pipeline → deploy-experiment → auto-review-loop → generate-report

## Journey Summary
- Ideas generated: X → filtered to Y → piloted Z → chose 1
- Implementation: [brief description of what was built]
- Experiments: [number of runs, total compute time, GPU-hours]
- Review rounds: N/4, final score: X/10, status: [completed|max_rounds_reached|blocked]

## Final Status
- [ ] Ready for paper-writing-pipeline / [ ] Needs manual follow-up

## Remaining TODOs (if any)
- [items from output/REVIEW_STATE.json open_weaknesses]

## Files Changed
- [list of key files created/modified]
```

**Gate 4** (if `HUMAN_CHECKPOINT = true`): Present `output/NARRATIVE_REPORT.md` to the user for approval before handing off.

**Handoff**: Once approved (or immediately, when Gate 4 is disabled), the next call in the workflow is `/paper-writing-pipeline` using `output/NARRATIVE_REPORT.md` as input.

Update `memory/MEMORY.md`.

---

## Recovery Instructions

If context overflows mid-pipeline:

1. Read `handoff.json` for `pipeline.stage`, `recovery.resume_skill`, `recovery.read_first` (fastest recovery).
2. Read `output/REVIEW_STATE.json` to find the current review round and per-criterion scores (Stage 3 only).
3. Read `memory/MEMORY.md` for pipeline state flags.
4. Read `output/PROJ_NOTES.md` (when `COMPACT_MODE = true`) or `output/experiment/EXPERIMENT_LOG.md` (full).
5. Resume from the interrupted stage — do **NOT** re-run completed stages or experiments marked SUCCESS in `output/experiment/EXPERIMENT_LOG.md`.

---

## Key Rules

- **Large file handling**: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission — just do it silently.
- **Brief precedence**: `RESEARCH_PLAN.md` > `BRIEF.md` > `$ARGUMENTS`. The first one with usable content wins; the others provide supplementary scope only.
- **Mandatory local GPU check before any pilot or full experiment.** Stage 1 (`/idea-discovery-pipeline` → `/generate-idea` Phase 5.0) and Stage 2 (`/deploy-experiment` Step 0) MUST each run the local GPU presence check (`nvidia-smi`, then MPS) before launching. If a local GPU is detected, all ML/DL runs MUST execute on it; remote/Modal/CPU is only allowed when `LOCAL_GPU=none`.
- **Gate 1 is controlled by `AUTO_PROCEED`.** When `false`, do not proceed past Stage 1 without user confirmation. When `true`, the subskill auto-selects the top idea after presenting results.
- **Stages 2–4 can run autonomously** once the user confirms the idea. This is the "sleep and wake up to results" part.
- **Respect the canonical output paths** above — do NOT relocate subskill outputs. Downstream skills (and the paper-writing pipeline) read them at fixed locations.
- **If Stage 3 ends `blocked`**, stop and surface the blocker. Do not proceed to `/generate-report` or paper writing.
- **If Stage 3 ends `max_rounds_reached`**, proceed to Stage 4 but flag remaining `open_weaknesses` as limitations.
- **Budget awareness**: Track total GPU-hours across the pipeline. Flag if approaching user-defined limits.
- **Documentation**: Every stage updates its own canonical output files. The full history must be self-contained for the next session to recover.
- **Generator-evaluator separation** (CLAUDE.md): never let the entity that wrote a section score it. `/auto-review-loop` enforces this — do not bypass it.
- **Fail gracefully**: If any stage fails (no good ideas, experiments crash, review loop blocked), report clearly and suggest alternatives rather than forcing forward.
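
The brief-precedence rule above can be sketched as a small resolver. This is only an illustration under stated assumptions: the function names `resolve_brief` and `has_usable_content` are hypothetical, and the "usable content" test (any non-blank line that is neither a heading nor a bracketed placeholder) is a guessed heuristic, not the skill's actual check.

```python
from pathlib import Path
from typing import Tuple


def has_usable_content(text: str) -> bool:
    """Assumed heuristic: a line that is not blank, a heading, or a [placeholder]."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            continue
        return True
    return False


def resolve_brief(project_root: Path, arguments: str) -> Tuple[str, str]:
    """Return (source, content) for the first source with usable content."""
    # Files are checked in precedence order: RESEARCH_PLAN.md, then BRIEF.md.
    for name in ("RESEARCH_PLAN.md", "BRIEF.md"):
        candidate = project_root / name
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8")
            if has_usable_content(text):
                return name, text
    # Fall back to the command-line arguments only when no file qualifies.
    if arguments.strip():
        return "$ARGUMENTS", arguments.strip()
    raise ValueError("No research direction found; ask the user for one.")
```

The sketch covers source selection only. Detecting a material conflict between a loaded brief and `$ARGUMENTS` still needs judgment and a question to the user, so it is left out.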
More from GRIND-Lab-Core/night_owl_research_agent
- data-download: Discover, evaluate, and download publicly available datasets from the internet. Infers data needs from a research question or task, selects authoritative sources, downloads reproducibly, validates file integrity, and documents provenance. Pauses for user input when authentication, API keys, or major tradeoffs require a decision. Use when user says "download data", "get data", "find a dataset", "I need boundary files", "download census data", or needs any external dataset for analysis.
- deploy-experiment: Deploy and run experiments for ML/DL training (local, remote, or Modal GPU) AND spatial data science / GIScience experiments (local, data-driven). Reads from output/refine-logs/EXPERIMENT_PLAN.md and output/refine-logs/FINAL_PROPOSAL.md, writes to output/experiment/. Use when user says "run experiment", "deploy experiment", "execute experiment plan", or needs to launch training / spatial analysis jobs.
- experiment-design-pipeline: Run an end-to-end workflow that chains the skills `refine-research` and `experiment-design`. Use when the user wants a one-shot pipeline from vague research direction to focused final proposal plus detailed experiment roadmap, or asks to build a pipeline, do it end-to-end, or generate both the method and experiment plan together.
- generate-idea: Generate and rank research ideas given a broad direction. Use when the user says "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
- idea-discovery-pipeline: The full pipeline for idea generation. It generates 8-12 novel research ideas from literature gaps and evaluates each on novelty, feasibility, and domain fit. Orchestrates lit-review → generate-idea → novelty-check → idea-review → experiment-design-pipeline to go from a broad research direction to a validated, pilot-tested idea with a refined proposal and experiment plan. Produces output/IDEA_REPORT.md plus refinement and experiment artifacts.
- lit-review: Retrieves papers from a local folder or from arXiv and Semantic Scholar using domain-aware keyword expansion, builds a synthesis matrix, and identifies gaps. Calls tools/arxiv_fetch.py and tools/semantic_scholar_fetch.py. Writes to output/paper-cache/ and output/LIT_REVIEW_REPORT.md.
- paper-covert: Converts the final Markdown manuscript from `paper-draft` / `paper-review-loop` into a submission package for the target venue — modular LaTeX (one file per section), compiled PDF, and Word `.docx`. Venue is read from `output/PAPER_PLAN.md` (or argument) and routed through a small YAML profile. Does not rewrite prose, score, or invent citations.
- paper-draft: Transforms output/PAPER_PLAN.md into a journal-quality Markdown manuscript draft for GIScience, GeoAI, spatial data science, and remote sensing venues (IJGIS, ISPRS JPRS, RSE, TGIS, AAG Annals). Consults referenced literature, experiment, figure, and claim artifacts; supports full drafts, partial drafts, and skeleton drafts depending on readiness. Never fabricates results, metrics, or citations — produces a claim-to-evidence map and coverage-gap report alongside the manuscript.
- paper-figure-generate: Generates publication-quality figures and diagrams from output/PAPER_PLAN.md for GIScience, GeoAI, and remote sensing journals (IJGIS, ISPRS JPRS, RSE, TGIS). Decides per-figure whether to produce reproducible code-generated plots/maps or structured prompts for external image-generation models (nano banana, ChatGPT image). Produces figure files, source scripts, captions, manifest, and prompt artifacts. Never fabricates results — uses only evidence from project files.
- paper-review-loop: Reviews the manuscript produced by `paper-draft` (in `output/manuscript/`) as a demanding IJGIS / ISPRS JPRS reviewer-editor, cross-checks it against `output/PAPER_PLAN.md` and its evidence artifacts, then revises it into a stronger draft. Produces a reviewed manuscript, a revised manuscript, a structured review report, a prioritized issue log, a revision log, claim-risk notes, journal-fit notes, and next-loop priorities. Supports full, section-scoped, and mode-scoped review (structural / argument / novelty / methods / results-discussion / journal-fit / language / integrated). Safe on partial or skeletal drafts. Never fabricates results, citations, or figures.