lit-review
$ npx mdskill add GRIND-Lab-Core/night_owl_research_agent/lit-review
SKILL.md
.github/skills/lit-review
---
name: lit-review
description: Retrieves papers from local folder or ArXiv and Semantic Scholar using domain-aware keyword expansion, builds synthesis matrix, identifies gaps. Calls tools/arxiv_fetch.py and tools/semantic_scholar_fetch.py. Writes to output/paper-cache/ and output/LIT_REVIEW_REPORT.md.
argument-hint: [paper-topic-or-url]
tools: Bash, Read, Write, WebFetch, WebSearch, Agent, Glob, Grep
---

# Skill: lit-review

You perform a targeted literature review for $ARGUMENTS. Your outputs feed idea discovery and paper writing. This skill checks multiple sources **in priority order**. All are optional — if a source is not configured or not requested, skip it silently.

Research topic: $ARGUMENTS

---

## Constants

- **PAPER_LIBRARY** — Local directory containing the user's paper collection (PDFs). Check these paths in order:
  1. `papers/` in the current project directory
  2. Custom path specified by the user in `CLAUDE.md` under `## Constants`
- **MAX_LOCAL_PAPERS = 30** — Maximum number of local PDFs to scan (read the first 3 pages of each). If more are found, prioritize by filename relevance to the topic.
- **ARXIV_DOWNLOAD = false** — When `true`, download the top 3-5 most relevant arXiv PDFs to PAPER_LIBRARY after the search. When `false` (default), only fetch metadata (title, abstract, authors) via the arXiv API; no files are downloaded.
- **ARXIV_MAX_DOWNLOAD = 5** — Maximum number of PDFs to download when `ARXIV_DOWNLOAD = true`.

> 💡 Overrides:
> - `/lit-review "topic" — paper library: ~/my_papers/` — custom local PDF path

### Source Selection

Parse `$ARGUMENTS` for a `— sources:` directive:

- **If `— sources:` is specified**: Only search the listed sources (comma-separated). Valid values: `zotero`, `obsidian`, `local`, `web`, `semantic-scholar`, `all`.
- **If not specified**: Default to `all` — search every available source in priority order (`semantic-scholar` is **excluded** from `all`; it must be explicitly listed).

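As a rough illustration of the directive parsing described above, the following Python sketch resolves a raw argument string into a set of source IDs. The function name `resolve_sources` and the decision to raise on unknown IDs are assumptions made for illustration; the skill itself leaves the parsing to the agent.

```python
import re

VALID_SOURCES = {"zotero", "obsidian", "local", "web", "semantic-scholar", "all"}
# Per the rule above, `all` covers every source except semantic-scholar.
ALL_EXPANSION = {"zotero", "obsidian", "local", "web"}


def resolve_sources(arguments: str) -> set:
    """Return the source IDs requested by a `— sources:` directive.

    Hypothetical helper: falls back to the `all` expansion when the
    directive is missing or empty, and stops at the next `—` directive.
    """
    match = re.search(r"—\s*sources:\s*([^—]*)", arguments)
    requested = set()
    if match is not None:
        requested = {s.strip().lower() for s in match.group(1).split(",") if s.strip()}
    if not requested:
        return set(ALL_EXPANSION)
    unknown = requested - VALID_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    resolved = set()
    for source in requested:
        resolved |= ALL_EXPANSION if source == "all" else {source}
    return resolved
```

Returning a set keeps the later per-step checks simple, for example `"zotero" in resolve_sources(args)` before attempting Step 0a.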
Examples:

```
/lit-review "diffusion models"                            → all (default, no S2)
/lit-review "diffusion models" — sources: all             → all (same as default, no S2)
/lit-review "diffusion models" — sources: zotero          → Zotero only
/lit-review "diffusion models" — sources: zotero, web     → Zotero + web
/lit-review "diffusion models" — sources: local           → local PDFs only
/lit-review "topic" — sources: obsidian, local, web       → skip Zotero
/lit-review "topic" — sources: web, semantic-scholar      → web + S2 API (IEEE/ACM venue papers)
/lit-review "topic" — sources: all, semantic-scholar      → all + S2 API
```

### Source Table

| Priority | Source | ID | How to detect | What it provides |
|----------|--------|----|---------------|-----------------|
| 1 | **Zotero** (via MCP) | `zotero` | Try calling any `mcp__zotero__*` tool — if unavailable, skip | Collections, tags, annotations, PDF highlights, BibTeX, semantic search |
| 2 | **Obsidian** (via MCP) | `obsidian` | Try calling any `mcp__obsidian-vault__*` tool — if unavailable, skip | Research notes, paper summaries, tagged references, wikilinks |
| 3 | **Local PDFs** | `local` | `Glob: papers/**/*.pdf` | Raw PDF content (first 3 pages) |
| 4 | **Web search** | `web` | Always available (WebSearch) | arXiv, Semantic Scholar, Google Scholar |
| 5 | **Semantic Scholar API** | `semantic-scholar` | `tools/semantic_scholar_fetch.py` exists | Published venue papers (IEEE, ACM, Springer) with structured metadata: citation counts, venue info, TLDR. **Only runs when explicitly requested** via `— sources: semantic-scholar` or `— sources: web, semantic-scholar` |

> **Graceful degradation**: If no MCP servers are configured, the skill works exactly as before (local PDFs + web search). Zotero and Obsidian are pure additions.

## Workflow

### Step 0a: Search Zotero Library (if available)

**Skip this step entirely if Zotero MCP is not configured.**

Try calling a Zotero MCP tool (e.g., search). If it succeeds:

1. **Search by topic**: Use the Zotero search tool to find papers matching the research topic
2. **Read collections**: Check if the user has a relevant collection/folder for this topic
3. **Extract annotations**: For highly relevant papers, pull PDF highlights and notes — these represent what the user found important
4. **Export BibTeX**: Get citation data for relevant papers (useful for `/paper-draft` later)
5. **Compile results**: For each relevant Zotero entry, extract:
   - Title, authors, year, venue
   - User's annotations/highlights (if any)
   - Tags the user assigned
   - Which collection it belongs to

> 📚 Zotero annotations are gold — they show what the user personally highlighted as important, which is far more valuable than generic summaries.

### Step 0b: Search Obsidian Vault (if available)

**Skip this step entirely if Obsidian MCP is not configured.**

Try calling an Obsidian MCP tool (e.g., search). If it succeeds:

1. **Search vault**: Search for notes related to the research topic
2. **Check tags**: Look for notes tagged with relevant topics (e.g., `#diffusion-models`, `#paper-review`)
3. **Read research notes**: For relevant notes, extract the user's own summaries and insights
4. **Follow links**: If notes link to other relevant notes (wikilinks), follow them for additional context
5. **Compile results**: For each relevant note:
   - Note title and path
   - User's summary/insights
   - Links to other notes (research graph)
   - Any frontmatter metadata (paper URL, status, rating)

> 📝 Obsidian notes represent the user's **processed understanding** — more valuable than raw paper content for understanding their perspective.

### Step 0c: Scan Local Paper Library

Before searching online, check if the user already has relevant papers locally:

1. **Locate library**: Check **PAPER_LIBRARY** paths for PDF files
   ```
   Glob: papers/**/*.pdf
   ```
2. **De-duplicate against Zotero**: If Step 0a found papers, skip any local PDFs already covered by Zotero results (match by filename or title).
3. **Filter by relevance**: Match filenames and first-page content against the research topic. Skip clearly unrelated papers.
4. **Summarize relevant papers**: For each relevant local PDF (up to MAX_LOCAL_PAPERS):
   - Read first 3 pages (title, abstract, intro)
   - Extract: title, authors, year, core contribution, relevance to topic
   - Flag papers that are directly related vs tangentially related
5. **Build local knowledge base**: Compile summaries into a "papers you already have" section. This becomes the starting point — external search fills the gaps.

> 📚 If no local papers are found, skip to Step 1. If the user has a comprehensive local collection, the external search can be more targeted (focus on what's missing).

### Step 1: Search (external)

- Use WebSearch to find recent papers on the topic
- Check arXiv, Semantic Scholar, Google Scholar
- Focus on papers from the last 3 years unless studying foundational work
- **De-duplicate**: Skip papers already found in Zotero, Obsidian, or the local library

**arXiv API search** (always runs, no download by default):

Locate the fetch script and search arXiv directly:

```bash
# Look for arxiv_fetch.py in the project's tools/ directory first
SCRIPT=$(find tools/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
# If not found, fall back to the user-level skills directory
[ -z "$SCRIPT" ] && SCRIPT=$(find ~/.claude/skills/arxiv/ -name "arxiv_fetch.py" 2>/dev/null | head -1)
# Search the arXiv API for structured results (title, abstract, authors, categories);
# run only if the script was located
[ -n "$SCRIPT" ] && python3 "$SCRIPT" search "QUERY" --max 10
```

If `arxiv_fetch.py` is not found, fall back to WebSearch for arXiv (same as before).

The arXiv API returns structured metadata (title, abstract, full author list, categories, dates), which is richer than WebSearch snippets. Merge these results with WebSearch findings and de-duplicate.

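One possible way to merge the arXiv API results with the WebSearch findings is to key each paper on a normalized title. This is only a sketch: the record shape (dicts with `title` and `source` keys) and both helper names are hypothetical, since the output format of `arxiv_fetch.py` is not shown here.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(arxiv_api: list, web: list) -> list:
    """Merge two result lists, keeping the first record seen for each title.

    arXiv API records are listed first, so they win over WebSearch
    snippets for the same paper (they carry richer metadata).
    """
    merged = {}
    for paper in arxiv_api + web:
        merged.setdefault(normalize_title(paper["title"]), paper)
    return list(merged.values())
```

Title matching is a heuristic; when both records carry an arXiv ID, matching on the ID is more reliable.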
**Semantic Scholar API search** (only when `semantic-scholar` is in sources):

When the user explicitly requests `— sources: semantic-scholar` (or `— sources: web, semantic-scholar`), search for published venue papers beyond arXiv:

```bash
# Look for semantic_scholar_fetch.py in tools/, then in the user-level skills directory
S2_SCRIPT=$(find tools/ -name "semantic_scholar_fetch.py" 2>/dev/null | head -1)
[ -z "$S2_SCRIPT" ] && S2_SCRIPT=$(find ~/.claude/skills/semantic-scholar/ -name "semantic_scholar_fetch.py" 2>/dev/null | head -1)
# Search for published papers with quality filters; run only if the script was located
[ -n "$S2_SCRIPT" ] && python3 "$S2_SCRIPT" search "QUERY" --max 10 \
  --fields-of-study "GIScience, GeoAI" \
  --publication-types "JournalArticle, Conference"
```

If `semantic_scholar_fetch.py` is not found, skip silently.

**Why use Semantic Scholar?** Many IEEE/ACM journal papers are NOT on arXiv. S2 fills the gap for published venue-only papers with citation counts and venue metadata.

**De-duplication between arXiv and S2**: Match by arXiv ID (S2 returns `externalIds.ArXiv`):

- If a paper appears in both: check S2's `venue`/`publicationVenue`. If it has been published in a journal or conference (e.g. IEEE TWC, JSAC), use S2's metadata (venue, citationCount, DOI) as the authoritative version, since the published version supersedes the preprint. Keep the arXiv PDF link for download.
- If the S2 match has no venue (still just a preprint indexed by S2): keep the arXiv version as-is.
- S2 results without `externalIds.ArXiv` are **venue-only papers** not on arXiv — these are the unique value of this source.

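The matching rule above could be sketched as follows. The S2 field names (`externalIds`, `venue`, `publicationVenue`) come from the text; the arXiv record keys `arxiv_id` and `pdf_url` and the function name `merge_arxiv_s2` are assumptions, because the output format of the fetch scripts is not specified.

```python
def merge_arxiv_s2(arxiv: list, s2: list) -> list:
    """Combine arXiv and Semantic Scholar records, matching on arXiv ID."""
    merged = {paper["arxiv_id"]: paper for paper in arxiv}
    extra = []  # S2 records with no counterpart in the arXiv results
    for paper in s2:
        arxiv_id = (paper.get("externalIds") or {}).get("ArXiv")
        if arxiv_id not in merged:
            # Venue-only paper (or a preprint the arXiv search missed): keep as-is.
            extra.append(paper)
        elif paper.get("venue") or paper.get("publicationVenue"):
            # Published version supersedes the preprint; keep the arXiv PDF link.
            preprint = merged[arxiv_id]
            merged[arxiv_id] = {**paper, "arxiv_id": arxiv_id, "pdf_url": preprint.get("pdf_url")}
        # Otherwise S2 only indexes the preprint: leave the arXiv record untouched.
    return list(merged.values()) + extra
```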
**Optional PDF download** (only when `ARXIV_DOWNLOAD = true`):

After all sources are searched and papers are ranked by relevance:

```bash
# Download top N most relevant arXiv papers
python3 "$SCRIPT" download ARXIV_ID --dir papers/
```

- Only download papers ranked in the top ARXIV_MAX_DOWNLOAD by relevance
- Skip papers already in the local library
- Wait 1 second between downloads (rate limiting)
- Verify that each downloaded PDF is larger than 10 KB

### Step 2: Analyze Each Paper

For each relevant paper (from all sources), extract:

- **Problem**: What gap does it address?
- **Method**: Core technical contribution (1-2 sentences)
- **Results**: Key numbers/claims
- **Relevance**: How does it relate to our work?
- **Source**: Where we found it (Zotero/Obsidian/local/web) — helps the user know what they already have vs what's new

### Step 3: Synthesize

- Group papers by approach/theme
- Identify consensus vs disagreements in the field
- Find gaps that our work could fill
- If Obsidian notes exist, incorporate the user's own insights into the synthesis

Read `skills/knowledge/synthesis-analyst.md` for the synthesis protocol. If that file is not found, continue without it. Then:

1. Group papers by theme (not chronology).
2. Build a synthesis matrix: paper × (method, dataset, key metric, finding, limitation).
3. Identify consensus views, contradictions, and geographic biases.
4. Write the synthesis to the **Synthesis** section of `output/LIT_REVIEW_REPORT.md`.

Find gaps that our work can fill. For each gap dimension (Methodological, Geographic, Temporal, Data, Equity, Validation):

- Score gap: Novelty × 0.4 + Feasibility × 0.35 + Impact × 0.25
- Rank top 5 gaps
- Write to the **Gap Analysis** section of `output/LIT_REVIEW_REPORT.md`

### Step 4: Output

Present as a structured literature table:

```
| Paper | Venue | Method | Key Result | Relevance to Us | Source |
|-------|-------|--------|------------|-----------------|--------|
```

Plus a narrative summary of the landscape (3-5 paragraphs).

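For reference, the weighted gap score from Step 3 (Novelty × 0.4 + Feasibility × 0.35 + Impact × 0.25) can be computed as in the sketch below. The `Gap` dataclass, the `top_gaps` helper, and the idea of rating each factor on a shared numeric scale are illustrative assumptions; the skill does not fix a rating scale.

```python
from dataclasses import dataclass

# Weights taken from the scoring rule in Step 3.
WEIGHTS = {"novelty": 0.4, "feasibility": 0.35, "impact": 0.25}


@dataclass
class Gap:
    dimension: str      # e.g. "Methodological", "Geographic", "Data"
    novelty: float
    feasibility: float
    impact: float

    @property
    def score(self) -> float:
        """Weighted sum of the three ratings."""
        return (
            WEIGHTS["novelty"] * self.novelty
            + WEIGHTS["feasibility"] * self.feasibility
            + WEIGHTS["impact"] * self.impact
        )


def top_gaps(gaps: list, k: int = 5) -> list:
    """Return the k highest-scoring gaps, best first."""
    return sorted(gaps, key=lambda gap: gap.score, reverse=True)[:k]
```

Because the weights sum to 1, the score stays on the same scale as the individual ratings, which makes scores comparable across runs.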
If Zotero BibTeX was exported, include a `references.bib` snippet for direct use in paper writing.

### Step 5: Save (if requested)

- Save paper PDFs to `papers/`
- Update related work notes in project memory
- **Update** `memory/MEMORY.md`
- If Obsidian is available, optionally create a literature review note in the vault

## Outputs

- `output/paper-cache/` — JSON paper metadata files
- `output/LIT_REVIEW_REPORT.md` — single consolidated report containing:
  1. **Findings** — one-line summaries of key discoveries (appended across runs; dated entries)
  2. **Synthesis (YYYY-MM-DD)** — thematic synthesis with paper × (method, dataset, metric, finding, limitation) matrix
  3. **Gap Analysis** — ranked top-5 gaps across Methodological, Geographic, Temporal, Data, Equity, Validation dimensions

> When re-running the skill, **append** new dated Synthesis and Gap Analysis sections rather than overwriting; this preserves history. Findings entries are always appended.

## Key Rules

- Always include paper citations (authors, year, venue)
- Distinguish between peer-reviewed papers and preprints
- Be honest about the limitations of each paper
- Note if a paper directly competes with or supports our approach
- **Never fail because an MCP server is not configured** — always fall back gracefully to the next data source
- Zotero/Obsidian tools may have different names depending on how the user configured the MCP server (e.g., `mcp__zotero__search` or `mcp__zotero-mcp__search_items`). Try the most common patterns and adapt.
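The append-only behaviour described under Outputs might look like the sketch below. Only the report path and the rule of appending dated sections come from the text; the function name `append_dated_section` and the `## Heading (YYYY-MM-DD)` heading format are assumptions.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_dated_section(report: Path, heading: str, body: str,
                         today: Optional[date] = None) -> None:
    """Append a dated section to the report without touching earlier runs."""
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    report.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file on first use and never truncates existing content.
    with report.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {heading} ({stamp})\n\n{body.rstrip()}\n")
```

A call such as `append_dated_section(Path("output/LIT_REVIEW_REPORT.md"), "Gap Analysis", text)` would then add a new section on each run while leaving previous ones in place.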