litreview

$npx mdskill add alirezarezvani/claude-skills/litreview

> **Portability:** Requires a Consensus MCP connection, Node.js with `docx` package for document generation, and (in CLI) `bash_tool`. Works in Claude Code CLI natively. In Claude.ai with Consensus MCP + Code Execution, the workflow is supported.

SKILL.md

.github/skills/litreviewView on GitHub ↗
---
name: litreview
description: "Academic literature orientation skill that searches papers via Consensus, builds a strategic search plan using PICO (default) or SPIDER / Decomposition / hybrid as fallbacks, and synthesizes findings into a formatted Word (.docx) research guide. Grill-me intake (research question specificity + framework hint + tentative depth) before the recon search; a second forcing checkpoint after Phase 2 confirms framework + sub-areas + depth before searches consume budget. Configurable depth (5/10/20 queries) controls coverage vs. speed. Output is a 'launching pad' — an orientation guide that lets a researcher dive in confidently, not a finished review. Use when the user starts literature-oriented research (e.g., 'litreview on [topic]', 'literature review on [topic]', 'I'm starting a literature review on X', 'I'm writing a paper on X', 'help me research X', 'I'm doing research on X', 'can you help me research X'). Do NOT use for single one-off paper searches wanting a quick list — that's a plain Consensus search."
license: MIT
metadata:
  source_spec: "megaprompts/09-litreview-megaprompt.md"
  build_pattern: "Path B (direct conversion)"
  research_pack_convention: "Agent Integrity Rules verbatim per PR #657 audit; sibling of pulse"
  version: 1.0.0
---

# Litreview — Academic Literature Orientation

> **Portability:** Requires a Consensus MCP connection, Node.js with `docx` package for document generation, and (in CLI) `bash_tool`. Works in Claude Code CLI natively. In Claude.ai with Consensus MCP + Code Execution, the workflow is supported.

Produce a **launching pad** — not a finished literature review, but an orientation document that gives a researcher entering an unfamiliar field everything they need to start reading and searching with confidence. Think: what a generous colleague who knows the field would tell you over coffee.

## Agent Integrity Rules (Research-Pack Convention)

Inherited from the research-pack convention; locked verbatim per PR #657's cross-skill consistency audit.

- **Source discipline.** Only cite Consensus-returned papers from THIS session. Training knowledge labeled `[Not from Consensus — model knowledge]` and excluded from cited count. Sparse results stated explicitly, never silently filled.
- **Counting discipline.** Three numbers tracked: searches executed / unique papers received (deduplicated) / papers cited. Every cited paper has a retrievable Consensus URL from this session. Use `scripts/citation_tracker.py` for deterministic counts.
- **Tool constraints.** Consensus per-query cap depends on plan tier. **Detect at first search**, report at checkpoint. Rate limit is **1 query/sec** — sequential execution mandatory.
- **Retry policy.** On failure → wait 3s → retry once → log. After 3 consecutive failures: stop, alert user, share what was collected.
- **Plan-tier detection.** Parse first-search response for "Showing top 10" / "upgrade" → free tier (10/search). 20 returned → Pro (20/search). Calculate theoretical ceiling and surface at checkpoint so user can recalibrate.

See [`references/search_budget_allocation.md`](references/search_budget_allocation.md) for the sequential-execution rationale + plan-tier signals.

## Error Handling

| Failure | Behavior |
|---|---|
| Consensus rate-limit hit | Wait 3s, retry once, log outcome |
| Search returns 0 results | Note explicitly; "either niche terminology or genuine gap"; never silently fill |
| Plan-tier cap detected | Log tier; report at checkpoint; surface in audit |
| 3 consecutive failures | Stop searching, alert user, share what's collected, ask how to proceed |
| Sub-area returns thin results (<5 papers) | Flag in audit; suggest manual PubMed/Scholar supplementation |
| User wants to adjust sub-areas | Update table, re-confirm before searching |
| DOCX validation fails | Unpack XML, fix, repack |

## Phase 0: Grill-Me Intake (3 forcing questions, one at a time)

Each question carries explicit "why I'm asking". Stop condition: max 3 before Phase 1.

### Q1 (root) — Research question specificity

> **State the research question in 1–2 sentences. Specific is better — "How do LLMs perform on clinical reasoning tasks compared to physicians?" beats "AI in medicine". Vague questions produce vague reviews.**
>
> *Why I'm asking:* The reconnaissance search hinges on precise terminology. Vague questions produce thin recon results that don't yield a useful framework breakdown.

**Refuse mush.** Re-ask once with examples if user is too broad. If still vague, deliver with explicit "broad-scope orientation, not depth review" caveat.

### Q2 (depends on Q1) — Framework hint

> **Framework — pick one or say "you pick":**
>
> 1. **PICO** (Population / Intervention / Comparison / Outcome — most clinical questions)
> 2. **SPIDER** (Sample / Phenomenon / Design / Evaluation / Research-type — social/qualitative)
> 3. **Decomposition** (Problem / Solution / Evaluation / Limitations — technology-focused)
> 4. **Hybrid** (you pick which components from which framework)
> 5. **You pick** — analyze Q1 and recommend
>
> *Why I'm asking:* PICO is the default for ~70% of clinical questions but maps poorly to qualitative work or technology evaluation. Picking upfront saves the recon search from suggesting a misaligned framework.

Forcing choice with default ("you pick"). The skill surfaces its own framework recommendation after the recon search so user can override. Use `scripts/framework_recommender.py` for the heuristic.

See [`references/framework_selection.md`](references/framework_selection.md) for PICO / SPIDER / Decomposition canon.

### Q3 (depends on Q1) — Tentative depth

> **Tentative depth — pick one. Final confirmation comes after the framework breakdown:**
>
> 1. **Quick scan** (5 searches)
> 2. **Standard review** (10 searches)
> 3. **Deep dive** (20 searches)
>
> *Why I'm asking:* I ask this twice — once now to calibrate the recon search emphasis, once after the framework breakdown to confirm. Tentative answer affects which sub-areas to surface first; final answer drives search budget allocation.

Forcing choice. **Re-asked** at the post-Phase-2 checkpoint after the user has seen the framework breakdown.

**Stop condition:** 3 questions max before Phase 1. The post-Phase-2 checkpoint is its own grill-me moment (framework table + sub-area-adjustment + depth-reconfirmation).

## Phase 1: Initial Reconnaissance

**One broad Consensus search** to map themes, terminology, methodological distinctions.

- Query: broad version of Q1 (terminology variants are okay; first search casts wide)
- Record: `citation_tracker.py --action record_search --session NAME --query "..."`
- Record received count: `citation_tracker.py --action record_papers_received --session NAME --count N`
- **Detect plan tier** from response: "Showing top 10" / "upgrade" → free; 20 returned → Pro

Synthesize for the checkpoint:
- Themes that surfaced
- Terminology variations (e.g., "LLM" vs "large language model" vs "GPT-style model")
- Methodological distinctions (clinical trials vs benchmark eval vs case study)
- Coverage gaps (sub-questions absent from recon results)

## Phase 2: Framework Selection + Sub-area Generation

Choose framework (from Q2 OR override based on recon):
- **PICO** — most clinical questions (~70% default)
- **SPIDER** — social / qualitative
- **Decomposition** — technology focus (Problem / Solution / Evaluation / Limitations)
- **Hybrid** — explicit cross-framework mapping

Generate **4-5 sub-area questions** mapped to framework components. Each becomes a targeted Phase 3 search.

## Checkpoint (grill-me forcing-options moment)

After Phase 2, halt and present:

### 3-4 sentence recon summary
- What themes surfaced
- Terminology landscape
- Evidence landscape characterization

### Framework breakdown table

| Framework Component | How It Maps to This Topic | Proposed Sub-area to Explore |
|---|---|---|
| (Component 1) | ... | Sub-area 1 |
| (Component 2) | ... | Sub-area 2 |
| (Component 3) | ... | Sub-area 3 |
| (Component 4) | ... | Sub-area 4 |
| Cross-cutting theme | ... | Sub-area 5 |

### Depth re-confirmation (forcing choice)

Surface the **practical constraint**: detected plan tier + theoretical ceiling.

- Quick scan (5 searches × ~10 results each = ~50 papers max)
- Standard review (10 searches × ~10 = ~100 papers)
- Deep dive (20 searches × ~10 = ~200 papers)

### Sub-area forcing options

- "Looks good — proceed with these sub-areas"
- "Adjust: add sub-area on [X]"
- "Adjust: remove and replace [Y] with [Z]"
- "Restart with different framework"

### Why I'm asking (the rationale)

> A wrong framework or sub-area set wastes the search budget. This is the **last cheap moment** to correct course.

**Wait for user response before Phase 3.** Refuse to start Phase 3 without explicit user choice.

## Phase 3: Targeted Searches

Sequential (1 query/sec), budget per depth tier. See [`references/search_budget_allocation.md`](references/search_budget_allocation.md) for full canon.

### Quick scan (5 searches)
- 5 sub-area searches (one per sub-area)
- Skip era-gated + review-specific

### Standard review (10 searches)
- 5 sub-area searches
- 2 review article searches (top 2 sub-areas): `"systematic review [topic]"` / `"meta-analysis [topic]"`
- 2 era-gated searches (most important sub-area): `year_max: 2015` + `year_min: 2021`
- 1 follow-up on highest-cited paper using its key terms + `year_min` after publication

### Deep dive (20 searches)
- 5 sub-area searches
- 5 review article searches (one per sub-area)
- 4 era-gated searches (top 2 sub-areas, old + new each)
- 3 follow-ups on top 3 highest-cited papers
- 3 spare for emerging threads (surprising findings to chase)

Throughout: 1 q/sec rate limit. Sequential. Confirm response before next call. Record each via `citation_tracker.py`.

## Cross-Search Intelligence

Three trackers across ALL search results — run `scripts/cross_search_aggregator.py --session NAME` after Phase 3 completes:

1. **Repeat-hit papers** — same paper appearing in 3+ sub-area searches = likely foundational
2. **Recurring authors** — same author in multiple searches = dominant research group; top 3-5 most frequent matter
3. **Citation-per-year heuristic** — a 2023 paper with 150 citations >> 2008 paper with 150 citations. Use for seminal-work identification.

These feed the "Start Here" + "Key Research Groups" + "Bibliography" DOCX sections.

## Phase 4: DOCX Research Guide

Generate via Node.js + `docx` library. 8 sections (see [`references/docx_8_sections.md`](references/docx_8_sections.md) for full spec):

1. **Topic Overview** — single tight paragraph (4-6 sentences)
2. **Start Here — Priority Reading Order** — 5-7 papers ordered: best recent review → foundational → 2-3 frontier → gap/controversy. Each: hyperlinked title + authors/year + 1-sentence contribution + 1-sentence "what to look for"
3. **How the Field Got Here** — chronological narrative (1-2 paragraphs) + timeline table (5-8 milestones: Year / Milestone / Significance) + terminology evolution note
4. **Sub-area Guides** (one per sub-area, 4 parts each)
   - 4a. What the Research Shows (2-3 sentence synthesis with inline citations)
   - 4b. Key Papers (3-5 hyperlinked papers with citation count, year, 1-sentence importance)
   - 4c. Key Search Terms (6-10 keywords, synonyms, MeSH, historical terms)
   - 4d. Boolean Search Strings (2-3 ready-to-paste strings)
5. **Key Research Groups** — top 3-5 authors/groups with affiliations, sub-area coverage, representative paper link (from cross-search aggregator)
6. **Open Questions & Gaps** — three categories: methodological / population-context / conceptual-theoretical. Each gap explains *why it matters*.
7. **Bibliography** — alphabetical by first author. Every entry has clickable "View on Consensus" link. Every inline citation matches a bibliography entry.
8. **Audit Log** — search summary table (#, query, filters, papers returned, status), counts block, coverage notes including detected tier and theoretical ceiling

### DOCX Technical Requirements

Document the key `docx` library patterns:

- Page: US Letter, 1-inch margins
- Lists: `LevelFormat.BULLET` (never unicode bullets)
- Hyperlinks: `ExternalHyperlink` with `style: "Hyperlink"`, full URL (never truncated)
- Tables: dual widths (`columnWidths` + cell `width`), `ShadingType.CLEAR`
- Validation step after save (zip-integrity check: `python3 -c "import zipfile,sys; zipfile.ZipFile(sys.argv[1]).testzip()" output.docx` — no output = intact — then confirm the required sections are present)

Reference the **docx skill** for setup patterns and best practices.

## Output

```
research_guide_<topic-slug>_<YYYY-MM-DD>.docx
```

Plus:
- Chat summary block: "Saved: <path>. Audit: N searches × M unique papers / K cited. Plan tier: <tier>."
- Audit log printed inline if user asks for it

## Tooling

| Script | Role |
|---|---|
| `scripts/citation_tracker.py` | JSON-backed three-count audit at `~/.litreview_sessions/<session>.json` |
| `scripts/framework_recommender.py` | Heuristic PICO/SPIDER/Decomposition suggestion from research question |
| `scripts/cross_search_aggregator.py` | Repeat-hits + recurring-authors + citation-per-year ranking after Phase 3 |

## References

- [`references/framework_selection.md`](references/framework_selection.md) — PICO / SPIDER / Decomposition canon (7+ sources)
- [`references/search_budget_allocation.md`](references/search_budget_allocation.md) — depth tiers + cross-search intelligence + sequential execution rationale (7+ sources)
- [`references/docx_8_sections.md`](references/docx_8_sections.md) — research guide DOCX spec + technical requirements (7+ sources)

## Anti-Patterns To Reject

- Parallelizing Consensus calls
- Skipping the interactive checkpoint (running all searches without user confirmation)
- Padding thin results with training knowledge
- Defaulting to non-PICO framework without justification
- Citing papers in chat that didn't come from Consensus this session
- Hardcoding plan tier instead of detecting from first response
- Skipping era-gated searches in standard/deep budgets
- Skipping cross-search intelligence (repeat-hits, recurring authors)
- Truncating Consensus URLs in hyperlinks

---

**Version:** 1.0.0
**Source spec:** [`megaprompts/09-litreview-megaprompt.md`](../../../../megaprompts/09-litreview-megaprompt.md)
**Build pattern:** Path B (direct conversion). Sibling of `pulse` (research-pack shape).

More from alirezarezvani/claude-skills

SkillDescription
a11y-auditAccessibility audit skill for scanning, fixing, and verifying WCAG 2.2 Level A and AA compliance across React, Next.js, Vue, Angular, Svelte, and plain HTML codebases. Use when auditing accessibility, fixing a11y violations, checking color contrast, generating compliance reports, or integrating accessibility checks into CI/CD pipelines.
ab-test-setupWhen the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.
ad-creativeWhen the user needs to generate, iterate, or scale ad creative for paid advertising. Use when they say 'write ad copy,' 'generate headlines,' 'create ad variations,' 'bulk creative,' 'iterate on ads,' 'ad copy validation,' 'RSA headlines,' 'Meta ad copy,' 'LinkedIn ad,' or 'creative testing.' This is pure creative production — distinct from paid-ads (campaign strategy). Use ad-creative when you need the copy, not the campaign plan.
adversarial-reviewerAdversarial code review that breaks the self-review monoculture. Use when you want a genuinely critical review of recent changes, before merging a PR, or when you suspect Claude is being too agreeable about code quality. Forces perspective shifts through hostile reviewer personas that catch blind spots the author's mental model shares with the reviewer.
aeoAnswer Engine Optimization (AEO) skill — optimize content to be cited by AI language models (ChatGPT, Perplexity, Claude, Gemini, Mistral) as authoritative sources. Distinct from SEO — AEO optimizes for citation in LLM-generated responses, not search rankings. Use when planning content for AI-first search audiences, auditing existing content for E-E-A-T signals, tracking which pages get cited by which LLMs, or building a citation-friendly content strategy. Triggers — 'AEO audit', 'optimize for ChatGPT', 'get cited by Perplexity', 'LLM citation strategy', 'answer engine optimization', 'content for AI search', 'E-E-A-T audit'. Output is a markdown audit report (default) or JSON for pipeline integration. Stdlib-only Python tools.
agent-designerUse when the user asks to design a multi-agent system, pick an orchestration pattern (supervisor/swarm/pipeline), generate tool schemas for agents, or evaluate agent execution logs for cost, latency, and failure bottlenecks. Examples: 'design an agent architecture for research automation', 'generate Anthropic tool schemas from these tool descriptions', 'analyze these agent run logs for bottlenecks'. NOT for Claude Code workflow files (use workflow-builder) or single-agent prompt design (use agent-workflow-designer).
agent-protocolInter-agent communication protocol for C-suite agent teams. Defines invocation syntax, loop prevention, isolation rules, and response formats. Use when C-suite agents need to query each other, coordinate cross-functional analysis, or run board meetings with multiple agent roles.
agent-workflow-designerDesign production-grade multi-agent workflows with clear pattern choice (sequential, parallel, hierarchical), handoff contracts, failure handling, and cost/context controls. Use when architecting a multi-step agent pipeline, choosing between single-agent vs multi-agent approaches, or refactoring an LLM workflow that suffers from context bloat or unreliable handoffs.
agenthubMulti-agent collaboration plugin that spawns N parallel subagents competing on the same task via git worktree isolation. Agents work independently, results are evaluated by metric or LLM judge, and the best branch is merged. Use when: user wants multiple approaches tried in parallel — code optimization, content variation, research exploration, or any task that benefits from parallel competition. Requires: a git repo.
agile-product-ownerAgile product ownership for backlog management and sprint execution. Covers user story writing, acceptance criteria, sprint planning, and velocity tracking. Use when writing user stories, creating acceptance criteria, planning sprints, estimating story points, breaking down epics, or prioritizing the backlog.