ai-research
$
npx mdskill add arcasilesgroup/ai-engineering/ai-researchResearch external facts with tiered citations and async deep dives.
- Solves questions requiring outside knowledge beyond the codebase.
- Integrates local context, free MCPs, web search, and NotebookLM.
- Prioritizes fast tiers while running slow NotebookLM research in background.
- Delivers claims with source numbers and three recommended directions.
SKILL.md
.github/skills/ai-researchView on GitHub ↗
---
name: ai-research
description: "External evidence with citations via a 4-tier escalation (local → free MCPs → web/Exa → NotebookLM autonomous deep research, launched first / harvested last). Every claim sourced [N] or marked [unsourced]; output ends with 3 cited recommended directions. Trigger for 'what does the state of the art say about', 'compare options for', 'find sources on', 'investigate this question', 'research this'. Use for questions whose answer lives OUTSIDE the codebase. Not for codebase exploration; use /ai-explore instead. Not for refactors; use /ai-simplify instead. Not for business-logic debugging; use /ai-debug instead."
effort: mid
argument-hint: "[query] [--depth quick|standard|deep] [--reuse-notebook=id] [--persist]"
mode: agent
model_tier: sonnet
mirror_family: copilot-skills
generated_by: ai-eng sync
canonical_source: .claude/skills/ai-research/SKILL.md
edit_policy: generated-do-not-edit
---
# Research
## Purpose
Multi-tier, multi-source research skill with citation-first synthesis and persistent artifact reuse. Replaces ad-hoc `WebSearch` invocations with a disciplined escalation, but with NotebookLM's autonomous deep research running **async-first**: when NotebookLM is available it is launched FIRST (at T0, in a background subagent) so its slow autonomous deep-research job overlaps the fast tiers, then it is harvested LAST with a bounded wait. The fast tiers run meanwhile -- local context first (zero cost), then free MCPs (Context7, Microsoft Learn, `gh search`), then web search (Exa, with a built-in `WebSearch` fallback). Every external claim carries a `[N]` citation or is marked `[unsourced]` so readers can audit grounding, and the output always ends with exactly 3 recommended strategic directions.
Every external tier (NotebookLM, Context7, Exa, MS Learn) is capability-detected and fail-soft: an absent or unauthenticated tool is skipped silently (recorded as degraded) and never errors the run.
Outputs are designed for reuse: research is persisted to `.ai-engineering/runtime/research/<topic-slug>-<YYYY-MM-DD>.md` -- including the NotebookLM deep-research report and `notebook_id` -- so subsequent sessions short-circuit at Tier 0 or harvest a still-running notebook via `--reuse-notebook`.
## When to Use
- User asks for evidence: "what does the industry do for X", "state of the art on Y", "compare A vs B", "find sources on Z".
- `/ai-brainstorm` interrogation flags a question requiring external evidence (handler `interrogate.md` invokes this skill).
- User wants a verifiable, cited answer rather than the model's training-data recall.
- Research worth archiving for the team (deep technical investigations, library comparisons, architecture decisions).
### Off-ramp -- when to use `/ai-explore` instead
`/ai-research` answers questions whose source-of-truth lives **outside** this repository (web, docs portals, third-party APIs, academic papers). For questions whose answer lives **inside** this repo's files (architecture, dependency graph, pattern usage), dispatch `/ai-explore` instead -- it's read-only, codebase-only, and produces a structured architecture map rather than a cited narrative.
Do NOT use for: refactoring, writing scripts from scratch, debugging business logic, code review, or general programming concepts.
## Process
The order below is the **execution timeline**, not a strict gate sequence: Tier 3 is *launched* first and *harvested* last, so its autonomous deep-research job runs concurrently with the fast Tiers 0-2.
1. **Classify query** -- follow `handlers/classify-query.md` to decide which Tier 1 MCPs apply (library mention → Context7; Azure/Microsoft → MS Learn; real-world-code question → `gh search`; explicit URL → mark for fetch). NotebookLM (Tier 3) is NOT gated here -- it is default-on whenever available (see step 2).
2. **Tier 3 launch (T0, background, default-on)** -- follow the Launch half of `handlers/tier3-notebooklm.md`. Probe `mcp__notebooklm__nlm_list` for capability/auth. When NotebookLM is available, launch its autonomous deep-research job FIRST in a background subagent (`nlm_create_notebook` then `nlm_research(mode=deep)`) so it overlaps the fast tiers. This runs **by default with no flag**; when NotebookLM is absent/unauthenticated the launch short-circuits to degraded (recorded, never raised) and the run proceeds on Tiers 0-2.
3. **Tier 0 -- local** -- follow `handlers/tier0-local.md`. Search prior research artifacts, `LESSONS.md`, and `framework-events.ndjson` for prior `/ai-research` invocations. If ≥3 relevant hits, the agent MAY short-circuit the fast tiers and synthesize from local context (the Tier 3 job is still harvested in step 6).
4. **Tier 1 -- free MCPs (parallel)** -- follow `handlers/tier1-free-mcps.md`. Invoke Context7, Microsoft Learn, and `gh search code/repos` IN PARALLEL when the classifier marks them applicable AND each is available (per-source capability detection). Dedup by URL/path; absent sources are recorded in `degraded_sources`.
5. **Tier 2 -- web (Exa, built-in fallback)** -- follow `handlers/tier2-web.md`. Invoke web search + fetch in parallel when Tier 1 produced fewer than 5 high-quality hits, or the query references an explicit URL. Search uses Exa (`mcp__exa__web_search_exa` / `mcp__exa__web_fetch_exa`) when available, falling back to the built-in `WebSearch` / `WebFetch` (recording `"exa"` in `degraded_sources`) when not. Honor `--allowed-domains` and `--blocked-domains`.
6. **Tier 3 harvest (bounded wait)** -- follow the Harvest half of `handlers/tier3-notebooklm.md`. After Tiers 0-2 finish, wait on the deep-research job up to a bounded budget (`AIENG_RESEARCH_NLM_WAIT_SEC`, default 300s, ceiling 900s). On completion, collect the deep report (`report_markdown`) and the sources NotebookLM discovered autonomously. On timeout, degrade gracefully and preserve `notebook_id` so a later `--reuse-notebook=<id>` harvests the finished report.
7. **Synthesize with citations + 3 directions** -- follow `handlers/synthesize-with-citations.md`. Merge (dedup) the NotebookLM-discovered sources with the Tier 0-2 sources, then produce output where every external claim carries `[N]` or `[unsourced]` and which ends with EXACTLY 3 recommended directions (each individually cited). Validator regex `\[\d+\]|\[unsourced\]` must match at least once per claim paragraph and per direction; on failure, retry with a stricter system message (max 2 retries).
8. **Persist artifact** -- follow `handlers/persist-artifact.md`. Write `.ai-engineering/runtime/research/<topic-slug>-<YYYY-MM-DD>.md` with frontmatter (`query`, `depth`, `tiers_invoked`, `sources_used`, `notebook_id`, `created_at`, `slug`), the Question/Findings/Sources/Notebook Reference sections, and -- when the harvest completed -- an optional `## Deep Research Report`. Auto-persist when Tier 3 was invoked; opt-in via `--persist` otherwise.
## CLI Flags
- **NotebookLM deep research is default-on when available** -- there is NO flag to enable it. It launches whenever the `nlm_list` capability/auth probe succeeds (spec `notebooklm-async-tier3` D3); the source-count / comparative heuristics that used to gate it are gone (the count is unknowable at T0, when the background launch happens).
- `--depth quick|standard|deep` (default: `standard`). Controls **Tier 0-2** escalation only -- `quick` runs Tier 0+1; `standard` adds Tier 2; `deep` widens the fast tiers. It does NOT gate Tier 3 (which is independently default-on when available).
- `--reuse-notebook=<id>` (opt-in). Re-attach to an existing NotebookLM notebook instead of calling `nlm_create_notebook`, and harvest its deep-research report. The primary use is following up a prior run whose harvest timed out: the earlier run persisted `notebook_id` in its artifact, and `--reuse-notebook=<id>` retrieves the now-finished report (AC6).
- `--persist` (opt-in). Forces artifact persistence even when Tier 3 was not invoked (Tier-3 runs auto-persist).
- `--allowed-domains a.com,b.com` (pass-through to the web search call -- Exa or built-in).
- `--blocked-domains x.com,y.com` (pass-through to the web search call -- Exa or built-in).
## Output Contract
Synthesized response in agent context PLUS, when persisted, a Markdown artifact at `.ai-engineering/runtime/research/<topic-slug>-<YYYY-MM-DD>.md`. The response ALWAYS ends with a `## Recommended Directions` block of EXACTLY 3 strategic directions ("rumbo"), each individually cited with `[N]` or `[unsourced]` (spec `notebooklm-async-tier3` D8/G7, AC5). Output format:
```
## Question
<verbatim user query>
## Findings
<paragraphs with inline [N] citations or [unsourced] markers>
## Recommended Directions
1. **<title>** — <1-2 line rationale> [N]. Trade-off: <cost/risk>.
2. **<title>** — <rationale> [N]. Trade-off: <cost/risk>.
3. **<title>** — <rationale> [unsourced]. Trade-off: <cost/risk>.
## Sources
1. (title, url, accessed_at)
2. ...
## Notebook Reference
<NotebookLM notebook_id if Tier 3 was launched; "_(none)_" otherwise>
## Deep Research Report # OPTIONAL — present only when the NotebookLM
<verbatim deep-research report> # bounded harvest completed within the wait window
```
`## Recommended Directions`, `## Question`, `## Findings`, `## Sources`, and `## Notebook Reference` are always present. `## Deep Research Report` is appended only when the NotebookLM harvest completed (non-empty `report_markdown`); on harvest timeout it is absent but `notebook_id` is still recorded so a `--reuse-notebook=<id>` follow-up can retrieve it.
## Common Mistakes
- Skipping Tier 0 and going straight to web search (defeats the reuse goal).
- Producing claims without `[N]` or `[unsourced]` markers (defeats the citation hard-rule).
- Omitting the `## Recommended Directions` block, or emitting other than EXACTLY 3 directions, or leaving a direction uncited (violates the output contract / AC5).
- Treating an absent tool as a hard failure. Capability detection is fail-soft: a missing or unauthenticated NotebookLM / Context7 / Exa / MS Learn is skipped silently, recorded in `degraded_sources`, and the run still returns output -- it MUST never error (D7).
- Running the NotebookLM deep research synchronously instead of launching it first in a background subagent (defeats the async overlap -- the slow Tier 3 job should run while Tiers 0-2 execute, then be harvested with a bounded wait).
- Discarding `notebook_id` on harvest timeout (loses the `--reuse-notebook` follow-up path; the report finishes later but cannot be retrieved).
- Not deduplicating Tier 1 results, or failing to merge/dedup NotebookLM-discovered sources with the Tier 0-2 sources at synthesis (Context7 / `gh search` / NotebookLM can return overlapping URLs).
## Examples
### Example 1 — quick state-of-the-art lookup
User: "what's the current best practice for OAuth refresh-token rotation?"
```
/ai-research "OAuth refresh-token rotation best practices" --depth quick
```
If NotebookLM is available, an autonomous deep-research job launches first in the background; meanwhile Tier 0 checks local artifacts and Tier 1 queries Context7 / Microsoft Learn. The deep job is harvested with a bounded wait, sources are merged, and the answer ends with a Findings block (inline `[N]`) plus 3 cited recommended directions. If NotebookLM is absent, the run completes on Tiers 0-1 and notes the degraded source.
### Example 2 — deep dive (NotebookLM default-on) persisted for reuse
User: "deep research on event-sourcing vs CQRS for fintech ledgers, save it for next time"
```
/ai-research "event sourcing vs CQRS for fintech ledgers" --depth deep --persist
```
NotebookLM deep research launches at T0 (no flag needed -- default-on when available) and overlaps Tiers 0-2 (Exa for web). The harvest's deep report and discovered sources fuse with the tier sources; the artifact at `.ai-engineering/runtime/research/<slug>-<date>.md` carries the `## Deep Research Report` and `notebook_id` so future invocations short-circuit at Tier 0.
### Example 3 — harvest a notebook that finished after a prior timeout (AC6)
User: a prior run timed out waiting on the deep job; its artifact recorded `notebook_id: nlm-abc123`.
```
/ai-research "event sourcing vs CQRS for fintech ledgers" --reuse-notebook=nlm-abc123
```
Re-attaches to the existing notebook (skips `nlm_create_notebook`), harvests the now-finished deep-research report, and re-synthesizes with the report included.
## Integration
Called by: user directly, `/ai-brainstorm` (interrogate handler). Calls: tier0–tier3 handlers (Tier 3 launched first / harvested last), `synthesize-with-citations.md`, `persist-artifact.md`. Produces: `.ai-engineering/runtime/research/<slug>-<date>.md`. See also: `/ai-brainstorm` (consumes research as evidence).
$ARGUMENTS
More from arcasilesgroup/ai-engineering
- ai-adviseProactive governance advisor — checks standards, decisions, and quality trends during development. Always advisory, NEVER blocks. Three modes: `advise` (post-edit), `gate` (pre-dispatch), `drift` (on-demand decision audit). Trigger for 'governance check', 'advise on this change', 'check for drift', 'is this aligned with active decisions', 'shift-left advisory'. Not for blocking gates — use /ai-verify. Not for narrative code review — use /ai-review.
- ai-analyze-permissionsUse when Claude Code keeps asking to approve commands you have already approved, when settings.local.json has grown large, or when you want to consolidate permission grants into wildcard patterns. Trigger for 'too many permission prompts', 'clean up permissions', 'audit my settings', 'consolidate allow rules'. Claude Code only — not available in GitHub Copilot, Antigravity, or Codex.
- ai-animationDesigns motion, transitions, and micro-interactions for UI components: spring animations, gestures, easing, staggers — taste-driven detail compounding. Trigger for 'animate this', 'add transitions', 'micro-interactions for', 'gesture design', 'swipe to dismiss', 'easing for this', 'stagger the'. Not for design systems; use /ai-design instead. Not for visual art; use /ai-visual instead. Not for testing animation code; use /ai-test instead.
- ai-autopilotDelivers large multi-concern specs and backlog runs autonomously: decomposes specs into sub-specs (or normalizes work items into a backlog DAG), deep-plans with parallel agents, builds a dependency DAG, implements in waves, runs a single final quality loop with one bounded quality-remediation pass (verify+guard+review on full changeset), delivers via PR. Trigger for 'implement spec-NNN end to end', 'autopilot this', 'autonomous delivery', 'decompose and ship', 'run the backlog', 'execute these GitHub issues', 'process the sprint backlog'. Invocation is the approval gate. Not for small or single-concern tasks; use /ai-build instead. Not for ambiguous requirements; use /ai-brainstorm first.
- ai-boardOperates the project board (GitHub Projects v2 or Azure DevOps): discovers configuration after install (fields, state mappings, process templates) and syncs work-item state at lifecycle transitions. Trigger for 'set up the board', 'configure our ADO board', 'discover board fields', 'move this issue to in-review', 'update the board', 'mark as in progress', 'sync the work item state'. Two subcommands: `discover` (post-install configuration write) and `sync` (lifecycle state transitions). Auto-invoked via `sync` by /ai-brainstorm, /ai-build, and /ai-pr; fail-open. Not for backlog execution; use /ai-autopilot --backlog instead.
- ai-brainstormForces rigorous design interrogation BEFORE any code: explores approaches, surfaces ambiguity, gathers evidence, produces an approved spec that becomes the contract for /ai-plan. Trigger for 'lets add X', 'how should we handle Y', 'whats the best approach', 'I am thinking about', 'what should we build for'. Not for existing approved specs; use /ai-plan instead. Not for execution; use /ai-build instead.
- ai-branch-cleanupCleans branches safely: switches to the default branch, prunes merged and squash-merged branches, syncs to remote, sweeps stale specs, rotates `.ai-engineering/runtime/` per retention policy. Trigger for 'tidy up', 'tidy branches', 'sync to main', 'delete old branches', 'start fresh', 'rotate runtime'. Auto-invoked by /ai-pr after merge. Not for committing changes; use /ai-commit instead. Not for code-level dead-code removal; use /ai-simplify instead.
- ai-buildCanonical implementation gateway: reads approved plan.md, resolves stack from manifest, deterministic-routes each task to its adapter, dispatches the build agent in an isolated worktree, runs TDD self-validation per task, then a single final quality loop with one bounded quality-remediation pass on the full changeset before /ai-pr. Trigger for 'go', 'start building', 'execute the plan', 'implement it', 'lets do this', 'build the plan', 'resume', 'continue'. Not without an approved plan; run /ai-plan first. Not for multi-concern specs needing decomposition; use /ai-autopilot instead. Not for a single function or subcomponent; use /ai-code.
- ai-codeWrites production code that satisfies stack-context standards on the first pass: interface-first design, backward-compatibility checks, lightweight self-review. Trigger for 'implement this', 'write the code for', 'add X to Y', 'build this function', 'make this work'. Not for tests; use /ai-test instead. Not for debugging; use /ai-debug instead. Not for refactoring; use /ai-simplify instead. Not for executing an approved plan end-to-end; use /ai-build (the gateway).
- ai-commitRuns the governed commit pipeline: auto-branches from protected, stages selectively, formats and lints, scans for secrets, gates docs, composes a conventional message, pushes. Trigger for 'commit my changes', 'save my work', 'push this to remote', 'stage these files', 'ship it'. Not for opening a PR; use /ai-pr instead. Not for branch hygiene; use /ai-branch-cleanup instead.