semantic-scholar
$
npx mdskill add wanshuiyin/Auto-claude-code-research-in-sleep/semantic-scholarSearch topic or paper ID: $ARGUMENTS
SKILL.md
.github/skills/semantic-scholarView on GitHub ↗
---
name: semantic-scholar
description: Search published venue papers (IEEE, ACM, Springer, etc.) via Semantic Scholar API. Complements /arxiv (preprints) with citation counts, venue metadata, and TLDR. Use when user says "search semantic scholar", "find IEEE papers", "find journal papers", "venue papers", "citation search", or wants published literature beyond arXiv preprints.
argument-hint: query-or-paper-id
allowed-tools: Bash(*), Read, Write
---
# Semantic Scholar Paper Search
Search topic or paper ID: $ARGUMENTS
## Role & Positioning
This skill is the **published venue** counterpart to `/arxiv`:
| Skill | Source | Best for |
|-------|--------|----------|
| `/arxiv` | arXiv API | Latest preprints, cutting-edge unrefereed work |
| `/semantic-scholar` | Semantic Scholar API | **Published** journal/conference papers (IEEE, ACM, Springer, etc.) with citation counts, venue info, TLDR |
**Do NOT duplicate arXiv's job.** If results contain an `externalIds.ArXiv` field, the paper is also on arXiv — note this but do not re-fetch from arXiv.
## Constants
- **MAX_RESULTS = 10** — Default number of search results.
- **S2_FETCHER** — canonical name `semantic_scholar_fetch.py`, resolved per
[`shared-references/integration-contract.md`](../shared-references/integration-contract.md) §2
(Policy D1 — primary + fallback cascade). If unresolved (canonical
chain exhausted), fall back to the inline Python alternative
documented in Step 2.
- **DEFAULT_FILTERS** — For general research queries, apply these by default to reduce noise:
- `--fields-of-study "Computer Science,Engineering"`
- `--publication-types JournalArticle,Conference`
> Overrides (append to arguments):
> - `/semantic-scholar "topic" - max: 20` — return up to 20 results
> - `/semantic-scholar "topic" - type: journal` — only journal articles
> - `/semantic-scholar "topic" - type: conference` — only conference papers
> - `/semantic-scholar "topic" - min-citations: 50` — only highly-cited papers
> - `/semantic-scholar "topic" - year: 2022-` — papers from 2022 onward
> - `/semantic-scholar "topic" - fields: all` — remove default field-of-study filter
> - `/semantic-scholar "topic" - sort: citations` — bulk search sorted by citation count
> - `/semantic-scholar "DOI:10.1109/..."` — fetch a single paper by DOI
## Workflow
### Step 1: Parse Arguments
Parse `$ARGUMENTS` for directives:
- **Query or ID**: main search term, or a paper identifier:
- DOI: `10.1109/TWC.2024.1234567`
- Semantic Scholar ID: `f9314fd99be5f2b1b3efcfab87197d578160d553`
- ArXiv: `ARXIV:2006.10685`
- Corpus: `CorpusId:219792180`
- **`- max: N`**: override MAX_RESULTS
- **`- type: journal|conference|review|all`**: map to `--publication-types`
- **`- min-citations: N`**: map to `--min-citations`
- **`- year: RANGE`**: map to `--year` (e.g. `2022-`, `2020-2024`)
- **`- fields: FIELDS`**: override `--fields-of-study` (use `all` to remove filter)
- **`- sort: citations|date`**: use `search-bulk` with `--sort citationCount:desc` or `publicationDate:desc`
If the argument matches a DOI pattern (`10.XXXX/...`), a Semantic Scholar ID (40-char hex), or a prefixed ID (`ARXIV:...`, `CorpusId:...`), skip search and go directly to Step 3.
### Step 2: Search Papers
Resolve `$S2_FETCHER` via the canonical strict-safe chain (see
[`shared-references/integration-contract.md`](../shared-references/integration-contract.md) §2):
```bash
cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
if [ -z "${ARIS_REPO:-}" ] && [ -f .aris/installed-skills.txt ]; then
ARIS_REPO=$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null) || true
fi
S2_FETCHER=".aris/tools/semantic_scholar_fetch.py"
[ -f "$S2_FETCHER" ] || S2_FETCHER="tools/semantic_scholar_fetch.py"
[ -f "$S2_FETCHER" ] || { [ -n "${ARIS_REPO:-}" ] && S2_FETCHER="$ARIS_REPO/tools/semantic_scholar_fetch.py"; }
[ -f "$S2_FETCHER" ] || S2_FETCHER=""
```
**Standard search** (default — relevance-ranked):
```bash
python3 "$S2_FETCHER" search "QUERY" --max MAX_RESULTS \
--fields-of-study "Computer Science,Engineering" \
--publication-types JournalArticle,Conference
```
**Bulk search** (when `- sort:` is specified, or MAX_RESULTS > 100):
```bash
python3 "$S2_FETCHER" search-bulk "QUERY" --max MAX_RESULTS \
--sort citationCount:desc \
--fields-of-study "Computer Science" \
--year "2020-"
```
If `$S2_FETCHER` is empty (Policy D1 cascade), fall back to inline Python using `urllib` against `https://api.semanticscholar.org/graph/v1/paper/search`.
**Recommended filter combos** (from testing):
| Goal | Flags |
|------|-------|
| High-quality journal papers | `--publication-types JournalArticle --min-citations 10` |
| CS/EE papers, recent | `--fields-of-study "Computer Science,Engineering" --year "2022-"` |
| Foundational / high-impact | `search-bulk --sort citationCount:desc --fields-of-study "Computer Science"` |
| Conference papers only | `--publication-types Conference` |
> **Note**: `--venue` requires exact venue names (e.g. "IEEE Transactions on Signal Processing"), not partial matches like "IEEE". Avoid using `--venue` in automated flows — prefer `--publication-types` + `--fields-of-study`.
### Step 3: Fetch Details for a Specific Paper
When a single paper ID is requested:
```bash
python3 "$S2_FETCHER" paper "PAPER_ID"
```
Where PAPER_ID can be:
- DOI: `10.1109/TSP.2021.3071210`
- ArXiv: `ARXIV:2006.10685`
- CorpusId: `CorpusId:219792180`
- S2 ID: `f9314fd99be5f2b1b3efcfab87197d578160d553`
### Step 4: De-duplicate Against arXiv
For each result, check `externalIds.ArXiv`:
- If present → paper is also on arXiv. Note this in output but do NOT re-fetch via `/arxiv`.
- If absent → paper is **venue-only** (e.g. IEEE without preprint). This is the unique value of this skill.
### Step 5: Present Results
Present results as a table:
```text
| # | Title | Venue | Year | Citations | Authors | Type |
|---|-------|-------|------|-----------|---------|------|
| 1 | Deep Learning Enabled... | IEEE Trans. Signal Process. | 2021 | 1364 | Xie et al. | Journal |
```
For each paper, also show:
- **DOI link**: `https://doi.org/DOI` (for IEEE/ACM papers, this is the canonical link)
- **Open Access PDF**: if `openAccessPdf.url` is non-empty, show it
- **TLDR**: if available, show the one-line summary
- **Also on arXiv**: if `externalIds.ArXiv` exists, note the arXiv ID
### Step 6: Detailed Summary
For each paper (or top 5 if many results):
```markdown
## [Title]
- **Venue**: [venue name] ([publicationVenue.type]: journal/conference)
- **Year**: [year] | **Citations**: [citationCount]
- **Authors**: [full author list]
- **DOI**: [doi link]
- **Fields**: [fieldsOfStudy]
- **TLDR**: [tldr.text if available]
- **Abstract**: [abstract]
- **Open Access**: [openAccessPdf.url or "Not available"]
- **Also on arXiv**: [ArXiv ID if exists, else "No"]
```
### Step 7: Update Research Wiki (if active)
**Required when `research-wiki/` exists in the project**; skip silently
otherwise. When the wiki dir exists, resolve `$WIKI_SCRIPT` per the
canonical chain at
[`shared-references/wiki-helper-resolution.md`](../shared-references/wiki-helper-resolution.md)
(Variant B — warn-and-skip). For results with an `externalIds.ArXiv`
field, use `--arxiv-id`; for venue-only papers (no arXiv mirror —
common for IEEE/ACM), fall back to manual metadata:
```bash
if [ -d research-wiki/ ]; then
cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null)}"
WIKI_SCRIPT=".aris/tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || WIKI_SCRIPT="tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || { [ -n "${ARIS_REPO:-}" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"; }
[ -f "$WIKI_SCRIPT" ] || {
echo "WARN: research_wiki.py not found; semantic-scholar results delivered, wiki ingest skipped. Fix: bash tools/install_aris.sh, export ARIS_REPO, or cp <ARIS-repo>/tools/research_wiki.py tools/." >&2
WIKI_SCRIPT=""
}
[ -n "$WIKI_SCRIPT" ] && for each paper in results:
if paper.externalIds.ArXiv:
python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ \
--arxiv-id "<ArXiv>"
else:
python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ \
--title "<title>" --authors "<authors joined by , >" \
--year <year> --venue "<venue>" \
[--external-id-doi "<externalIds.DOI>"]
fi
```
The helper handles slug / dedup / page / index / log — **do not
handwrite `papers/<slug>.md`**. See
[`shared-references/integration-contract.md`](../shared-references/integration-contract.md).
Backfill with `/research-wiki sync --arxiv-ids <id1>,<id2>,...` for
arXiv-available papers.
### Step 8: Final Output
Summarize what was done:
- `Found N published papers for "query"`
- `Filters applied: [publication types, fields, year range, etc.]`
- `N papers are venue-only (not on arXiv)`
- `Wiki-ingested N papers` (if `research-wiki/` was present)
Suggest follow-up skills:
```text
/arxiv "topic" - search arXiv preprints (complements this search)
/research-lit "topic" - multi-source review: Zotero + local PDFs + arXiv + S2
/novelty-check "idea" - verify novelty against literature
```
## Key Rules
- **Default to filtered search**: Always apply `--fields-of-study` and `--publication-types` unless user says `- fields: all`. Without filters, S2 returns cross-discipline noise (linguistics, psychology, etc.).
- **Citation count is gold**: S2's citation data is its main advantage over arXiv. Always show `citationCount` prominently and use it to rank/prioritize results.
- **Venue metadata matters**: Show `venue` and `publicationVenue.type` (journal vs conference) — this helps users assess paper quality.
- **DOI is the canonical ID for published papers**: Always show DOI links for IEEE/ACM/Springer papers.
- **Rate limiting**: S2 API without key is heavily rate-limited (~1 req/s, strict cooldown). If HTTP 429 occurs, wait and retry. Recommend users set `SEMANTIC_SCHOLAR_API_KEY` env var for higher limits (free at https://www.semanticscholar.org/product/api#api-key-form).
- **TLDR may be null**: Some publishers (notably IEEE) elide the TLDR field. Fall back to showing the first sentence of the abstract.
- **openAccessPdf may be empty**: Many IEEE papers are closed access. Always provide the DOI link as fallback.
- If the S2 API is unreachable, suggest using `/arxiv` or `/research-lit "topic" - sources: web` as fallback.
More from wanshuiyin/Auto-claude-code-research-in-sleep
- ablation-plannerUse when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs ablations from a reviewer's perspective, CC reviews feasibility and implements.
- alphaxivQuick single-paper lookup via AlphaXiv LLM-optimized summaries with tiered source fallback. Use when user says "explain this paper", "summarize paper", pastes an arXiv/AlphaXiv URL, or provides a bare arXiv ID for quick understanding - not for broad literature search.
- analyze-resultsAnalyze ML experiment results, compute statistics, generate comparison tables and insights. Use when user says "analyze results", "compare", or needs to interpret experimental data.
- auto-paper-improvement-loopAutonomously improve a generated paper via GPT-5.4 xhigh review → implement fixes → recompile, for 2 rounds. Use when user says \"改论文\", \"improve paper\", \"论文润色循环\", \"auto improve\", or wants to iteratively polish a generated paper.
- auto-review-loopAutonomous multi-round research review loop. Repeatedly reviews via external reviewer backend (Codex or manual), implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.
- auto-review-loop-llmAutonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
- auto-review-loop-minimaxAutonomous multi-round research review loop using MiniMax API. Use when you want to use MiniMax instead of Codex MCP for external review. Trigger with "auto review loop minimax" or "minimax review".
- citation-auditZero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says \"审查引用\", \"check citations\", \"citation audit\", \"verify references\", \"引用核对\", or before submission to ensure bibliography integrity.
- claims-draftingDraft patent claims for an invention. Use when user says \"撰写权利要求\", \"draft claims\", \"写权利要求书\", \"claim drafting\", or wants to create patent claims. The core skill of the patent pipeline.
- comm-lit-review-claude-singleCommunications-domain literature review with Claude-style knowledge-base-first retrieval. Use when the task is about communications, wireless, networking, satellite/NTN, Wi-Fi, cellular, transport protocols, congestion control, routing, scheduling, MAC/PHY, rate adaptation, channel estimation, beamforming, or communication-system research and the user wants papers, related work, a survey, or a landscape summary. Search Zotero, Obsidian, and local paper folders first when available, then search IEEE Xplore, ScienceDirect, ACM Digital Library, and broader web in that order.