reference-finder
$ npx mdskill add aipoch/medical-research-skills/reference-finder
Match scientific claims to PubMed papers instantly.
- Grounds every sentence with top-ranked citation evidence.
- Connects to the official PubMed E-utilities API exclusively.
- Ranks matches by keyword overlap, year, and citation count.
- Delivers structured titles, DOIs, PMIDs, and reasoning per sentence.
---
name: reference-finder
description: Automatically finds and ranks PubMed references for each sentence in scientific text; use when you need titles, DOIs, and brief recommendation reasons from the PubMed E-utilities API.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You have a scientific paragraph and want suggested PubMed papers for **each sentence**.
- You need **top-ranked references** with **title, DOI, PMID, year**, and a short **why recommended** explanation.
- You are drafting or reviewing a manuscript and want quick **literature grounding** for key claims.
- You want a lightweight reference matcher that uses **only the official PubMed E-utilities API** (no third-party services).
- You need a scriptable tool for batch or CLI workflows to generate candidate citations.
## Key Features
- Sentence-level reference matching for scientific text.
- Returns the **top N (default: 3)** most relevant PubMed records per sentence.
- Outputs structured fields: **title, DOI, PMID, year, recommendation reason**.
- Relevance ranking based on:
  - keyword overlap / match strength,
  - publication year preference,
  - citation-count signal (when available/derivable).
- Safety constraints:
  - Network access restricted to `eutils.ncbi.nlm.nih.gov`.
  - No local filesystem writes except to `outputs/` during execution.
  - Request timeout set to **30 seconds** with clear error messages.
- Supports Python API usage and CLI usage (including interactive mode).
## Dependencies
- Python **3.x** (standard library only; no third-party packages required)
## Example Usage
### Python (direct call)
```python
from reference_finder import find_references
text = "CRISPR-Cas9 gene editing has revolutionized biomedical research."
results = find_references(text)
# Print the first three references with their identifiers and rationale.
for ref in results[:3]:
    print(f"- {ref['title']} ({ref['year']})")
    print(f"  DOI: {ref['doi']}")
    print(f"  PMID: {ref['pmid']}")
    print(f"  Reason: {ref['reason']}")
```
### CLI (single input)
```bash
python scripts/find_refs.py "CRISPR-Cas9 gene editing has revolutionized biomedical research."
```
### CLI (interactive mode)
```bash
python scripts/find_refs.py
```
### Example output (JSON)
```json
[
  {
    "pmid": "PMID:",
    "title": "A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity",
    "doi": "10.1126/science.1225829",
    "year": 2012,
    "reason": "Highest keyword match for 'CRISPR-Cas9', foundational paper"
  }
]
```
## Implementation Details
### Data flow
1. **Sentence splitting**: The input text is split into sentences (implementation-defined; typically punctuation-based).
2. **PubMed search (ESearch)**: For each sentence, a query is sent to:
   - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi`
3. **Record retrieval (EFetch)**: The top candidate PMIDs are fetched via:
   - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi`
4. **Field extraction**: Title, year, PMID, and DOI (when present) are extracted from the returned metadata.
5. **Ranking and selection**: Candidates are scored and the top **N** are returned with a short recommendation reason.
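The data flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in `scripts/find_refs.py`: the function names (`split_sentences`, `build_esearch_url`, `build_efetch_url`, `parse_efetch_xml`), the regex-based splitter, and the `retmax` default are assumptions made for the example. The query parameters (`db`, `term`, `id`, `retmax`, `retmode`) are standard E-utilities parameters.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def split_sentences(text):
    """Naive punctuation-based splitter (assumed; abbreviations like 'e.g.' will over-split)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_esearch_url(sentence, retmax=10):
    """Build the ESearch URL that looks up candidate PMIDs for one sentence."""
    params = {"db": "pubmed", "term": sentence, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def build_efetch_url(pmids):
    """Build the EFetch URL that retrieves full records for a list of PMID strings."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def parse_efetch_xml(xml_text):
    """Extract pmid, title, year, and doi from a PubMed EFetch XML response."""
    records = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        # Titles may contain inline markup such as <i>, so join all text nodes.
        title = "".join(title_el.itertext()).strip() if title_el is not None else None
        year_text = article.findtext(
            "MedlineCitation/Article/Journal/JournalIssue/PubDate/Year"
        )
        doi = None
        # Only look at the article's own ID list, not IDs of cited references.
        for article_id in article.findall("PubmedData/ArticleIdList/ArticleId"):
            if article_id.get("IdType") == "doi":
                doi = (article_id.text or "").strip() or None
                break
        records.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": title,
            "year": int(year_text) if year_text and year_text.isdigit() else None,
            "doi": doi,  # None when the record carries no DOI
        })
    return records
```

Keeping URL construction and XML parsing as pure functions means both can be tested offline against saved responses, with the network call isolated elsewhere.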
### Ranking signals
- **Keyword match**: Measures overlap between sentence terms and retrieved record metadata (e.g., title/abstract terms when available).
- **Publication year**: Used as a preference signal (e.g., favoring more recent work unless a classic/foundational match is strong).
- **Citation count**: Incorporated when available/derivable; otherwise treated as missing without failing the run.
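One way to combine the three signals is a weighted sum, sketched below. The weights, the stopword list, the 20-year recency horizon, the log-scaled citation cap, and the `citations` and `abstract` record keys are all assumptions chosen for illustration; the skill's actual scoring formula is not documented here.

```python
import math
import re

# Tiny illustrative stopword list (assumed, not taken from the project).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "in", "and", "has", "have", "is", "are", "for", "to", "with"}
)


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_overlap(sentence, record_text):
    """Fraction of the sentence's terms that also appear in the record text."""
    terms = tokenize(sentence)
    if not terms:
        return 0.0
    return len(terms & tokenize(record_text)) / len(terms)


def recency(year, current_year, horizon=20):
    """1.0 for a paper from the current year, decaying linearly to 0.0 at `horizon` years."""
    if year is None:
        return 0.0
    age = max(0, current_year - year)
    return max(0.0, 1.0 - age / horizon)


def citation_signal(count, cap=1000):
    """Log-scaled citation count in [0, 1]; a missing count contributes 0 instead of failing."""
    if count is None:
        return 0.0
    return min(1.0, math.log1p(count) / math.log1p(cap))


def score(sentence, record, current_year, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of keyword match, recency, and citation signal."""
    text = f"{record.get('title') or ''} {record.get('abstract') or ''}"
    w_kw, w_year, w_cite = weights
    return (
        w_kw * keyword_overlap(sentence, text)
        + w_year * recency(record.get("year"), current_year)
        + w_cite * citation_signal(record.get("citations"))
    )


def rank(sentence, records, current_year, top_n=3):
    """Return the top_n records by descending score."""
    ordered = sorted(records, key=lambda r: score(sentence, r, current_year), reverse=True)
    return ordered[:top_n]
```

Weighting keyword match most heavily lets a strongly matching older paper outrank a recent but loosely related one, which is consistent with the foundational-match exception described above.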
### Operational constraints and safety
- **Allowed network host**: `eutils.ncbi.nlm.nih.gov` only.
- **Prohibited**: Any third-party URLs.
- **Filesystem**: Do not write outside `outputs/` during execution.
- **Rate limiting**: Use a reasonable request cadence (e.g., **~0.5s** between requests) to respect API limits.
- **Timeout**: **30 seconds** per request.
- **Error handling**: Return semantic, user-readable error messages for network/API/parse failures.
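The network constraints above could be enforced with a small wrapper like the following sketch. `ReferenceFinderError`, `check_url`, `Throttle`, and `fetch` are hypothetical names introduced for this example, and requiring the `https` scheme is an added assumption beyond the host allow-list stated above.

```python
import socket
import time
import urllib.error
import urllib.parse
import urllib.request

ALLOWED_HOST = "eutils.ncbi.nlm.nih.gov"
TIMEOUT_SECONDS = 30
MIN_INTERVAL_SECONDS = 0.5


class ReferenceFinderError(Exception):
    """Hypothetical error type carrying a user-readable message."""


def check_url(url):
    """Reject any URL that does not point at the allowed E-utilities host."""
    parsed = urllib.parse.urlsplit(url)
    # Requiring https is an assumption of this sketch.
    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
        raise ReferenceFinderError(f"Blocked request to disallowed URL: {url}")
    return url


class Throttle:
    """Enforces a minimum interval between consecutive requests."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, throttle):
    """GET an allowed URL with throttling, a timeout, and readable errors."""
    check_url(url)
    throttle.wait()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return response.read()
    except urllib.error.HTTPError as exc:  # must precede URLError, its parent class
        raise ReferenceFinderError(f"PubMed returned HTTP {exc.code} for {url}") from exc
    except (TimeoutError, socket.timeout) as exc:
        raise ReferenceFinderError(
            f"PubMed request timed out after {TIMEOUT_SECONDS} seconds"
        ) from exc
    except urllib.error.URLError as exc:
        raise ReferenceFinderError(f"Could not reach PubMed: {exc.reason}") from exc
```

Passing the clock and sleep functions into `Throttle` keeps the pacing logic testable without real delays.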
### Defaults
- **Top references per sentence**: 3
- **Endpoints**:
  - ESearch: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi`
  - EFetch: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi`
### Related project files
- Main script: `scripts/find_refs.py`
- Tests: `tests/test_finder.py`
- Evaluation checklist: `references/evaluation-checklist.md`
- PubMed E-utilities documentation: https://www.ncbi.nlm.nih.gov/books/NBK25504/