comprehensive-variant-annotation
$
npx mdskill add InternScience/scp/comprehensive-variant-annotationThis skill chains 7 public genomics database APIs sequentially to build a comprehensive annotation for a given variant. Use this when the user asks a general/vague question like "帮我查一下 rs7412" or "tell me about rs7412".
SKILL.md
.github/skills/comprehensive-variant-annotationView on GitHub ↗
---
name: comprehensive-variant-annotation
description: "Given an rsID, query multiple databases (dbSNP, FAVOR, GWAS Catalog, ClinVar, gnomAD, PharmGKB, ClinGen) for comprehensive annotation. Use when user asks a general question about a variant without specifying which aspect."
license: MIT license
metadata:
skill-author: PJLab
---
# Comprehensive Variant Annotation
## Usage
### 1. Tool Descriptions
This skill chains 7 public genomics database APIs sequentially to build a comprehensive annotation for a given variant. Use this when the user asks a general/vague question like "帮我查一下 rs7412" or "tell me about rs7412".
**Tool 1: dbSNP — Variant Basic Info & ClinVar RCV Records**
```tex
Query NCBI dbSNP REST API to get SNP basic information and ClinVar clinical records.
API: GET https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/{rsid_number}
Args:
rs_id (str): dbSNP rsID (e.g. "rs7412")
Return:
primary_snapshot_data: allele_annotations (gene associations, functional impact,
ClinVar RCV clinical records), placements_with_allele (GRCh38 coordinates).
```
**Tool 2: FAVOR — Functional Annotation & Scores**
```tex
Query FAVOR (GenoHub) API to get functional annotation and conservation scores.
API: GET https://api.genohub.org/v1/rsids/{rsid}
Return:
Functional prediction scores (CADD, REVEL, etc.), conservation scores,
gene annotations, variant effect predictions, and variant_id for gnomAD.
```
**Tool 3: gnomAD — Population Allele Frequency**
```tex
Query gnomAD GraphQL API for population allele frequencies.
API: POST https://gnomad.broadinstitute.org/api
Note: Requires variant_id (chr-pos-ref-alt format) from FAVOR Step 2.
Return:
Allele frequencies across populations (global, AFR, AMR, ASJ, EAS, FIN, NFE, SAS, etc.)
```
**Tool 4: GWAS Catalog — Trait Associations**
```tex
Query EBI GWAS Catalog REST API for GWAS statistical associations.
API: GET https://www.ebi.ac.uk/gwas/rest/api/associations/search/findByRsId?rsId={rsid}
Return:
associations: pvalue, risk allele, associated trait/disease, source study.
```
**Tool 5: ClinVar — Clinical Pathogenicity (extracted from dbSNP Step 1)**
```tex
Extract ClinVar RCV records from dbSNP response (already obtained in Step 1).
Return:
Clinical significance (Pathogenic/Benign/VUS/drug-response),
review status, associated diseases, RCV accession numbers.
```
**Tool 6: PharmGKB — Pharmacogenomic Annotations**
```tex
Query PharmGKB clinPGx API for drug-gene-variant interactions.
API: GET https://api.clinpgx.org/v1/data/clinicalAnnotation?location.fingerprint={rsid}&view=base
Return:
Related drugs, evidence level (1A-4), related diseases, annotation types.
```
**Tool 7: ClinGen — Cross-Database ID Mapping**
```tex
Query ClinGen Allele Registry for cross-database identifiers.
API: GET https://reg.genome.network/alleles?dbSNP.rs={rs_id}
Headers: Accept: application/json
Return:
CA ID, ClinVar IDs, COSMIC ID, gnomAD IDs, external cross-references.
```
### 2. Comprehensive Variant Annotation
Query 7 databases for a given rsID, then save all results into a single JSON file `{rsID}_annotation.json`.
```python
import requests
import json
from datetime import datetime
rs_id = "rs7412"
results = {"query_rsid": rs_id, "timestamp": datetime.now().isoformat()}
def safe_request(name, func, fallback=None):
"""统一的容错请求包装器。任一数据库超时/报错不会中断整个流程。"""
try:
return func()
except requests.exceptions.Timeout:
print(f"[{name}] ⚠ 连接超时,跳过")
results.setdefault("errors", {})[name] = "timeout"
return fallback
except Exception as e:
print(f"[{name}] ⚠ 请求失败: {e},跳过")
results.setdefault("errors", {})[name] = str(e)
return fallback
# ── Step 1: dbSNP — 变异基本信息 + ClinVar RCV ──
rsid_num = rs_id.replace("rs", "")
dbsnp_url = f"https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/{rsid_num}"
dbsnp = safe_request("dbSNP",
lambda: requests.get(dbsnp_url, timeout=30).json(), fallback={})
results["dbsnp"] = dbsnp
print(f"[dbSNP] {rs_id} 查询{'成功' if dbsnp else '失败'}")
# 从 dbSNP 提取 ClinVar RCV 记录(Step 5)
snapshot = dbsnp.get("primary_snapshot_data", {})
clinvar_records = []
for ann in snapshot.get("allele_annotations", []):
for clin in ann.get("clinical", []):
clinvar_records.append({
"accession": clin.get("accession_version", ""),
"significances": clin.get("clinical_significances", []),
"diseases": clin.get("disease_names", []),
"review_status": clin.get("review_status", "")
})
results["clinvar"] = clinvar_records
print(f"[ClinVar] RCV 记录数: {len(clinvar_records)}")
# ── Step 2: FAVOR — 功能注释与评分 ──
favor_url = f"https://api.genohub.org/v1/rsids/{rs_id}"
favor = safe_request("FAVOR",
lambda: requests.get(favor_url, timeout=30).json(), fallback={})
results["favor"] = favor
print(f"[FAVOR] 功能注释{'获取成功' if favor else '获取失败'}")
# 从 FAVOR 提取 variant_id 用于 gnomAD 查询
variant_id = None
favor_results = favor if isinstance(favor, list) else favor.get("results", [favor]) if favor else []
if favor_results and isinstance(favor_results, list):
first = favor_results[0] if favor_results else {}
chrom = str(first.get("chromosome", ""))
pos = str(first.get("position", ""))
ref = first.get("ref", "")
alt = first.get("alt", "")
if chrom and pos and ref and alt:
variant_id = f"{chrom}-{pos}-{ref}-{alt}"
# ── Step 3: gnomAD — 人群等位基因频率 ──
if variant_id:
gnomad_query = """
query($variantId: String!) {
variant(variantId: $variantId, dataset: gnomad_r4) {
variant_id
genome {
ac
an
af
populations { id ac an af }
}
}
}
"""
gnomad_data = safe_request("gnomAD",
lambda: requests.post(
"https://gnomad.broadinstitute.org/api",
json={"query": gnomad_query, "variables": {"variantId": variant_id}},
timeout=30
).json(), fallback={})
results["gnomad"] = (gnomad_data or {}).get("data", {}).get("variant", {})
genome = results["gnomad"].get("genome", {}) if results["gnomad"] else {}
print(f"[gnomAD] AF={genome.get('af', 'N/A')}")
else:
results["gnomad"] = {"error": "无法从 FAVOR 提取 variant_id"}
print("[gnomAD] 跳过: 无 variant_id")
# ── Step 4: GWAS Catalog — 关联表型 ──
gwas_url = f"https://www.ebi.ac.uk/gwas/rest/api/associations/search/findByRsId?rsId={rs_id}"
gwas = safe_request("GWAS Catalog",
lambda: requests.get(gwas_url, headers={"Accept": "application/json"}, timeout=30).json(),
fallback={})
associations = (gwas or {}).get("_embedded", {}).get("associations", [])
results["gwas_catalog"] = {
"association_count": len(associations),
"associations": associations
}
print(f"[GWAS Catalog] 关联数={len(associations)}")
# ── Step 6: PharmGKB — 药物基因组注释 ──
pgx_url = f"https://api.clinpgx.org/v1/data/clinicalAnnotation?location.fingerprint={rs_id}&view=base"
pgx_resp = safe_request("PharmGKB",
lambda: requests.get(pgx_url, timeout=30).json(), fallback=[])
pgx_annotations = pgx_resp if isinstance(pgx_resp, list) else (pgx_resp or {}).get("data", [])
results["pharmgkb"] = pgx_annotations
print(f"[PharmGKB] 药物基因组注释数: {len(pgx_annotations)}")
# ── Step 7: ClinGen — 跨数据库 ID 映射 ──
# 注意: ClinGen Allele Registry 服务器可能响应较慢,使用 60s 超时 + 重试
clingen_url = f"https://reg.genome.network/alleles?dbSNP.rs={rs_id}"
clingen_resp = None
for attempt in range(2): # 最多重试 1 次
clingen_resp = safe_request("ClinGen",
lambda: requests.get(clingen_url,
headers={"Accept": "application/json"}, timeout=60).json(),
fallback=None)
if clingen_resp is not None:
break
print(f"[ClinGen] 第 {attempt+1} 次尝试失败,重试中...")
if clingen_resp is not None:
if not isinstance(clingen_resp, list):
clingen_resp = [clingen_resp]
# 过滤同义变异
clingen_alleles = []
for allele in clingen_resp:
titles = allele.get("communityStandardTitle", [])
if titles and any("=" in t for t in titles):
continue
clingen_alleles.append({
"ca_id": allele.get("@id", "").split("/")[-1],
"title": titles,
"externalRecords": allele.get("externalRecords", {})
})
results["clingen"] = clingen_alleles
print(f"[ClinGen] 等位基因数: {len(clingen_alleles)}")
else:
results["clingen"] = {"error": "ClinGen API 连接超时,可稍后单独使用 variant-cross-database-ids skill 重试"}
print("[ClinGen] ⚠ 所有尝试均超时,已跳过")
# ── 保存结果到 JSON 文件 ──
output_file = f"{rs_id}_annotation.json"
with open(output_file, "w", encoding="utf-8") as f:
json.dump(results, f, indent=2, ensure_ascii=False)
# 汇总报告
errors = results.get("errors", {})
if errors:
print(f"\n⚠ 以下数据库查询失败: {list(errors.keys())},其余数据库结果正常")
print(f"✓ 结果已保存: {output_file}")
```
More from InternScience/scp
- admet_druglikeness_reportADMET & Drug-Likeness Report - Generate comprehensive ADMET and drug-likeness report: molecular properties, H-bond analysis, hydrophobicity, topology, and ADMET prediction. Use this skill for medicinal chemistry tasks involving calculate mol basic info calculate mol hbond calculate mol hydrophobicity calculate mol topology pred molecule admet. Combines 5 tools from 2 SCP server(s).
- affinity_maturationAffinity Maturation Pipeline - Affinity maturation: compute binding affinity, predict mutations, compute hydrophilicity, and predict drug-target interaction. Use this skill for antibody engineering tasks involving ComputeAffinityCalculator zero shot sequence prediction ComputeHydrophilicity PredictDrugTargetInteraction. Combines 4 tools from 3 SCP server(s).
- alanine_scanning_pipelineAlanine Scanning Mutagenesis Pipeline - Alanine scanning: design scan, compute properties for each mutant, predict interactions, and compare. Use this skill for protein biochemistry tasks involving AlanineScanningDesigner ComputeProtPara PredictDrugTargetInteraction calculate protein sequence properties. Combines 4 tools from 3 SCP server(s).
- aliphatic_ring_analysisRing System Analysis - Analyze ring systems: count aliphatic carbocycles, analyze aromaticity, compute topology, and structure complexity. Use this skill for organic chemistry tasks involving GetAliphaticCarbocyclesNum AromaticityAnalyzer calculate mol topology calculate mol structure complexity. Combines 4 tools from 3 SCP server(s).
- alphafold_structure_pipelineAlphaFold Structure Analysis Pipeline - AlphaFold pipeline: download predicted structure, predict pockets, extract sequence, and compute properties. Use this skill for computational biology tasks involving download alphafold structure run fpocket extract pdb sequence calculate pdb basic info. Combines 4 tools from 3 SCP server(s).
- antibody_drug_developmentAntibody Drug Development - Develop antibody drug: target protein analysis, biotherapeutic lookup, protein properties, and interaction prediction. Use this skill for biologics tasks involving get uniprotkb entry by accession get biotherapeutic by name ComputeProtPara ComputeHydrophilicity. Combines 4 tools from 3 SCP server(s).
- antibody_target_analysisAntibody-Target Analysis - Analyze an antibody target: UniProt protein info, InterPro domains, protein properties, and biotherapeutic data from ChEMBL. Use this skill for immunology tasks involving get uniprotkb entry by accession query interpro ComputeProtPara get biotherapeutic by name. Combines 4 tools from 4 SCP server(s).
- atc_drug_classificationATC Drug Classification Lookup - Look up drug in ATC classification: ChEMBL ATC class, FDA drug info, PubChem compound, and mechanism of action. Use this skill for pharmacology tasks involving get atc class by level5 get mechanism of action by drug name get compound by name get drug by name. Combines 4 tools from 3 SCP server(s).
- atmospheric-science-calculationsCalculate atmospheric parameters including Coriolis parameter, geostrophic wind, heat index, potential temperature, and dewpoint for meteorology and climate science.
- binding_site_characterizationBinding Site Characterization - Characterize binding sites: predict pockets with fpocket and P2Rank, get binding site info from ChEMBL, and visualize. Use this skill for structural biology tasks involving run fpocket pred pocket prank get binding site by id visualize protein. Combines 4 tools from 3 SCP server(s).