gene-knowledge-integration

$ npx mdskill add InternScience/scp/gene-knowledge-integration

This skill chains 3 public genomics/pharmacogenomics database APIs sequentially to build a comprehensive pharmacogenomics profile for a given gene.

---
name: gene-knowledge-integration
description: Given a gene symbol (e.g. TPMT), query 3 public databases (ClinGen CAR, PharmGKB, Monarch) to obtain gene registry info, FDA drug labels, clinical annotations, and gene-phenotype associations. Save all results into a JSON file.
license: MIT license
metadata:
    skill-author: PJLab
---

# Gene Knowledge Integration

## Usage

### 1. Tool Descriptions

This skill chains 3 public genomics/pharmacogenomics database APIs sequentially to build a comprehensive pharmacogenomics profile for a given gene.

**Tool 1: ClinGen CAR — Gene Registry Info**

```text
Query ClinGen Allele Registry API to get gene registration information.
API: GET https://reg.genome.network/gene?HGNC.symbol={gene_symbol}
Args:
    gene_symbol (str): HGNC gene symbol (e.g. "TPMT")
Return:
    Gene record (dict): Contains @id (GN id), locus (genomic coordinates),
        externalRecords (HGNC id/name/symbol, NCBI gene id, MANE transcripts).
```
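As a minimal sketch of how the returned gene record might be consumed, the helper below pulls out the identifiers named above. The field layout (`@id`, `externalRecords.HGNC.id`, `externalRecords.HGNC.symbol`) is assumed from the description in this section, and both `extract_gene_summary` and the sample record (including its placeholder GN id) are hypothetical, not part of the ClinGen API.

```python
from typing import Any, Dict


def extract_gene_summary(record: Dict[str, Any]) -> Dict[str, str]:
    """Pull a few identifiers out of a ClinGen CAR gene record.

    The nesting used here is an assumption based on the tool description;
    missing keys fall back to empty strings instead of raising.
    """
    external = record.get("externalRecords") or {}
    hgnc = external.get("HGNC") or {}
    return {
        "gn_id": record.get("@id", ""),
        "hgnc_id": hgnc.get("id", ""),
        "symbol": hgnc.get("symbol", ""),
    }


# Hypothetical record for illustration only; the "@id" value is a placeholder.
sample_record = {
    "@id": "https://reg.genome.network/gene/GN-EXAMPLE",
    "externalRecords": {"HGNC": {"id": "HGNC:12014", "symbol": "TPMT"}},
}

summary = extract_gene_summary(sample_record)
print(summary)
```

Returning empty strings for missing fields matches how the integration script below treats a missing HGNC id: the Monarch step is skipped rather than failing.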

**Tool 2: PharmGKB (ClinPGx) — Gene Info, FDA Labels & Clinical Annotations**

```text
Query PharmGKB ClinPGx API to get pharmacogenomics information.
API (gene):   GET https://api.clinpgx.org/v1/data/gene?symbol={gene_symbol}&view=base
API (labels): GET https://api.clinpgx.org/v1/data/label?source=fda&relatedGenes.symbol={gene_symbol}&view=base
API (clin):   GET https://api.clinpgx.org/v1/data/clinicalAnnotation?location.genes.symbol={gene_symbol}&view=base
Args:
    gene_symbol (str): HGNC gene symbol (e.g. "TPMT")
Return:
    gene: PharmGKB gene record with accession id, alternate names, cross-references.
    labels: FDA drug labels mentioning this gene (drug name, source, testing level).
    clinicalAnnotations: Clinical annotations linking genotype to phenotype
        (level of evidence, related chemicals, phenotype categories).
```
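The three ClinPGx endpoints differ only in path and query parameters, so one way to keep them consistent is to build them from a small table. The sketch below uses the URLs listed above and the standard library's `urlencode`; the helper name `clinpgx_urls` and the constant `CLINPGX_BASE` are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

CLINPGX_BASE = "https://api.clinpgx.org/v1/data"


def clinpgx_urls(gene_symbol: str) -> Dict[str, str]:
    """Build the gene, FDA label, and clinical annotation URLs for one gene."""
    queries = {
        "gene": ("gene", {"symbol": gene_symbol, "view": "base"}),
        "labels": (
            "label",
            {"source": "fda", "relatedGenes.symbol": gene_symbol, "view": "base"},
        ),
        "clinicalAnnotations": (
            "clinicalAnnotation",
            {"location.genes.symbol": gene_symbol, "view": "base"},
        ),
    }
    # urlencode percent-escapes any unsafe characters in the gene symbol.
    return {
        name: f"{CLINPGX_BASE}/{path}?{urlencode(params)}"
        for name, (path, params) in queries.items()
    }


urls = clinpgx_urls("TPMT")
for name, url in urls.items():
    print(name, url)
```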

**Tool 3: Monarch Initiative — Gene-Phenotype Associations**

```text
Query Monarch Initiative API to get gene-to-phenotype associations.
API: GET https://api-v3.monarchinitiative.org/v3/api/entity/{hgnc_id}/biolink:GeneToPhenotypicFeatureAssociation
Args:
    hgnc_id (str): HGNC identifier (e.g. "HGNC:12014" for TPMT)
Return:
    items (list): Each item contains subject (gene), object (phenotype HP term),
                  object_label (phenotype name), evidence_types, publications.
```
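A short sketch of how the Monarch URL could be built and how the returned `items` might be reduced to a list of phenotype names. It assumes each item carries an `object_label` field as described above; the helper names and the sample items (with placeholder phenotype ids and labels) are hypothetical.

```python
from typing import Any, Dict, List

MONARCH_BASE = "https://api-v3.monarchinitiative.org/v3/api/entity"
ASSOCIATION_TYPE = "biolink:GeneToPhenotypicFeatureAssociation"


def monarch_url(hgnc_id: str) -> str:
    """Build the association URL; reject ids that are not HGNC CURIEs."""
    if not hgnc_id.startswith("HGNC:"):
        raise ValueError(f"expected an HGNC CURIE such as 'HGNC:12014', got {hgnc_id!r}")
    return f"{MONARCH_BASE}/{hgnc_id}/{ASSOCIATION_TYPE}"


def phenotype_labels(items: List[Dict[str, Any]], limit: int = 5) -> List[str]:
    """Return up to `limit` distinct phenotype names, in first-seen order."""
    labels: List[str] = []
    for item in items:
        label = item.get("object_label")
        if label and label not in labels:
            labels.append(label)
    return labels[:limit]


# Hypothetical items for illustration; ids and labels are placeholders.
sample_items = [
    {"object": "HP:EXAMPLE1", "object_label": "example phenotype A"},
    {"object": "HP:EXAMPLE2", "object_label": "example phenotype B"},
    {"object": "HP:EXAMPLE1", "object_label": "example phenotype A"},
]

print(monarch_url("HGNC:12014"))
print(phenotype_labels(sample_items))
```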

### 2. Gene Knowledge Integration

Query 3 databases (ClinGen CAR → PharmGKB → Monarch) for a given gene symbol, then save all results into a single JSON file `{gene_symbol}_knowledge.json`.

```python
import requests
import json
from datetime import datetime

gene_symbol = "TPMT"
results = {"query_gene": gene_symbol, "timestamp": datetime.now().isoformat()}

# ── Step 1: ClinGen CAR: gene registry info ──
# Call the ClinGen Allele Registry API to get the gene's GN id, genomic
# coordinates, HGNC/NCBI external records, and MANE transcript info.
car_url = f"https://reg.genome.network/gene?HGNC.symbol={gene_symbol}"
car_resp = requests.get(car_url, headers={"Accept": "application/json"}, timeout=30)
car = car_resp.json()
results["clingen_car"] = car
hgnc_id = car.get("externalRecords", {}).get("HGNC", {}).get("id", "")
print(f"[ClinGen CAR] gene={gene_symbol}, GN_id={car.get('@id','')}, HGNC={hgnc_id}")

# ── Step 2a: PharmGKB: gene info ──
# Call the PharmGKB ClinPGx API to get the gene's basic pharmacogenomics record.
pgx_gene_url = f"https://api.clinpgx.org/v1/data/gene?symbol={gene_symbol}&view=base"
pgx_gene_resp = requests.get(pgx_gene_url, timeout=30)
pgx_gene = pgx_gene_resp.json()
results["pharmgkb_gene"] = pgx_gene
print("[PharmGKB] Gene info retrieved")

# ── Step 2b: PharmGKB: FDA drug labels ──
# Query FDA drug labels related to this gene to see which drug labels mention it.
pgx_label_url = (
    f"https://api.clinpgx.org/v1/data/label"
    f"?source=fda&relatedGenes.symbol={gene_symbol}&view=base"
)
pgx_labels_resp = requests.get(pgx_label_url, timeout=30)
pgx_labels = pgx_labels_resp.json()
results["pharmgkb_fda_labels"] = pgx_labels
print("[PharmGKB] FDA drug labels retrieved")

# ── Step 2c: PharmGKB: clinical annotations ──
# Query clinical annotations for this gene, including the level of evidence
# for each genotype-phenotype association.
pgx_clin_url = (
    f"https://api.clinpgx.org/v1/data/clinicalAnnotation"
    f"?location.genes.symbol={gene_symbol}&view=base"
)
pgx_clin_resp = requests.get(pgx_clin_url, timeout=30)
pgx_clin = pgx_clin_resp.json()
results["pharmgkb_clinical_annotations"] = pgx_clin
print("[PharmGKB] Clinical annotations retrieved")

# ── Step 3: Monarch: gene-phenotype associations ──
# Call the Monarch Initiative API to get the phenotypes (HPO terms) associated
# with this gene. This step needs the HGNC id obtained in Step 1.
if hgnc_id:
    monarch_url = (
        f"https://api-v3.monarchinitiative.org/v3/api/entity/{hgnc_id}"
        f"/biolink:GeneToPhenotypicFeatureAssociation"
    )
    monarch_resp = requests.get(monarch_url, timeout=30)
    monarch = monarch_resp.json()
    items = monarch.get("items", [])
    results["monarch_phenotypes"] = {
        "association_count": len(items),
        "associations": items
    }
    phenotypes = [i.get("object_label", "") for i in items[:5]]
    print(f"[Monarch] association_count={len(items)}, first 5={phenotypes}")
else:
    results["monarch_phenotypes"] = {"error": "HGNC id not found from ClinGen CAR"}
    print("[Monarch] Skipped: no HGNC id was obtained")

# ── Save the results to a JSON file ──
output_file = f"{gene_symbol}_knowledge.json"
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)
print(f"\n✓ All results saved: {output_file}")
```
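Once `{gene_symbol}_knowledge.json` has been written, it can be read back for a quick sanity check. The sketch below relies only on the top-level keys that the script above writes (`query_gene`, `timestamp`, `monarch_phenotypes`, and the per-database sections); `summarize_knowledge` is a hypothetical helper, and the demo file contains placeholder content rather than real API output.

```python
import json
import os
import tempfile
from typing import Any, Dict


def summarize_knowledge(path: str) -> Dict[str, Any]:
    """Load a saved knowledge file and report which sections it contains."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    monarch = results.get("monarch_phenotypes", {})
    return {
        "query_gene": results.get("query_gene", ""),
        # Every top-level key other than the two bookkeeping fields.
        "sections": sorted(k for k in results if k not in ("query_gene", "timestamp")),
        "monarch_association_count": monarch.get("association_count", 0),
        # Set when Step 3 was skipped because no HGNC id was found.
        "monarch_error": monarch.get("error"),
    }


# Demo with a placeholder file; real files come from the script above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "TPMT_knowledge.json")
    demo_results = {
        "query_gene": "TPMT",
        "timestamp": "2000-01-01T00:00:00",
        "clingen_car": {},
        "monarch_phenotypes": {"error": "HGNC id not found from ClinGen CAR"},
    }
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(demo_results, f)
    demo_summary = summarize_knowledge(demo_path)

print(demo_summary)
```

A non-empty `monarch_error` indicates that the ClinGen CAR response did not contain an HGNC id, so the Monarch query was never issued.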
