string-database
$ npx mdskill add aipoch/medical-research-skills/string-database
Resolve identifiers and retrieve protein networks with STRING.
- Map gene symbols to protein identifiers for downstream analysis.
- Fetch interaction edges with confidence scores from STRING.
- Expand candidate lists by retrieving interaction partners.
- Deliver static network images for reports or notebooks.
---
name: string-database
description: Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You have gene symbols (e.g., `TP53`) and need to resolve them to STRING protein identifiers for downstream analysis.
- You want to retrieve a protein–protein interaction (PPI) network (functional/physical) with confidence scores for one or more proteins.
- You need to find interaction partners for a target protein to expand a candidate list (e.g., add top N neighbors).
- You want to perform functional enrichment (GO/KEGG/Reactome, etc.) for a protein set to interpret biological themes.
- You need a quick static visualization (PNG/SVG) of a STRING network for reports or notebooks.
## Key Features
- **ID Mapping**: Convert gene/protein names to STRING identifiers for a given organism.
- **Network Retrieval**: Fetch interaction edges with confidence scores from STRING.
- **Interaction Partners**: Expand a protein list by retrieving interaction partners.
- **Enrichment Analysis**:
  - Functional enrichment (e.g., GO, KEGG, Reactome)
  - PPI enrichment statistics
  - Functional annotations (e.g., PFAM/SMART where supported by STRING endpoints)
- **Visualization**: Download static network images (PNG/SVG).
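
The features above correspond to methods of the public STRING REST API, whose URLs follow the pattern `https://string-db.org/api/<output-format>/<method>?<parameters>`. The sketch below shows how such a URL might be composed. `build_string_url` is a hypothetical helper written for illustration and is not part of this skill's `StringClient`; it uses only the standard library and sends no request.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"


def build_string_url(output_format, method, identifiers, species, caller_identity, **extra):
    """Compose a STRING REST URL (hypothetical helper, not the skill's own code)."""
    params = {
        # STRING expects multiple identifiers separated by a carriage return,
        # which urlencode percent-encodes as %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "caller_identity": caller_identity,
    }
    params.update(extra)
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(params)}"


# ID mapping for two gene symbols in Homo sapiens (NCBI taxonomy ID 9606).
mapping_url = build_string_url(
    "tsv", "get_string_ids", ["TP53", "MDM2"], 9606, "my_analysis_tool"
)

# Static network image for one protein, expanded with 10 partners.
image_url = build_string_url(
    "image", "network", ["TP53"], 9606, "my_analysis_tool", add_color_nodes=10
)

print(mapping_url)
print(image_url)
```

Building the URL separately from sending it makes the request easy to inspect or log before any network traffic happens.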
## Dependencies
- Python `>=3.8`
- `requests` (tested with `>=2.28`)
- `pandas` (tested with `>=1.5`)
Install:
```bash
pip install requests pandas
```
## Example Usage
```python
from scripts.string_api import StringClient


def main():
    # STRING does not require a secret API key, but providing a caller identity is recommended.
    client = StringClient(caller_identity="my_analysis_tool")

    # 1) Map an identifier (e.g., TP53 in Homo sapiens; NCBI taxonomy ID 9606)
    protein_id = client.map_id(identifier="TP53", species=9606)
    print("Mapped ID:", protein_id)

    # 2) Download a network image and expand by adding interaction partners
    client.get_network_image(
        identifiers=[protein_id],
        output_file="tp53_network.png",
        add_color_nodes=10,  # add 10 partners
    )
    print("Saved network image to tp53_network.png")

    # 3) Run PPI enrichment for the set
    ppi_stats = client.get_ppi_enrichment(identifiers=[protein_id])
    print("PPI enrichment:", ppi_stats)


if __name__ == "__main__":
    main()
```
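
The `map_id` step above presumably wraps STRING's `get_string_ids` method, although the wrapper's internals are not shown in this skill. As a rough sketch of what parsing that method's tab-separated response could look like, the example below turns a sample response into a symbol-to-identifier dictionary. `parse_string_ids` is a hypothetical helper, and `SAMPLE_TSV` is an illustrative stand-in for a live response. Its column names follow STRING's documented output but should be checked against the current API.

```python
import csv
import io

# Illustrative sample only: a stand-in for the text returned by get_string_ids.
SAMPLE_TSV = (
    "queryIndex\tstringId\tncbiTaxonId\ttaxonName\tpreferredName\tannotation\n"
    "0\t9606.ENSP00000269305\t9606\tHomo sapiens\tTP53\tCellular tumor antigen p53\n"
)


def parse_string_ids(tsv_text):
    """Map each preferred gene name to its STRING identifier (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # annotation text may contain quote characters
    )
    return {row["preferredName"]: row["stringId"] for row in reader}


mapping = parse_string_ids(SAMPLE_TSV)
print(mapping)
```

A dictionary keyed by gene symbol keeps the original query names next to the resolved identifiers, which helps when some symbols fail to map.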
## Implementation Details
- **Client entry point**: `scripts/string_api.py` provides the main wrapper (e.g., `StringClient`) around the STRING REST API.
- **Caller identity**:
  - STRING endpoints do **not** require an API key.
  - A `caller_identity` string is strongly recommended (project name/email/URL) to support rate/load management.
  - Pass it at initialization (e.g., `StringClient(caller_identity="my_email@example.com")`) or inject via environment variables in your own wrapper.
- **Organism selection**:
  - Most operations require a species identifier (commonly NCBI taxonomy ID, e.g., `9606` for human).
- **Network retrieval and scoring**:
  - Network endpoints return interactions with confidence scores; downstream filtering is typically done by applying a score threshold in your analysis code (if exposed by the wrapper).
- **Visualization**:
  - Static images are retrieved directly from STRING image endpoints and written to disk (PNG/SVG depending on the method/parameters).
- **Reference documentation**:
  - See `references/string_reference.md` for original API notes and endpoint details included with this skill.
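
The score-threshold filtering mentioned under network retrieval can be sketched in plain Python. This example assumes each edge is a dictionary with a `score` field on a 0 to 1 scale, which is how STRING's network output reports the combined score. The edge values are made up for illustration, and `filter_edges` is a hypothetical helper rather than part of the skill's wrapper.

```python
def filter_edges(edges, min_score=0.7):
    """Keep edges whose combined score meets the threshold (0 to 1 scale assumed)."""
    return [edge for edge in edges if float(edge["score"]) >= min_score]


# Made-up edges for illustration only; these are not real STRING scores.
edges = [
    {"preferredName_A": "GENE_A", "preferredName_B": "GENE_B", "score": 0.95},
    {"preferredName_A": "GENE_A", "preferredName_B": "GENE_C", "score": 0.42},
    {"preferredName_A": "GENE_B", "preferredName_B": "GENE_D", "score": 0.70},
]

strong = filter_edges(edges, min_score=0.7)
print([(e["preferredName_A"], e["preferredName_B"]) for e in strong])
```

STRING's network methods also document a `required_score` request parameter on a 0 to 1000 scale, so the same cutoff can be applied server-side. Whether `StringClient` exposes that parameter is not stated here.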