string-database

Name: string-database
Author: aipoch/medical-research-skills

$ npx mdskill add aipoch/medical-research-skills/string-database

Resolve identifiers and retrieve protein networks with STRING.

Map gene symbols to protein identifiers for downstream analysis.
Fetch interaction edges with confidence scores from STRING.
Expand candidate lists by retrieving interaction partners.
Deliver static network images for reports or notebooks.


---
name: string-database
description: Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have gene symbols (e.g., `TP53`) and need to resolve them to STRING protein identifiers for downstream analysis.
- You want to retrieve a protein–protein interaction (PPI) network (functional/physical) with confidence scores for one or more proteins.
- You need to find interaction partners for a target protein to expand a candidate list (e.g., add top N neighbors).
- You want to perform functional enrichment (GO/KEGG/Reactome, etc.) for a protein set to interpret biological themes.
- You need a quick static visualization (PNG/SVG) of a STRING network for reports or notebooks.

## Key Features

- **ID Mapping**: Convert gene/protein names to STRING identifiers for a given organism.
- **Network Retrieval**: Fetch interaction edges with confidence scores from STRING.
- **Interaction Partners**: Expand a protein list by retrieving interaction partners.
- **Enrichment Analysis**:
  - Functional enrichment (e.g., GO, KEGG, Reactome)
  - PPI enrichment statistics
  - Functional annotations (e.g., PFAM/SMART where supported by STRING endpoints)
- **Visualization**: Download static network images (PNG/SVG).
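
As a rough illustration of the Network Retrieval and Interaction Partners features, the sketch below parses a tab-separated edge list, applies a score cutoff, and picks the top N partners of a target. It is not the skill's `StringClient` code. The function names are hypothetical, the column names (`preferredName_A`, `preferredName_B`, `score`) are assumed to match STRING's TSV network output, and the sample rows are invented placeholders rather than real STRING data.

```python
import csv
import io

# Invented placeholder rows; NOT real STRING scores.
SAMPLE_TSV = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "TP53\tMDM2\t0.999\n"
    "TP53\tGENE_X\t0.450\n"
    "MDM2\tGENE_Y\t0.820\n"
)


def parse_edges(tsv_text):
    """Parse a TSV edge list (assumed STRING column names) into dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "a": row["preferredName_A"],
            "b": row["preferredName_B"],
            "score": float(row["score"]),
        }
        for row in reader
    ]


def filter_edges(edges, min_score=0.7):
    """Keep only edges whose confidence score meets the threshold."""
    return [e for e in edges if e["score"] >= min_score]


def top_partners(edges, target, n=10):
    """Return up to n partners of `target`, highest score first."""
    best = {}
    for e in edges:
        if e["a"] == target:
            partner = e["b"]
        elif e["b"] == target:
            partner = e["a"]
        else:
            continue
        # Keep the best score if a partner appears in more than one row.
        best[partner] = max(best.get(partner, 0.0), e["score"])
    return sorted(best, key=best.get, reverse=True)[:n]


if __name__ == "__main__":
    edges = parse_edges(SAMPLE_TSV)
    print(filter_edges(edges, min_score=0.7))
    print(top_partners(edges, "TP53", n=1))
```

The 0.7 default is only an example cutoff; choose a threshold that suits your analysis.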

## Dependencies

- Python `>=3.8`
- `requests` (tested with `>=2.28`)
- `pandas` (tested with `>=1.5`)

Install:

```bash
pip install requests pandas
```

## Example Usage

```python
from scripts.string_api import StringClient

def main():
    # STRING does not require a secret API key, but providing a caller identity is recommended.
    client = StringClient(caller_identity="my_analysis_tool")

    # 1) Map an identifier (e.g., TP53 in Homo sapiens; NCBI taxonomy ID 9606)
    protein_id = client.map_id(identifier="TP53", species=9606)
    print("Mapped ID:", protein_id)

    # 2) Download a network image and expand by adding interaction partners
    client.get_network_image(
        identifiers=[protein_id],
        output_file="tp53_network.png",
        add_color_nodes=10,  # add 10 partners
    )
    print("Saved network image to tp53_network.png")

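    # 3) Run PPI enrichment. The test compares observed vs. expected edges within
    #    a set, so pass several proteins; a single protein has no edges to test.
    partner_ids = [client.map_id(identifier=s, species=9606) for s in ("MDM2", "CDKN1A")]
    ppi_stats = client.get_ppi_enrichment(identifiers=[protein_id, *partner_ids])
    print("PPI enrichment:", ppi_stats)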

if __name__ == "__main__":
    main()
```

## Implementation Details

- **Client entry point**: `scripts/string_api.py` provides the main wrapper (e.g., `StringClient`) around the STRING REST API.
- **Caller identity**:
  - STRING endpoints do **not** require an API key.
  - A `caller_identity` string is strongly recommended (project name/email/URL) to support rate/load management.
  - Pass it at initialization (e.g., `StringClient(caller_identity="my_email@example.com")`) or inject via environment variables in your own wrapper.
- **Organism selection**:
  - Most operations take a `species` argument, given as an NCBI taxonomy ID (e.g., `9606` for human).
- **Network retrieval and scoring**:
  - Network endpoints return interactions with confidence scores. Filter them by applying a score threshold in your analysis code, or server-side if the wrapper exposes STRING's `required_score` parameter (scaled 0 to 1000).
- **Visualization**:
  - Static images are retrieved directly from STRING image endpoints and written to disk (PNG/SVG depending on the method/parameters).
- **Reference documentation**:
  - See `references/string_reference.md` for original API notes and endpoint details included with this skill.
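
To make the caller-identity and organism notes above concrete, here is a minimal sketch of how a request to the STRING REST API could be assembled. It is an assumption about what a wrapper might do, not the actual contents of `scripts/string_api.py`. The helper `build_string_request` and the `STRING_CALLER_IDENTITY` environment variable are hypothetical names. The URL layout (`/api/<output-format>/<method>`) and the carriage-return separator for multiple identifiers follow STRING's public API conventions as I understand them.

```python
import os

STRING_API_BASE = "https://string-db.org/api"


def build_string_request(method, identifiers, species,
                         caller_identity=None, output_format="tsv", **extra):
    """Return (url, params) for a STRING API call without sending it.

    Hypothetical helper: falls back to the STRING_CALLER_IDENTITY
    environment variable (an assumed name) when no identity is passed.
    """
    if caller_identity is None:
        caller_identity = os.environ.get("STRING_CALLER_IDENTITY", "unnamed_client")
    url = f"{STRING_API_BASE}/{output_format}/{method}"
    params = {
        # STRING expects multiple identifiers separated by carriage returns.
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy ID, e.g. 9606 for human
        "caller_identity": caller_identity,
    }
    params.update(extra)  # e.g. required_score=700
    return url, params


if __name__ == "__main__":
    url, params = build_string_request(
        "network", ["TP53", "MDM2"], species=9606,
        caller_identity="my_analysis_tool", required_score=700,
    )
    print(url)
    print(params)
    # To send the request (needs network access):
    #   import requests
    #   response = requests.post(url, data=params, timeout=30)
    #   response.raise_for_status()
```

Keeping request construction separate from the HTTP call makes the parameters easy to inspect and unit-test without contacting the STRING servers.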
