kegg-database

$ npx mdskill add aipoch/medical-research-skills/kegg-database

Query KEGG databases directly for precise biological data.

  • Fetch pathways, genes, compounds, and drug interactions instantly.
  • Depends on the KEGG REST API for all data retrieval.
  • Executes specific HTTP commands for targeted biological lookups.
  • Returns plain-text responses (tab-delimited lists or KEGG flat-file entries) with cross-database mappings.

SKILL.md

---
name: kegg-database
description: Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to fetch **KEGG pathway, gene, compound, enzyme, disease, or drug** records directly from the **KEGG REST API**.
- You want to perform **gene ↔ pathway** mapping (e.g., building inputs for pathway enrichment or reporting).
- You need **cross-references** between KEGG databases (e.g., pathway → genes, gene → KO, pathway → compounds).
- You must **convert identifiers** between KEGG and external databases (e.g., KEGG gene → NCBI Gene ID / UniProt; KEGG compound → PubChem).
- You need **drug–drug interaction (DDI)** lookups for KEGG drug IDs.

> Note: KEGG REST access is intended for academic use. Non-academic/commercial use may require a separate KEGG license.

## Key Features

- Full coverage of core KEGG REST operations via Python helpers:
  - `kegg_info` (database metadata)
  - `kegg_list` (catalog listing)
  - `kegg_find` (keyword/property search)
  - `kegg_get` (entry retrieval; sequences/structures/images)
  - `kegg_conv` (ID conversion)
  - `kegg_link` (cross-database linking)
  - `kegg_ddi` (drug–drug interactions)
- Supports common KEGG identifiers and formats:
  - Pathways: `map00010`, `hsa00010`
  - Genes: `hsa:10458`
  - Compounds: `cpd:C00002`
  - Drugs: `dr:D00001`
  - Enzymes: `ec:1.1.1.1`
  - KO: `ko:K00001`
- Output format options for `kegg_get`: `aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json` (some formats are single-entry only).

## Dependencies

- Python `>=3.9`
- `requests >=2.31.0`

## Example Usage

```python
"""
End-to-end example:
1) Find a human gene by keyword
2) Link the gene to pathways
3) Retrieve one pathway entry
4) Convert the gene ID to UniProt
"""

from scripts.kegg_api import kegg_find, kegg_link, kegg_get, kegg_conv

# 1) Search for a gene keyword in KEGG GENES
hits = kegg_find("genes", "p53")
print("FIND results (first lines):")
print("\n".join(hits.splitlines()[:5]), "\n")

# Choose a known KEGG gene ID for TP53 (human)
gene_id = "hsa:7157"

# 2) Link gene -> pathways
pathway_links = kegg_link("pathway", gene_id)
print("LINK gene -> pathways (first lines):")
print("\n".join(pathway_links.splitlines()[:5]), "\n")

# Parse the first pathway ID from the link output
# Typical line format: hsa:7157<TAB>path:hsaXXXXX (source ID first, linked ID second)
first_line = next((ln for ln in pathway_links.splitlines() if ln.strip()), None)
if not first_line:
    raise RuntimeError("No pathways returned for the gene ID.")

# The pathway ID is in the second column; drop the "path:" prefix
path_id = first_line.split("\t")[1].strip().replace("path:", "")
print("Selected pathway:", path_id, "\n")

# 3) Retrieve the pathway entry (flat text)
pathway_entry = kegg_get(path_id)
print("GET pathway entry (first 30 lines):")
print("\n".join(pathway_entry.splitlines()[:30]), "\n")

# 4) Convert KEGG gene ID -> UniProt
uniprot_map = kegg_conv("uniprot", gene_id)
print("CONV KEGG -> UniProt:")
print(uniprot_map)
```
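
The `link` and `conv` results in the example are plain tab-delimited text, so they usually need parsing before any downstream use. The following is a minimal sketch, assuming each non-empty line has the form `source_id<TAB>target_id`; `parse_pairs` and `group_targets` are hypothetical helpers that are not part of `scripts/kegg_api.py`, and the sample text is illustrative rather than captured API output.

```python
from collections import defaultdict


def parse_pairs(text):
    """Split tab-delimited KEGG output into (source_id, target_id) tuples."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that are not source/target pairs
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def group_targets(pairs, strip_prefix=""):
    """Group target IDs by source ID, optionally dropping a prefix such as 'path:'."""
    grouped = defaultdict(list)
    for source, target in pairs:
        if strip_prefix and target.startswith(strip_prefix):
            target = target[len(strip_prefix):]
        grouped[source].append(target)
    return dict(grouped)


# Illustrative input in the assumed source<TAB>target layout
sample = (
    "hsa:7157\tpath:hsa04115\n"
    "hsa:7157\tpath:hsa05200\n"
    "\n"
    "hsa:10458\tpath:hsa04520\n"
)

gene_to_pathways = group_targets(parse_pairs(sample), strip_prefix="path:")
print(gene_to_pathways)
```

The same helpers could be applied to the output of `kegg_conv`, passing a different `strip_prefix` if the database prefix on the target IDs is not wanted.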

## Implementation Details

### API-to-function mapping

This skill wraps KEGG REST endpoints into Python functions (see `scripts/kegg_api.py`):

- `kegg_info(database_or_org)`  
  Retrieves database or organism metadata (release info, counts, etc.).

- `kegg_list(database, organism=None)`  
  Lists entries in a database; optionally scoped to an organism (e.g., `("pathway", "hsa")`).  
  Also supports listing explicit IDs (batch-style) when passed as a single string.

- `kegg_find(database, query, option=None)`  
  Searches by keyword or by chemical properties. Common `option` values:
  - `formula` (partial match, regardless of the order of atoms given)
  - `exact_mass` (range like `300-310`)
  - `mol_weight` (range)

- `kegg_get(entry_ids, option=None)`  
  Retrieves full entries or specific formats:
  - Sequences: `aaseq`, `ntseq`
  - Structures: `mol`, `kcf`
  - Pathway assets: `image` (PNG), `kgml` (XML)
  - `json` (JSON output; KEGG documents this option for BRITE hierarchy entries)

  **Batching rules**:
  - Most operations allow up to **10 entries** per request.
  - `image`, `kgml`, and `json` typically allow **only 1 entry** per request.

- `kegg_conv(target_db, source)`  
  Converts IDs between KEGG and external databases (e.g., `uniprot`, `ncbi-geneid`, `pubchem`, `chebi`).  
  Output is tab-delimited pairs: `source_id<TAB>target_id`.

- `kegg_link(target_db, source)`  
  Cross-references entries across KEGG databases (e.g., gene → pathway, pathway → compound, gene → KO).

- `kegg_ddi(drug_ids)`  
  Returns known drug–drug interactions for one or more KEGG drug IDs (up to typical batch limits).
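
The mapping above can be sketched at the HTTP level. This is a minimal illustration, assuming the public KEGG REST base URL `https://rest.kegg.jp`, a `/<operation>/<argument>/...` path layout, and multiple entry IDs joined with `+`. `build_kegg_url` is a hypothetical helper and may not match how `scripts/kegg_api.py` builds its requests internally.

```python
from urllib.parse import quote

KEGG_BASE_URL = "https://rest.kegg.jp"  # assumed public endpoint

# Operations named in the function list above
OPERATIONS = {"info", "list", "find", "get", "conv", "link", "ddi"}


def build_kegg_url(operation, *args):
    """Build a KEGG REST URL from an operation and its path arguments.

    Lists or tuples of IDs are joined with '+'; None arguments
    (for example an unused option) are skipped.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    parts = []
    for arg in args:
        if arg is None:
            continue
        if isinstance(arg, (list, tuple)):
            arg = "+".join(arg)
        # Keep ':' and '+' readable; percent-encode anything unsafe
        parts.append(quote(str(arg), safe=":+"))
    if not parts:
        raise ValueError("at least one argument is required")
    return "/".join([KEGG_BASE_URL, operation, *parts])


print(build_kegg_url("link", "pathway", "hsa:7157"))
print(build_kegg_url("get", ["hsa:7157", "hsa:10458"], "aaseq"))
print(build_kegg_url("find", "compound", "300-310", "exact_mass"))
```

Building the URL separately from sending it keeps the request logic easy to test without network access.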

### Practical constraints and error handling

- **Entry limits**: Prefer chunking lists into batches of ≤10 IDs; enforce single-entry calls for `image/kgml/json`.
- **HTTP status codes**: Treat non-200 responses as failures; common issues include:
  - `400` (bad request: syntax error, malformed parameters, or wrong database name)
  - `404` (entry not found)
- **Rate behavior**: KEGG does not publish strict rate limits; avoid high-frequency polling and add backoff/retry for robustness.
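
The constraints above can be combined into a small wrapper. This is a sketch under stated assumptions: `fetch` is any callable that takes a list of IDs and returns text (for example a thin wrapper around `kegg_get`), failures surface as exceptions, and the batch size of 10 and the backoff parameters are illustrative defaults rather than values published by KEGG. `chunked` and `fetch_in_batches` are hypothetical names.

```python
import time


def chunked(ids, size=10):
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(fetch, ids, size=10, retries=3, base_delay=1.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(batch) once per chunk, retrying failures with exponential backoff.

    `retry_on` lists the exception types treated as retryable; the last
    failure is re-raised once `retries` extra attempts are used up.
    """
    results = []
    for batch in chunked(list(ids), size):
        for attempt in range(retries + 1):
            try:
                results.append(fetch(batch))
                break
            except retry_on:
                if attempt == retries:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt
                sleep(base_delay * 2 ** attempt)
    return results
```

For `image`, `kgml`, and `json` requests, pass `size=1` so each call carries a single entry. In practice `retry_on` would be narrowed to transient failures such as timeouts or connection errors, since retrying a `400` or `404` response only repeats the same failure.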

### Reference documentation

For detailed endpoint syntax, database lists, and species codes, consult:
- `references/kegg_reference.md`
