kegg-database

Name: kegg-database
Author: aipoch/medical-research-skills

$ npx mdskill add aipoch/medical-research-skills/kegg-database

Query KEGG databases directly for precise biological data.

Fetch pathways, genes, compounds, and drug interactions.
Depends on the KEGG REST API for all data retrieval.
Issues targeted HTTP requests for specific biological lookups.
Returns KEGG's plain-text (mostly tab-delimited) responses, including cross-database mappings; JSON output is available for pathway entries.


---
name: kegg-database
description: Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to fetch **KEGG pathway, gene, compound, enzyme, disease, or drug** records directly from the **KEGG REST API**.
- You want to perform **gene ↔ pathway** mapping (e.g., building inputs for pathway enrichment or reporting).
- You need **cross-references** between KEGG databases (e.g., pathway → genes, gene → KO, pathway → compounds).
- You must **convert identifiers** between KEGG and external databases (e.g., KEGG gene → NCBI Gene ID / UniProt; KEGG compound → PubChem).
- You need **drug–drug interaction (DDI)** lookups for KEGG drug IDs.

> Note: KEGG REST access is intended for academic use. Non-academic/commercial use may require a separate KEGG license.

## Key Features

- Full coverage of core KEGG REST operations via Python helpers:
  - `kegg_info` (database metadata)
  - `kegg_list` (catalog listing)
  - `kegg_find` (keyword/property search)
  - `kegg_get` (entry retrieval; sequences/structures/images)
  - `kegg_conv` (ID conversion)
  - `kegg_link` (cross-database linking)
  - `kegg_ddi` (drug–drug interactions)
- Supports common KEGG identifiers and formats:
  - Pathways: `map00010`, `hsa00010`
  - Genes: `hsa:10458`
  - Compounds: `cpd:C00002`
  - Drugs: `dr:D00001`
  - Enzymes: `ec:1.1.1.1`
  - KO: `ko:K00001`
- Output format options for `kegg_get`: `aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json` (some formats are single-entry only).
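
Each helper wraps one KEGG REST operation, and every operation is a plain HTTP GET on a URL of the form `https://rest.kegg.jp/<operation>/<argument>[/<argument>...]`, with multiple entries joined by `+`. The sketch below only assembles such URLs to make that mapping concrete. `build_kegg_url` is a hypothetical helper, not a function from `scripts/kegg_api.py`; it sends no request and does not validate IDs.

```python
from typing import Sequence, Union

KEGG_BASE = "https://rest.kegg.jp"
OPERATIONS = {"info", "list", "find", "get", "conv", "link", "ddi"}

Arg = Union[str, Sequence[str]]


def build_kegg_url(operation: str, *args: Arg) -> str:
    """Build a KEGG REST URL (hypothetical helper; no request is sent)."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown KEGG operation: {operation!r}")
    parts = [KEGG_BASE, operation]
    for arg in args:
        # Check str first: a str is itself a Sequence of characters.
        if isinstance(arg, str):
            parts.append(arg)
        else:
            # Several entries in one request are joined with '+'
            parts.append("+".join(arg))
    return "/".join(parts)


print(build_kegg_url("get", ["hsa:7157", "hsa:10458"]))
# https://rest.kegg.jp/get/hsa:7157+hsa:10458
print(build_kegg_url("get", "hsa:7157", "aaseq"))
# https://rest.kegg.jp/get/hsa:7157/aaseq
print(build_kegg_url("find", "compound", "C7H10O5", "formula"))
# https://rest.kegg.jp/find/compound/C7H10O5/formula
```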

## Dependencies

- Python `>=3.9`
- `requests >=2.31.0`

## Example Usage

```python
"""
End-to-end example:
1) Find a human gene by keyword
2) Link the gene to pathways
3) Retrieve one pathway entry
4) Convert the gene ID to UniProt
"""

from scripts.kegg_api import kegg_find, kegg_link, kegg_get, kegg_conv

# 1) Search for a gene keyword in KEGG GENES
hits = kegg_find("genes", "p53")
print("FIND results (first lines):")
print("\n".join(hits.splitlines()[:5]), "\n")

# Choose a known KEGG gene ID for TP53 (human)
gene_id = "hsa:7157"

# 2) Link gene -> pathways
pathway_links = kegg_link("pathway", gene_id)
print("LINK gene -> pathways (first lines):")
print("\n".join(pathway_links.splitlines()[:5]), "\n")

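# Parse the first pathway ID from the link output
# Typical line format: hsa:7157<TAB>path:hsaXXXXX (queried gene first, linked pathway second)
first_line = next((ln for ln in pathway_links.splitlines() if ln.strip()), None)
if not first_line:
    raise RuntimeError("No pathways returned for the gene ID.")

# Pick the column that holds the pathway ID rather than relying on column order
path_field = next((f for f in first_line.split("\t") if f.startswith("path:")), None)
if path_field is None:
    raise RuntimeError(f"Unexpected link output line: {first_line!r}")

path_id = path_field.replace("path:", "", 1)
print("Selected pathway:", path_id, "\n")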

# 3) Retrieve the pathway entry (flat text)
pathway_entry = kegg_get(path_id)
print("GET pathway entry (first 30 lines):")
print("\n".join(pathway_entry.splitlines()[:30]), "\n")

# 4) Convert KEGG gene ID -> UniProt
uniprot_map = kegg_conv("uniprot", gene_id)
print("CONV KEGG -> UniProt:")
print(uniprot_map)
```
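
Step 1 only prints the raw hits, and the example then hard-codes `hsa:7157`. If you want to pick the gene programmatically, a minimal sketch follows, assuming `kegg_find` returns KEGG's `ID<TAB>description` lines and that the gene symbol appears as a whole token in the description. `parse_find` and `pick_gene` are hypothetical helpers, and the sample text is illustrative, not real KEGG output.

```python
import re
from typing import List, Optional, Tuple


def parse_find(text: str) -> List[Tuple[str, str]]:
    """Split KEGG find output into (entry_id, description) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry_id, _, description = line.partition("\t")
        rows.append((entry_id.strip(), description.strip()))
    return rows


def pick_gene(text: str, organism: str, symbol: str) -> Optional[str]:
    """Return the first hit for `organism` whose description contains `symbol` as a token."""
    prefix = organism + ":"
    for entry_id, description in parse_find(text):
        tokens = re.split(r"[^A-Za-z0-9]+", description)
        if entry_id.startswith(prefix) and symbol in tokens:
            return entry_id
    return None


# Illustrative input only; real KEGG descriptions are longer and may differ.
sample = (
    "mmu:22059\tTrp53, p53; transformation related protein 53\n"
    "hsa:7157\tTP53, BCC7, LFS1, P53; tumor protein p53\n"
)
print(pick_gene(sample, "hsa", "TP53"))  # hsa:7157
```

Exact token matching avoids selecting a gene whose description merely contains the symbol as a substring; the trade-off is that case and spelling must match KEGG's description.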

## Implementation Details

### API-to-function mapping

This skill wraps KEGG REST endpoints into Python functions (see `scripts/kegg_api.py`):

- `kegg_info(database_or_org)`  
  Retrieves database or organism metadata (release info, counts, etc.).

- `kegg_list(database, organism=None)`  
  Lists entries in a database; optionally scoped to an organism (e.g., `("pathway", "hsa")`).  
  Also supports listing explicit IDs (batch-style) when passed as a single string.

- `kegg_find(database, query, option=None)`  
  Searches by keyword or by chemical properties. Common `option` values:
  - `formula` (partial match, irrespective of atom order, e.g. `C7H10O5`)
  - `exact_mass` (single value, or a range such as `300-310`)
  - `mol_weight` (single value or range)

- `kegg_get(entry_ids, option=None)`  
  Retrieves full entries or specific formats:
  - Sequences: `aaseq`, `ntseq`
  - Structures: `mol`, `kcf`
  - Pathway assets: `image` (PNG), `kgml` (XML), `json` (Pathway JSON)

  **Batching rules**:
  - Most operations allow up to **10 entries** per request.
  - `image`, `kgml`, and `json` typically allow **only 1 entry** per request.

- `kegg_conv(target_db, source)`  
  Converts IDs between KEGG and external databases (e.g., `uniprot`, `ncbi-geneid`, `pubchem`, `chebi`).  
  Output is tab-delimited pairs: `source_id<TAB>target_id`.

- `kegg_link(target_db, source)`  
  Cross-references entries across KEGG databases (e.g., gene → pathway, pathway → compound, gene → KO).

- `kegg_ddi(drug_ids)`  
  Returns known drug–drug interactions for one or more KEGG drug IDs (subject to the same per-request entry limits).
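
`kegg_conv` returns tab-delimited `source_id<TAB>target_id` pairs, and `kegg_link` returns the same two-column layout with the queried entry in the first column. A sketch for grouping that text into a Python mapping, for example gene → pathways for enrichment inputs, is below. `parse_pairs` is a hypothetical helper, and the sample strings are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def parse_pairs(text: str, strip_prefix: str = "") -> Dict[str, List[str]]:
    """Group `source<TAB>target` lines (kegg_link / kegg_conv output) by source ID.

    If `strip_prefix` is given (e.g. "path:"), it is removed from target IDs.
    Lines without a tab are skipped.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        source, _, target = line.partition("\t")
        if not target:
            continue
        target = target.strip()
        if strip_prefix and target.startswith(strip_prefix):
            target = target[len(strip_prefix):]
        mapping[source.strip()].append(target)
    return dict(mapping)


# Illustrative link output (gene -> pathway)
links = "hsa:7157\tpath:hsa04010\nhsa:7157\tpath:hsa04115\nhsa:10458\tpath:hsa04520\n"
print(parse_pairs(links, strip_prefix="path:"))
# {'hsa:7157': ['hsa04010', 'hsa04115'], 'hsa:10458': ['hsa04520']}
```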

### Practical constraints and error handling

- **Entry limits**: Prefer chunking lists into batches of ≤10 IDs; enforce single-entry calls for `image/kgml/json`.
- **HTTP status codes**: Treat non-200 responses as failures; common issues include:
  - `400` (bad request / malformed parameters)
  - `404` (unknown database or entry ID)
- **Rate behavior**: KEGG does not publish strict rate limits; avoid high-frequency polling and add backoff/retry for robustness.
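
The constraints above can be sketched as two small helpers: one that chunks IDs into batches of at most 10 (or 1 for `image`/`kgml`/`json`), and one that retries with exponential backoff. This is an illustration under stated assumptions, not the code in `scripts/kegg_api.py`: `batches` and `fetch_with_retry` are hypothetical names, the `fetch` callable is assumed to return `(status_code, body)`, and treating `400`/`404` as non-retryable is a design choice, since resending a malformed request or an unknown ID will not succeed.

```python
import time
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

MAX_ENTRIES = 10
SINGLE_ENTRY_OPTIONS = {"image", "kgml", "json"}

# Assumed fetch signature: url -> (HTTP status, response body)
Fetch = Callable[[str], Tuple[int, str]]


def batches(ids: Sequence[str], option: Optional[str] = None) -> Iterator[List[str]]:
    """Yield ID batches that respect the per-request entry limits."""
    size = 1 if option in SINGLE_ENTRY_OPTIONS else MAX_ENTRIES
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_with_retry(fetch: Fetch, url: str, retries: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url), retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status in (400, 404):
            # Malformed request or unknown ID: retrying will not help.
            raise ValueError(f"KEGG returned {status} for {url}")
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"KEGG request failed after {retries + 1} attempts: {url}")


# Possible real fetch using requests (not executed here):
# import requests
# def http_fetch(url):
#     r = requests.get(url, timeout=30)
#     return r.status_code, r.text
```

Network exceptions raised inside `fetch` (timeouts, connection errors) are not caught in this sketch; a production wrapper would need to decide whether to retry those as well.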

### Reference documentation

For detailed endpoint syntax, database lists, and species codes, consult:
- `references/kegg_reference.md`
