hmdb-database

$ npx mdskill add aipoch/medical-research-skills/hmdb-database

Query metabolites and extract clinical chemical data from HMDB.

  • Retrieves metabolite details by name, ID, or chemical structure.
  • Depends on the Human Metabolome Database XML dump and a parser for it.
  • Selects fields based on the requested chemical, biological, or clinical categories.
  • Delivers structured JSON containing formula, pathways, and disease links.

---
name: hmdb-database
description: Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to look up a metabolite by **common name** (e.g., “Caffeine”) and retrieve its HMDB entry data.
- You have an **HMDB ID** (e.g., `HMDB0000001`) and want to extract standardized chemical/biological/clinical fields for downstream analysis.
- You want to build a **local, scriptable pipeline** to mine the HMDB XML dump instead of manually browsing the website.
- You need to **map HMDB identifiers** to external resources (e.g., KEGG, PubChem, ChEBI) for integration tasks.
- You are preparing metabolomics datasets and need **pathway/enzyme/transporter** annotations from HMDB entries.

## Key Features

- Search metabolites by:
  - Text name
  - HMDB identifier (e.g., `HMDB0000001`)
  - Structure-related query (as supported by the parser/search implementation)
- Parse the HMDB XML dataset and extract:
  - **Chemical data** (formula, molecular weight, InChI/SMILES where available)
  - **Biological data** (pathways, enzymes, transporters)
  - **Clinical data** (disease associations, biofluid concentrations)
- Optional structuring of extracted results for analysis workflows (e.g., tabular outputs).
- Supports integration workflows by exposing identifiers suitable for cross-database mapping.
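
The tabular output mentioned in the list above could be produced with the standard library alone. The following is a minimal sketch, not the skill's own implementation: the function name `to_csv` and the record keys (`accession`, `name`, `pathways`) are hypothetical, and the sample record is for illustration only.

```python
import csv
import io

def to_csv(records, columns):
    """Flatten extracted records (plain dicts) into CSV text.

    List-valued fields are joined with '; ' so each metabolite stays on one row.
    Keys not listed in `columns` are ignored; missing values become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column)
            if isinstance(value, (list, tuple)):
                value = "; ".join(value)
            row[column] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()

if __name__ == "__main__":
    # Illustrative record only; real values come from the parsed XML dump.
    sample = [{"accession": "HMDB0001847", "name": "Caffeine",
               "pathways": ["Caffeine Metabolism"]}]
    print(to_csv(sample, ["accession", "name", "pathways"]))
```

If `pandas` is installed, the same list of dicts can be passed to `pandas.DataFrame(records)` instead.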

## Dependencies

- Python `>=3.9`
- Standard library:
  - `xml.etree.ElementTree` (built-in)
- Optional:
  - `pandas >= 1.5`

## Example Usage

### 1) Download HMDB XML

Download the HMDB metabolite XML dataset from:
- https://hmdb.ca/downloads

If the download is a compressed archive, extract it first. The examples below assume the XML file is saved as:

```text
data/hmdb_metabolites.xml
```

### 2) Search and Extract Fields (Runnable Example)

```python
from scripts.hmdb_parser import HMDBParser

def main():
    # Path to the HMDB XML dump downloaded from hmdb.ca/downloads
    xml_path = "data/hmdb_metabolites.xml"

    parser = HMDBParser(xml_path)

    # Search by metabolite name (text query)
    results = parser.search("Caffeine")

    # Print basic information from the first match (structure depends on implementation)
    if not results:
        print("No results found.")
        return

    first = results[0]
    print("Top match:")
    print(first)

if __name__ == "__main__":
    main()
```
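
The `search` call above leaves the matching rules to the implementation. As a rough illustration of how a name/ID query might be resolved, here is a minimal sketch under stated assumptions: records are plain dicts with hypothetical keys `accession`, `name`, and `synonyms`; shorter legacy accessions are zero-padded to seven digits; and the sample records are illustrative. It is not `HMDBParser`'s actual logic.

```python
import re

# Illustrative in-memory records; a real parser would build these from the XML dump.
RECORDS = [
    {"accession": "HMDB0001847", "name": "Caffeine",
     "synonyms": ["1,3,7-Trimethylxanthine", "Guaranine"]},
    {"accession": "HMDB0000122", "name": "D-Glucose",
     "synonyms": ["Dextrose", "Grape sugar"]},
]

_ACCESSION_RE = re.compile(r"^HMDB(\d{1,7})$", re.IGNORECASE)

def normalize_accession(query):
    """Return a 7-digit HMDB accession, or None if the query is not an ID."""
    match = _ACCESSION_RE.match(query.strip())
    if match is None:
        return None
    return "HMDB" + match.group(1).zfill(7)

def search(records, query):
    """Match by accession first, then by name/synonym (exact before partial)."""
    accession = normalize_accession(query)
    if accession is not None:
        return [r for r in records if r["accession"] == accession]

    needle = query.strip().casefold()
    if not needle:
        return []

    exact, partial = [], []
    for record in records:
        labels = [label.casefold() for label in (record["name"], *record["synonyms"])]
        if needle in labels:
            exact.append(record)
        elif any(needle in label for label in labels):
            partial.append(record)
    return exact + partial

if __name__ == "__main__":
    print(search(RECORDS, "caffeine"))
    print(search(RECORDS, "HMDB00122"))
```

Ranking exact matches ahead of substring matches keeps a query such as "Caffeine" from being buried under entries that merely contain the word.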

### 3) Field Reference

For a curated list of extractable fields and how they map to HMDB XML elements, see:

- `references/hmdb_data_fields.md`
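
To make the idea of a field-to-element mapping concrete, the sketch below extracts fields by category from a single metabolite element. Everything in it is an assumption for illustration: the `FIELD_MAP` keys, the element paths, and the sample entry are hypothetical and should be checked against `references/hmdb_data_fields.md` and the actual XML dump.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical map: output key -> (category, path under <metabolite>, repeated?)
FIELD_MAP = {
    "formula": ("chemical", "chemical_formula", False),
    "smiles": ("chemical", "smiles", False),
    "pathways": ("biological", "biological_properties/pathways/pathway/name", True),
    "diseases": ("clinical", "diseases/disease/name", True),
}

# Illustrative entry without a namespace; element names are assumed.
SAMPLE = """
<metabolite>
  <accession>HMDB0001847</accession>
  <name>Caffeine</name>
  <chemical_formula>C8H10N4O2</chemical_formula>
  <smiles>CN1C=NC2=C1C(=O)N(C)C(=O)N2C</smiles>
  <biological_properties>
    <pathways>
      <pathway><name>Caffeine Metabolism</name></pathway>
    </pathways>
  </biological_properties>
  <diseases/>
</metabolite>
"""

def extract(metabolite, categories):
    """Return a dict with the fields belonging to the requested categories."""
    record = {
        "accession": metabolite.findtext("accession"),
        "name": metabolite.findtext("name"),
    }
    for key, (category, path, many) in FIELD_MAP.items():
        if category not in categories:
            continue
        values = [el.text.strip() for el in metabolite.findall(path)
                  if el.text and el.text.strip()]
        # Repeated fields stay lists; single-valued fields become a string or None.
        record[key] = values if many else (values[0] if values else None)
    return record

if __name__ == "__main__":
    entry = ET.fromstring(SAMPLE)
    print(json.dumps(extract(entry, {"chemical", "clinical"}), indent=2))
```

If the real dump declares a default XML namespace, these plain paths will not match as written; the tags would need a namespace prefix or wildcard.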

## Implementation Details

- **Data acquisition**
  - Primary workflow uses the official HMDB downloadable XML dataset (recommended for bulk parsing).
  - Single-entry lookups can be done via the HMDB website, but this skill is designed around XML parsing.

- **Parsing approach**
  - The parser reads the HMDB XML and traverses metabolite entries using `xml.etree.ElementTree`.
  - Extracted fields should follow the definitions documented in `references/hmdb_data_fields.md`.

- **Search behavior**
  - Name/ID search typically matches against key textual identifiers (e.g., common name, synonyms, HMDB accession).
  - Structure-based search is dependent on what structural fields are indexed/exposed by `HMDBParser` (e.g., SMILES/InChI).

- **Integration / cross-references**
  - HMDB entries often include cross-references to external databases (e.g., KEGG, PubChem, ChEBI).
  - A common workflow is to extract these identifiers and build mapping tables for downstream joins.

- **Spectral analysis (conceptual)**
  - HMDB contains NMR/MS references for some metabolites; this skill can be extended to link parsed entries to spectral metadata.
  - Spectral matching or identification is not available unless the codebase implements it.
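
The parsing and cross-reference points above can be combined into one streaming pass. The sketch below uses `xml.etree.ElementTree.iterparse` to yield one mapping row per metabolite. The element names in `XREF_TAGS`, the namespace in the sample, and the sample values are assumptions for illustration; this is not the code in `scripts/hmdb_parser`.

```python
import io
import xml.etree.ElementTree as ET

# Assumed cross-reference element names; confirm against the XML dump.
XREF_TAGS = ("kegg_id", "pubchem_compound_id", "chebi_id")

# Illustrative two-entry document standing in for data/hmdb_metabolites.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB0001847</accession>
    <name>Caffeine</name>
    <kegg_id>C07481</kegg_id>
    <pubchem_compound_id>2519</pubchem_compound_id>
    <chebi_id>27732</chebi_id>
  </metabolite>
  <metabolite>
    <accession>HMDB0000122</accession>
    <name>D-Glucose</name>
    <kegg_id>C00031</kegg_id>
    <pubchem_compound_id>5793</pubchem_compound_id>
    <chebi_id/>
  </metabolite>
</hmdb>
"""

def local_name(tag):
    """Drop a '{namespace}' prefix so lookups work with or without one."""
    return tag.rsplit("}", 1)[-1]

def iter_xrefs(source):
    """Yield one {hmdb_id, kegg_id, ...} row per <metabolite> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "metabolite":
            continue
        row = {"hmdb_id": None, **{tag: None for tag in XREF_TAGS}}
        for child in elem:
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name == "accession":
                row["hmdb_id"] = text or None
            elif name in XREF_TAGS:
                row[name] = text or None  # empty elements map to None
        elem.clear()  # drop the parsed children to limit memory use
        yield row

if __name__ == "__main__":
    for row in iter_xrefs(io.BytesIO(SAMPLE)):
        print(row)
```

`iterparse` also accepts a file path, so `iter_xrefs("data/hmdb_metabolites.xml")` would stream the full dump without loading it into memory at once. The resulting rows can be joined against KEGG, PubChem, or ChEBI tables in downstream steps.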
