hmdb-database

Name: hmdb-database
Author: aipoch/medical-research-skills

$ npx mdskill add aipoch/medical-research-skills/hmdb-database

Query metabolites and extract clinical chemical data from HMDB.

Retrieves metabolite details using names, IDs, or chemical structures.
Depends on the Human Metabolome Database XML dump and parser.
Selects fields based on the requested chemical, biological, or clinical categories.
Delivers structured JSON containing formula, pathways, and disease links.


---
name: hmdb-database
description: Access the Human Metabolome Database (HMDB) to search metabolites by name, structure, or ID and extract chemical, biological, and clinical fields. Use when you need metabolomics research data or automated HMDB XML mining.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to look up a metabolite by **common name** (e.g., “Caffeine”) and retrieve its HMDB entry data.
- You have an **HMDB ID** (e.g., `HMDB0000001`) and want to extract standardized chemical/biological/clinical fields for downstream analysis.
- You want to build a **local, scriptable pipeline** to mine the HMDB XML dump instead of manually browsing the website.
- You need to **map HMDB identifiers** to external resources (e.g., KEGG, PubChem, ChEBI) for integration tasks.
- You are preparing metabolomics datasets and need **pathway/enzyme/transporter** annotations from HMDB entries.

## Key Features

- Search metabolites by:
  - Text name
  - HMDB identifier (e.g., `HMDB0000001`)
  - Structure-related query (as supported by the parser/search implementation)
- Parse the HMDB XML dataset and extract:
  - **Chemical data** (formula, molecular weight, InChI/SMILES where available)
  - **Biological data** (pathways, enzymes, transporters)
  - **Clinical data** (disease associations, biofluid concentrations)
- Optionally structure extracted results for analysis workflows (e.g., tabular output via `pandas`).
- Expose external identifiers (e.g., KEGG, PubChem, ChEBI) so entries can be mapped across databases.

## Dependencies

- Python `>=3.9`
- Standard library:
  - `xml.etree.ElementTree` (built-in)
- Optional:
  - `pandas >= 1.5`

## Example Usage

### 1) Download HMDB XML

Download the HMDB metabolite XML dataset from:
- https://hmdb.ca/downloads

Assume you saved it as:

```text
data/hmdb_metabolites.xml
```
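
Before wiring the file into a pipeline, it can help to confirm that it parses. The sketch below streams entries with `xml.etree.ElementTree.iterparse` rather than loading the whole dump into memory. It assumes the dump declares the default namespace `http://www.hmdb.ca` and wraps each entry in a `<metabolite>` element with direct `<accession>` and `<name>` children; check these against your download. `iter_accessions` is a hypothetical helper, not part of this skill, and the in-memory `SAMPLE` only stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the dump's root element declares this default namespace.
HMDB_NS = "{http://www.hmdb.ca}"


def iter_accessions(source):
    """Yield (accession, name) for each <metabolite> entry, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{HMDB_NS}metabolite":
            # Direct-child lookups only, so nested <accession>/<name> tags are ignored.
            accession = elem.findtext(f"{HMDB_NS}accession", default="")
            name = elem.findtext(f"{HMDB_NS}name", default="")
            yield accession, name
            elem.clear()  # drop the finished entry's children to limit memory use


# Tiny in-memory stand-in for the real file, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<hmdb xmlns="http://www.hmdb.ca">
  <metabolite><accession>HMDB0000001</accession><name>1-Methylhistidine</name></metabolite>
  <metabolite><accession>HMDB0001847</accession><name>Caffeine</name></metabolite>
</hmdb>
"""

entries = list(iter_accessions(io.BytesIO(SAMPLE)))
print(len(entries), "entries parsed")
```

For the real dump, pass the path instead of the in-memory buffer, e.g. `iter_accessions("data/hmdb_metabolites.xml")`.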

### 2) Search and Extract Fields (Runnable Example)

```python
from scripts.hmdb_parser import HMDBParser

def main():
    # Path to the HMDB XML dump downloaded from hmdb.ca/downloads
    xml_path = "data/hmdb_metabolites.xml"

    parser = HMDBParser(xml_path)

    # Search by metabolite name (text query)
    results = parser.search("Caffeine")

    # Print basic information from the first match (structure depends on implementation)
    if not results:
        print("No results found.")
        return

    first = results[0]
    print("Top match:")
    print(first)

if __name__ == "__main__":
    main()
```
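
The return structure of `parser.search` depends on the implementation, so the following standalone sketch shows only one plausible matching rule: an exact, case-insensitive match on the accession, or a case-insensitive substring match on the name or any synonym. The record layout (dicts with `accession`, `name`, `synonyms`) and the helpers `matches` and `search_records` are hypothetical and are not taken from `HMDBParser`.

```python
def matches(record, query):
    """True if query equals the accession or occurs in the name or a synonym."""
    q = query.strip().lower()
    if not q:
        return False
    if q == record.get("accession", "").lower():
        return True
    names = [record.get("name", "")] + list(record.get("synonyms", []))
    return any(q in n.lower() for n in names)


def search_records(records, query):
    """Return every record that matches the query, in input order."""
    return [r for r in records if matches(r, query)]


# Illustrative records; a real pipeline would build these from the XML dump.
RECORDS = [
    {"accession": "HMDB0000001", "name": "1-Methylhistidine",
     "synonyms": ["1-Methyl-L-histidine"]},
    {"accession": "HMDB0001847", "name": "Caffeine",
     "synonyms": ["1,3,7-Trimethylxanthine"]},
]

print([r["accession"] for r in search_records(RECORDS, "caffeine")])
# ['HMDB0001847']
print([r["name"] for r in search_records(RECORDS, "hmdb0000001")])
# ['1-Methylhistidine']
```

Substring matching is forgiving but can return many hits for short queries; an exact-match mode may be preferable when the name is already normalized.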

### 3) Field Reference

For a curated list of extractable fields and how they map to HMDB XML elements, see:

- `references/hmdb_data_fields.md`
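
To make the idea of a field mapping concrete, here is a minimal sketch of category-based extraction from a single `<metabolite>` element. The `FIELD_MAP` table, the `extract` helper, and the XML element names are assumptions for illustration; the authoritative mapping is the one in `references/hmdb_data_fields.md`, and the names should be checked against the dump you downloaded.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.hmdb.ca"}  # assumed default namespace of the dump

# Hypothetical mapping: category -> {output key: path relative to <metabolite>}.
FIELD_MAP = {
    "chemical": {
        "formula": "h:chemical_formula",
        "average_mass": "h:average_molecular_weight",
        "smiles": "h:smiles",
    },
    "biological": {
        "pathways": "h:biological_properties/h:pathways/h:pathway/h:name",
    },
    "clinical": {
        "diseases": "h:diseases/h:disease/h:name",
    },
}
MULTI_VALUED = {"pathways", "diseases"}  # keys returned as lists


def extract(metabolite, categories):
    """Collect the mapped fields for the requested categories."""
    out = {}
    for category in categories:
        for key, path in FIELD_MAP[category].items():
            values = [e.text.strip() for e in metabolite.findall(path, NS)
                      if e.text and e.text.strip()]
            if key in MULTI_VALUED:
                out[key] = values
            else:
                out[key] = values[0] if values else None
    return out


# Trimmed, illustrative entry; real entries contain many more elements.
SAMPLE = """
<metabolite xmlns="http://www.hmdb.ca">
  <accession>HMDB0001847</accession>
  <name>Caffeine</name>
  <chemical_formula>C8H10N4O2</chemical_formula>
  <smiles>CN1C=NC2=C1C(=O)N(C)C(=O)N2C</smiles>
  <biological_properties>
    <pathways><pathway><name>Caffeine Metabolism</name></pathway></pathways>
  </biological_properties>
  <diseases/>
</metabolite>
"""

print(extract(ET.fromstring(SAMPLE), ["chemical", "biological"]))
```

Missing single-valued fields come back as `None` and missing list fields as `[]`, which keeps downstream JSON or tabular output uniform.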

## Implementation Details

- **Data acquisition**
  - Primary workflow uses the official HMDB downloadable XML dataset (recommended for bulk parsing).
  - Single-entry lookups can be done via the HMDB website, but this skill is designed around XML parsing.

- **Parsing approach**
  - The parser reads the HMDB XML and traverses metabolite entries using `xml.etree.ElementTree`.
  - Extracted fields should follow the definitions documented in `references/hmdb_data_fields.md`.

- **Search behavior**
  - Name/ID search typically matches against key textual identifiers (e.g., common name, synonyms, HMDB accession).
  - Structure-based search depends on which structural fields `HMDBParser` indexes or exposes (e.g., SMILES/InChI).

- **Integration / cross-references**
  - HMDB entries often include cross-references to external databases (e.g., KEGG, PubChem, ChEBI).
  - A common workflow is to extract these identifiers and build mapping tables for downstream joins.

- **Spectral analysis (conceptual)**
  - HMDB contains NMR/MS references for some metabolites; this skill can be extended to link parsed entries to spectral metadata.
  - Actual spectral matching/identification is not guaranteed unless implemented in the codebase.
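
The cross-reference workflow described above can be sketched as a long-format mapping table with one row per (HMDB ID, external source, external ID). The key names `kegg_id`, `pubchem_compound_id`, and `chebi_id` are assumed to mirror element names in the XML dump, and `build_xref_rows` and `rows_to_csv` are hypothetical helpers. The sketch uses only the standard library; the same rows could be passed to `pandas.DataFrame` if `pandas` is installed.

```python
import csv
import io

# Assumed record keys, mirroring cross-reference elements in the XML dump.
XREF_KEYS = ("kegg_id", "pubchem_compound_id", "chebi_id")


def build_xref_rows(records):
    """Flatten records into one row per non-empty external identifier."""
    rows = []
    for rec in records:
        for key in XREF_KEYS:
            value = rec.get(key)
            if value:  # skip missing or empty cross-references
                rows.append({"hmdb_id": rec["accession"],
                             "source": key,
                             "external_id": value})
    return rows


def rows_to_csv(rows):
    """Serialize the mapping rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["hmdb_id", "source", "external_id"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


RECORDS = [
    {"accession": "HMDB0001847", "kegg_id": "C07481",
     "pubchem_compound_id": "2519", "chebi_id": "27732"},
    # Placeholder accession with blank cross-references, to show skipping.
    {"accession": "HMDB_EXAMPLE", "kegg_id": "", "pubchem_compound_id": None},
]

rows = build_xref_rows(RECORDS)
print(rows_to_csv(rows))
```

A long format like this joins cleanly against other tables on `(source, external_id)` and tolerates entries that lack some identifiers.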
