---
name: bio-systems-biology-model-curation
description: Validate, gap-fill, and curate genome-scale metabolic models using memote for quality scores and COBRApy for manual curation. Ensure models meet SBML standards and produce biologically meaningful predictions. Use when improving draft models or preparing models for publication.
tool_type: python
primary_tool: memote
---
## Version Compatibility
Reference examples tested with: COBRApy 0.29+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Model Curation
**"Validate and improve the quality of my metabolic model"** → Score a genome-scale model against SBML community standards using memote, then gap-fill blocked reactions and fix stoichiometric inconsistencies using COBRApy to ensure biologically meaningful predictions.
- CLI: `memote report snapshot` for quality scoring
- Python: `cobra.flux_analysis.gapfilling.gapfill()` for gap-filling
## Memote Quality Assessment
**Goal:** Evaluate the quality and standards compliance of a genome-scale metabolic model to identify areas needing curation.

**Approach:** Run `memote report snapshot` to score the model against SBML community standards, then use the Python API to inspect individual test failures and guide manual fixes.
```bash
# Install memote
pip install memote
# Run full quality report
memote report snapshot model.xml --filename report.html
# Quick score
memote run model.xml
# Continuous integration testing
memote run --pytest-args "--tb=short" model.xml
```
## Memote Python API
```python
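import json

import cobra
from memote.suite import api

model = cobra.io.read_sbml_model('model.xml')

# Run all tests; with results=True the call returns the pytest exit code
# together with the collected results
code, result = api.test_model(model, results=True)

# Get score breakdown by rendering the snapshot report as JSON
# (key names follow memote's report JSON; verify them against your version)
scores = json.loads(api.snapshot_report(result, html=False))
print(f"Total score: {scores['score']['total_score']:.2%}")

# Detailed test results
for test_name, test_result in scores['tests'].items():
    if test_result.get('result') == 'failed':
        print(f"Failed: {test_name}")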
```
## Gap-Filling
```python
import cobra
from cobra.flux_analysis import gapfill

model = cobra.io.read_sbml_model('model.xml')

# Load universal reaction database
universal = cobra.io.read_sbml_model('universal_model.xml')

# gapfill() restores flux through the model objective, so point it at the
# biomass reaction first (replace BIOMASS with the ID used in your model)
model.objective = model.reactions.BIOMASS

# Find reactions to add for growth
# demand_reactions: whether gapfill may add demand reactions for any metabolite
# iterations: number of alternative solutions
solution = gapfill(model, universal,
                   demand_reactions=False,
                   iterations=5)

# solution contains a list of reaction sets to add, one per iteration
for i, rxn_set in enumerate(solution):
    print(f'Solution {i+1}: {[r.id for r in rxn_set]}')

# Add first solution (copies keep the reactions detached from the universal model)
model.add_reactions([rxn.copy() for rxn in solution[0]])
```
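When several alternative solutions come back, it can help to see which reactions recur before committing to one. The sketch below is only an illustration: `summarize_gapfill_solutions` is a hypothetical helper, not part of COBRApy, and it assumes each solution is a list of objects with an `id` attribute, as in the loop above. The reaction IDs in the demo are made up.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_gapfill_solutions(solutions):
    '''Count how often each reaction ID appears across alternative solutions.

    Hypothetical helper: expects a list of reaction lists in which every
    reaction exposes an `id` attribute (true for cobra.Reaction objects).
    '''
    counts = Counter(rxn.id for rxn_set in solutions for rxn in rxn_set)
    n_solutions = len(solutions)
    # Reactions present in every solution are the most robust candidates
    core = sorted(rid for rid, c in counts.items() if c == n_solutions)
    # The smallest solution adds the fewest reactions lacking genomic evidence
    smallest = min(solutions, key=len) if solutions else []
    return {
        'counts': dict(counts),
        'core': core,
        'smallest': [rxn.id for rxn in smallest],
    }


# Stand-in objects with made-up IDs, in place of a real gapfill() result
demo = [
    [SimpleNamespace(id='RXN_A'), SimpleNamespace(id='RXN_B')],
    [SimpleNamespace(id='RXN_A')],
    [SimpleNamespace(id='RXN_A'), SimpleNamespace(id='RXN_C')],
]
summary = summarize_gapfill_solutions(demo)
print(summary['core'], summary['smallest'])
```

With a real model, passing `solution` from the block above in place of `demo` should work unchanged. Reactions that appear in only one alternative deserve a closer look at the literature before they are added.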
## Identify Dead-End Metabolites
```python
def find_dead_end_metabolites(model):
    '''Find metabolites that cannot be produced or cannot be consumed

    Dead-end metabolites indicate:
    - Missing reactions in the network
    - Incorrect reaction stoichiometry
    - Incomplete pathways
    '''
    dead_ends = []
    for met in model.metabolites:
        producing, consuming = [], []
        for r in met.reactions:
            coef = r.get_coefficient(met)
            forward, reverse = r.upper_bound > 0, r.lower_bound < 0
            # Running a reaction in reverse flips which side is produced,
            # so reversible reactions (e.g. uptake exchanges) count both ways
            if (coef > 0 and forward) or (coef < 0 and reverse):
                producing.append(r)
            if (coef < 0 and forward) or (coef > 0 and reverse):
                consuming.append(r)
        if not producing or not consuming:
            dead_ends.append({
                'metabolite': met.id,
                'name': met.name,
                'producers': len(producing),
                'consumers': len(consuming)
            })
    return dead_ends

dead_ends = find_dead_end_metabolites(model)
print(f'Found {len(dead_ends)} dead-end metabolites')
```
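To see what a producer/consumer check flags, here is a toy pathway written as plain dictionaries rather than a COBRApy model. Everything in it (reaction names, the `toy_dead_ends` helper, the `use_reversibility` switch) is made up for illustration. The sketch also shows why reaction direction matters: if only the sign of the coefficient is considered, a nutrient that enters through a reversible exchange looks like a dead end.

```python
# Toy network: reaction ID -> (stoichiometry, reversible?)
toy_reactions = {
    'EX_A': ({'A': -1}, True),           # A <=>   (exchange, allows uptake)
    'R1': ({'A': -1, 'B': 1}, False),    # A -> B
    'R2': ({'B': -1, 'C': 1}, False),    # B -> C
    'R3': ({'B': -1, 'D': 1}, False),    # B -> D  (nothing consumes D)
    'EX_C': ({'C': -1}, False),          # C ->    (secretion)
}


def toy_dead_ends(reactions, use_reversibility=True):
    '''Return metabolites that lack a producer or lack a consumer.'''
    produced, consumed = set(), set()
    for stoich, reversible in reactions.values():
        both_ways = reversible and use_reversibility
        for met, coef in stoich.items():
            if coef > 0 or both_ways:
                produced.add(met)
            if coef < 0 or both_ways:
                consumed.add(met)
    # Metabolites found in exactly one of the two sets are dead ends
    return sorted(produced ^ consumed)


print(toy_dead_ends(toy_reactions))                           # ['D']
print(toy_dead_ends(toy_reactions, use_reversibility=False))  # ['A', 'D']
```

Metabolite `D` is a genuine dead end in both cases. `A` is flagged only by the sign-based variant, because its sole source is the reverse direction of `EX_A`.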
## Check Mass and Charge Balance
```python
def check_reaction_balance(reaction):
    '''Check if reaction is mass and charge balanced

    Unbalanced reactions indicate:
    - Missing metabolites
    - Wrong stoichiometry
    - Proton accounting issues
    '''
    mass_balance = {}
    charge_balance = 0
    for met, coef in reaction.metabolites.items():
        # Check mass (metabolites without a formula are skipped)
        if met.formula:
            for element, count in met.elements.items():
                mass_balance[element] = mass_balance.get(element, 0) + coef * count
        # Check charge
        if met.charge is not None:
            charge_balance += coef * met.charge
    is_balanced = all(abs(v) < 1e-6 for v in mass_balance.values())
    is_charge_balanced = abs(charge_balance) < 1e-6
    return {
        'mass_balanced': is_balanced,
        'charge_balanced': is_charge_balanced,
        'mass_imbalance': {k: v for k, v in mass_balance.items() if abs(v) > 1e-6}
    }

# Check all reactions
unbalanced = []
for rxn in model.reactions:
    # Exchange, demand, and sink reactions are unbalanced by design
    if rxn.boundary:
        continue
    result = check_reaction_balance(rxn)
    if not result['mass_balanced']:
        unbalanced.append((rxn.id, result['mass_imbalance']))
```
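The arithmetic behind the check can be followed by hand on a single reaction. The sketch below works through hexokinase (glucose + ATP → glucose 6-phosphate + ADP + H+) using plain dictionaries instead of COBRApy objects. The formulas and charges are illustrative values at the protonation states commonly used in genome-scale models, so verify them against your own database. `parse_formula` and `imbalance` are hypothetical helpers, and the parser handles only simple formulas without parentheses.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    '''Parse a simple formula such as 'C6H12O6' into element counts.'''
    counts = defaultdict(int)
    for element, number in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def imbalance(stoichiometry, formulas, charges):
    '''Return (element imbalance, net charge) for {metabolite: coefficient}.'''
    elements = defaultdict(int)
    net_charge = 0
    for met, coef in stoichiometry.items():
        for element, count in parse_formula(formulas[met]).items():
            elements[element] += coef * count
        net_charge += coef * charges[met]
    return {e: v for e, v in elements.items() if v != 0}, net_charge


# Illustrative formulas and charges (assumed, not taken from a specific model)
formulas = {'glc': 'C6H12O6', 'atp': 'C10H12N5O13P3',
            'g6p': 'C6H11O9P', 'adp': 'C10H12N5O10P2', 'h': 'H'}
charges = {'glc': 0, 'atp': -4, 'g6p': -2, 'adp': -3, 'h': 1}

balanced = {'glc': -1, 'atp': -1, 'g6p': 1, 'adp': 1, 'h': 1}
missing_proton = {'glc': -1, 'atp': -1, 'g6p': 1, 'adp': 1}

print(imbalance(balanced, formulas, charges))        # ({}, 0)
print(imbalance(missing_proton, formulas, charges))  # ({'H': -1}, -1)
```

Dropping the proton leaves one hydrogen and one unit of charge unaccounted for, which is the typical signature of a proton accounting issue. COBRApy also provides `Reaction.check_mass_balance()`, which returns the imbalance per element and for charge.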
## Fix Gene-Protein-Reaction Rules
```python
def standardize_gpr(model):
    '''Standardize gene-protein-reaction rules

    GPR format: (gene1 and gene2) or gene3
    - 'and' = protein complex (all genes required)
    - 'or' = isozymes (any gene sufficient)
    '''
    for rxn in model.reactions:
        if rxn.gene_reaction_rule:
            # Standardize formatting: lowercase boolean operators
            rule = rxn.gene_reaction_rule
            rule = rule.replace(' AND ', ' and ')
            rule = rule.replace(' OR ', ' or ')
            rxn.gene_reaction_rule = rule

def identify_orphan_reactions(model):
    '''Find reactions without gene associations

    Orphan reactions may be:
    - Spontaneous reactions
    - Unannotated genes
    - Transport reactions (often orphan)
    '''
    orphans = [r for r in model.reactions if not r.genes]
    # Classify orphans into non-overlapping groups
    exchange = [r for r in orphans if r in model.exchanges]
    # Name/ID matching is a heuristic; adapt it to the model's naming scheme
    transport = [r for r in orphans if r not in exchange
                 and ('transport' in r.name.lower() or 't_' in r.id.lower())]
    other = [r for r in orphans if r not in exchange and r not in transport]
    return {
        'exchange': len(exchange),
        'transport': len(transport),
        'other': len(other),
        'total': len(orphans)
    }
```
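The `and`/`or` semantics in the docstring can be made concrete by evaluating a rule against a set of knocked-out genes. This is a minimal sketch, assuming rules contain only gene IDs, parentheses, and lowercase `and`/`or` (as left by `standardize_gpr`). `gpr_is_active` is a hypothetical helper and the gene IDs are placeholders.

```python
import re


def gpr_is_active(rule, knocked_out):
    '''Return True if a GPR rule is still satisfied after the given knockouts.

    Hypothetical helper for illustration; assumes the rule contains only
    gene IDs, parentheses, and lowercase 'and'/'or'.
    '''
    if not rule.strip():
        return True  # orphan reaction: no gene requirement to violate

    def to_bool(match):
        token = match.group(0)
        if token in ('and', 'or'):
            return token
        return 'False' if token in knocked_out else 'True'

    # Replace every gene ID with True/False, keeping operators and parentheses
    expression = re.sub(r'[^\s()]+', to_bool, rule)
    # Only True/False/and/or/parentheses remain; builtins are disabled
    return bool(eval(expression, {'__builtins__': {}}, {}))


rule = '(g1 and g2) or g3'
print(gpr_is_active(rule, {'g1'}))        # True: isozyme g3 still carries the reaction
print(gpr_is_active(rule, {'g1', 'g3'}))  # False: complex broken and isozyme lost
```

Losing one subunit of the `g1 and g2` complex is tolerated only because `g3` provides an alternative enzyme. Checks like this are a quick way to confirm that a curated rule encodes the intended biology.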
## Annotation Standards
```python
def add_standard_annotations(model):
    '''Add SBO term annotations to metabolites and reactions

    Database cross-references to add alongside the SBO terms:
    - KEGG IDs for reactions and metabolites
    - ChEBI IDs for metabolites
    - BiGG IDs if applicable
    '''
    for met in model.metabolites:
        # Add SBO term for metabolite without overwriting an existing one
        met.annotation.setdefault('sbo', 'SBO:0000247')  # Simple chemical
    for rxn in model.reactions:
        # Add SBO term based on reaction type
        if rxn in model.exchanges:
            rxn.annotation.setdefault('sbo', 'SBO:0000627')  # Exchange
        else:
            rxn.annotation.setdefault('sbo', 'SBO:0000176')  # Biochemical reaction
```
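Before and after adding annotations, it is useful to measure how much of the model is covered for each annotation key. The sketch below assumes only that each item has an `annotation` dictionary, which holds for COBRApy metabolites and reactions. `annotation_coverage` is a hypothetical helper, and the demo objects are stand-ins with illustrative values.

```python
from types import SimpleNamespace


def annotation_coverage(items, key):
    '''Fraction of items whose annotation dict has a non-empty entry for key.'''
    items = list(items)
    if not items:
        return 0.0
    annotated = sum(1 for item in items if item.annotation.get(key))
    return annotated / len(items)


# Stand-ins for cobra metabolites; IDs and annotation values are illustrative
demo_mets = [
    SimpleNamespace(id='glc__D_c',
                    annotation={'sbo': 'SBO:0000247', 'kegg.compound': 'C00031'}),
    SimpleNamespace(id='atp_c', annotation={'sbo': 'SBO:0000247'}),
]
for key in ('sbo', 'kegg.compound', 'chebi'):
    print(f'{key}: {annotation_coverage(demo_mets, key):.0%}')
```

On a loaded model the same call would look like `annotation_coverage(model.metabolites, 'chebi')`. Keys with low coverage indicate where cross-references still need to be added.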
## Related Skills
- systems-biology/metabolic-reconstruction - Generate draft models
- systems-biology/flux-balance-analysis - Test curated models
- pathway-analysis/kegg-pathways - Add KEGG annotations