bio-systems-biology-context-specific-models
$
npx mdskill add GPTomics/bioSkills/bio-systems-biology-context-specific-modelsGenerate tissue-specific metabolic models using transcriptomics constraints.
- Creates context-specific models by integrating gene expression data with generic genome-scale models.
- Depends on COBRApy for model manipulation and GIMME, iMAT, or INIT algorithms.
- Selects active reactions by penalizing flux through lowly-expressed metabolic pathways.
- Delivers optimized metabolic networks reflecting the active metabolism of specific cell types.
SKILL.md
.github/skills/bio-systems-biology-context-specific-modelsView on GitHub ↗
---
name: bio-systems-biology-context-specific-models
description: Build tissue and condition-specific metabolic models using GIMME, iMAT, and INIT algorithms with expression data constraints. Create models that reflect cell-type specific metabolism. Use when building tissue-specific metabolic models or integrating transcriptomics with FBA.
tool_type: python
primary_tool: cobrapy
---
## Version Compatibility
Reference examples tested with: COBRApy 0.29+, numpy 1.26+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Context-Specific Models
**"Build a tissue-specific metabolic model from my expression data"** → Constrain a generic genome-scale model using transcriptomics data to produce a context-specific model reflecting the active metabolism of a particular tissue or condition, using GIMME, iMAT, or INIT algorithms.
- Python: custom implementations with `cobra` model manipulation (COBRApy)
## GIMME Algorithm
**Goal:** Build a tissue-specific metabolic model by integrating transcriptomics data with a generic genome-scale model, retaining only metabolically active reactions.
**Approach:** Map gene expression values to reactions, penalize flux through lowly-expressed reactions while maintaining minimum biomass production, and remove inactive reactions to produce a context-specific model.
```python
import cobra
import numpy as np
def gimme(model, expression_data, threshold=0.25, required_growth=0.1):
'''Gene Inactivity Moderated by Metabolism and Expression (GIMME)
Creates context-specific model by:
1. Penalizing flux through lowly-expressed reactions
2. Requiring minimum biomass production
Args:
expression_data: dict mapping gene_id -> expression value
threshold: Expression percentile below which genes are inactive
0.25 = bottom 25% considered inactive
required_growth: Minimum growth rate to maintain
Returns:
Context-specific model with inactive reactions constrained
'''
# Calculate expression threshold
values = list(expression_data.values())
cutoff = np.percentile(values, threshold * 100)
# Identify lowly-expressed genes
low_expressed = {g for g, v in expression_data.items() if v < cutoff}
# Create context model
context_model = model.copy()
# Set minimum growth constraint
context_model.reactions.get_by_id('Biomass_Ecoli_core').lower_bound = required_growth
# Minimize flux through reactions with low-expressed genes
for rxn in context_model.reactions:
genes = {g.id for g in rxn.genes}
if genes and genes.issubset(low_expressed):
# This reaction is likely inactive - constrain it
rxn.upper_bound = min(rxn.upper_bound, 1.0)
rxn.lower_bound = max(rxn.lower_bound, -1.0)
return context_model
```
## iMAT Algorithm
```python
def imat(model, expression_data, high_threshold=0.75, low_threshold=0.25):
'''Integrative Metabolic Analysis Tool (iMAT)
Maximizes agreement between flux activity and expression:
- Highly expressed reactions should carry flux
- Lowly expressed reactions should have zero flux
More sophisticated than GIMME - uses MILP optimization.
'''
from cobra import Reaction
# Classify reactions by expression
high_expr_rxns = []
low_expr_rxns = []
for rxn in model.reactions:
if rxn.genes:
# Aggregate gene expression (use max for OR, min for AND)
gene_expr = [expression_data.get(g.id, 0.5) for g in rxn.genes]
rxn_expr = max(gene_expr) # Simplified OR logic
if rxn_expr > np.percentile(list(expression_data.values()), high_threshold * 100):
high_expr_rxns.append(rxn.id)
elif rxn_expr < np.percentile(list(expression_data.values()), low_threshold * 100):
low_expr_rxns.append(rxn.id)
# Create MILP to maximize consistent reactions
# This is a simplified version - full iMAT uses binary variables
context_model = model.copy()
# Force flux through highly expressed reactions
for rxn_id in high_expr_rxns:
rxn = context_model.reactions.get_by_id(rxn_id)
rxn.lower_bound = max(rxn.lower_bound, 0.01)
# Constrain lowly expressed reactions
for rxn_id in low_expr_rxns:
rxn = context_model.reactions.get_by_id(rxn_id)
rxn.upper_bound = min(rxn.upper_bound, 0.1)
rxn.lower_bound = max(rxn.lower_bound, -0.1)
return context_model, high_expr_rxns, low_expr_rxns
```
## Expression Data Integration
```python
def load_expression_data(filepath, gene_col='gene_id', expr_col='TPM'):
'''Load and normalize expression data
Accepts:
- RNA-seq counts (TPM, FPKM)
- Microarray intensities
- Proteomics abundances
Returns dict mapping gene_id -> normalized expression
'''
import pandas as pd
df = pd.read_csv(filepath)
# Log-transform if needed (high dynamic range)
expr = df[expr_col].values
if expr.max() / expr.mean() > 100:
expr = np.log2(expr + 1)
# Normalize to 0-1 range
expr_norm = (expr - expr.min()) / (expr.max() - expr.min())
return dict(zip(df[gene_col], expr_norm))
def aggregate_gene_expression(model, expression_data, method='max'):
'''Map gene expression to reactions
Methods:
- 'max': Use maximum gene expression (OR logic)
- 'min': Use minimum gene expression (AND logic)
- 'mean': Average across genes
For GPR: (A and B) or C
- min(A, B) for the complex
- max(complex, C) for the alternatives
'''
rxn_expression = {}
for rxn in model.reactions:
if not rxn.genes:
rxn_expression[rxn.id] = 0.5 # Default for non-enzymatic
continue
gene_expr = [expression_data.get(g.id, 0.5) for g in rxn.genes]
if method == 'max':
rxn_expression[rxn.id] = max(gene_expr)
elif method == 'min':
rxn_expression[rxn.id] = min(gene_expr)
else:
rxn_expression[rxn.id] = np.mean(gene_expr)
return rxn_expression
```
## Tissue-Specific Human Models
```python
def create_tissue_model(generic_model, gtex_expression, tissue='liver'):
'''Create tissue-specific model from GTEx expression data
GTEx provides median TPM for 54 human tissues.
Download from: https://gtexportal.org/home/datasets
'''
import pandas as pd
# Load GTEx median expression
gtex = pd.read_csv(gtex_expression, sep='\t')
# Extract tissue column
tissue_col = [c for c in gtex.columns if tissue.lower() in c.lower()][0]
expression = dict(zip(gtex['gene_id'], gtex[tissue_col]))
# Apply GIMME
tissue_model = gimme(generic_model, expression, threshold=0.25)
return tissue_model
```
## Validate Context Model
```python
def validate_context_model(original, context, expression_data):
'''Compare original and context-specific models
Checks:
1. Growth capability maintained
2. Inactive reactions reduced
3. Active reactions maintained
'''
# Growth comparison
orig_growth = original.optimize().objective_value
context_growth = context.optimize().objective_value
# Count constrained reactions
constrained = 0
for rxn in context.reactions:
orig_rxn = original.reactions.get_by_id(rxn.id)
if rxn.upper_bound < orig_rxn.upper_bound:
constrained += 1
return {
'original_growth': orig_growth,
'context_growth': context_growth,
'growth_ratio': context_growth / orig_growth,
'constrained_reactions': constrained,
'total_reactions': len(context.reactions)
}
```
## Related Skills
- systems-biology/flux-balance-analysis - Run FBA on context models
- differential-expression/de-results - Generate expression data
- single-cell/clustering - Cell-type specific expression
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.