bio-pathway-reactome
$
npx mdskill add GPTomics/bioSkills/bio-pathway-reactomeAnalyzes gene lists against Reactome's peer-reviewed pathway database for enrichment
- Identifies enriched biological pathways from gene expression or mutation data
- Uses ReactomePA and clusterProfiler in R for over-representation and GSEA analysis
- Matches genes to curated pathways and ranks by statistical significance
- Generates visualizations and hierarchical pathway summaries for interpretation
SKILL.md
.github/skills/bio-pathway-reactomeView on GitHub ↗
---
name: bio-pathway-reactome
description: Reactome pathway enrichment using ReactomePA package. Use when analyzing gene lists against Reactome's curated peer-reviewed pathway database. Performs over-representation analysis and GSEA with visualization and pathway hierarchy exploration.
tool_type: r
primary_tool: ReactomePA
---
## Version Compatibility
Reference examples tested with: R stats (base), ReactomePA 1.46+, clusterProfiler 4.10+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Reactome Pathway Enrichment
## When to Use Reactome
| Scenario | Reactome? | Alternative |
|----------|-----------|-------------|
| Signaling pathway detail (reaction-level) | Yes -- best choice | KEGG (pathway-level only) |
| Metabolic pathway focus | Supplement | KEGG has stronger metabolic coverage |
| Reproducibility / open license required | Yes (CC0) | WikiPathways (CC0) |
| Non-model organism (bacteria, plants) | No (7 species only) | KEGG (8,000+ species) |
| Non-human model organism (mouse, rat, fly) | Caution | Annotations are computationally inferred via orthology from human; may contain errors |
Reactome pathways are curated by PhD-level biologists and externally peer-reviewed, making them the highest-quality curated pathway database. Human is the primary species; all others are computationally inferred.
## Core Pattern - Over-Representation Analysis
**Goal:** Identify Reactome pathways over-represented in a gene list from differential expression or other analyses.
**Approach:** Test for enrichment using the hypergeometric test via ReactomePA enrichPathway against curated peer-reviewed pathways.
**"Run pathway enrichment against Reactome"** → Test whether genes in curated Reactome pathways are over-represented among significant genes.
```r
library(ReactomePA)
library(org.Hs.eg.db)
pathway_result <- enrichPathway(
gene = entrez_ids, # Character vector of Entrez IDs
organism = 'human', # human, rat, mouse, celegans, yeast, zebrafish, fly
pvalueCutoff = 0.05,
pAdjustMethod = 'BH',
readable = TRUE # Convert to gene symbols
)
head(as.data.frame(pathway_result))
```
## Prepare Gene List from DE Results
**Goal:** Extract significant Entrez gene IDs from differential expression results for Reactome enrichment.
**Approach:** Filter by significance and fold change, then convert symbols to Entrez IDs using bitr.
```r
library(clusterProfiler)
de_results <- read.csv('de_results.csv')
sig_genes <- de_results[de_results$padj < 0.05 & abs(de_results$log2FoldChange) > 1, 'gene_symbol']
gene_ids <- bitr(sig_genes, fromType = 'SYMBOL', toType = 'ENTREZID', OrgDb = org.Hs.eg.db)
entrez_ids <- gene_ids$ENTREZID
```
## GSEA on Reactome Pathways
**Goal:** Detect coordinated expression changes in Reactome pathways using all genes ranked by a statistic.
**Approach:** Create a sorted named vector from DE results and run gsePathway for rank-based enrichment.
```r
# Create ranked gene list (named vector sorted by statistic)
gene_list <- de_results$log2FoldChange
names(gene_list) <- de_results$entrez_id
gene_list <- sort(gene_list, decreasing = TRUE)
gsea_result <- gsePathway(
geneList = gene_list,
organism = 'human',
pvalueCutoff = 0.05,
pAdjustMethod = 'BH',
verbose = FALSE
)
head(as.data.frame(gsea_result))
```
## With Background Universe
**Goal:** Restrict enrichment testing to only genes that were actually measured in the experiment.
**Approach:** Pass all tested gene IDs as the universe parameter to enrichPathway.
```r
all_genes <- de_results$entrez_id # All tested genes
pathway_result <- enrichPathway(
gene = entrez_ids,
universe = all_genes, # Background gene set
organism = 'human',
pvalueCutoff = 0.05,
readable = TRUE
)
```
## Visualization
**Goal:** Create publication-quality plots of Reactome enrichment results.
**Approach:** Use enrichplot functions (dotplot, barplot, emapplot, cnetplot, gseaplot2) on enrichment result objects.
```r
library(enrichplot)
# Dot plot
dotplot(pathway_result, showCategory = 15)
# Bar plot
barplot(pathway_result, showCategory = 15)
# Enrichment map (requires pairwise_termsim first)
pathway_result <- pairwise_termsim(pathway_result)
emapplot(pathway_result)
# Gene-concept network
cnetplot(pathway_result, categorySize = 'pvalue')
# GSEA plot
gseaplot2(gsea_result, geneSetID = 1:3)
```
## View Pathway in Browser
```r
# Open pathway in Reactome browser
viewPathway('R-HSA-109582', organism = 'human') # Uses pathway ID
# Get pathway ID from results
top_pathway_id <- pathway_result@result$ID[1]
viewPathway(top_pathway_id, organism = 'human')
```
## Export Results
```r
results_df <- as.data.frame(pathway_result)
write.csv(results_df, 'reactome_enrichment.csv', row.names = FALSE)
# Key columns: ID, Description, GeneRatio, BgRatio, pvalue, p.adjust, geneID, Count
```
## Different Organisms
```r
# Mouse
pathway_mouse <- enrichPathway(gene = mouse_entrez, organism = 'mouse', readable = TRUE)
# Rat
pathway_rat <- enrichPathway(gene = rat_entrez, organism = 'rat', readable = TRUE)
# Zebrafish
pathway_zfish <- enrichPathway(gene = zfish_entrez, organism = 'zebrafish', readable = TRUE)
# Supported: human, rat, mouse, celegans, yeast, zebrafish, fly
```
## Compare Clusters
**Goal:** Compare Reactome pathway enrichment across multiple gene lists (e.g., upregulated vs downregulated).
**Approach:** Use compareCluster with enrichPathway to run enrichment per group and visualize side by side.
```r
# Compare pathways across multiple gene lists
gene_clusters <- list(
upregulated = up_genes,
downregulated = down_genes
)
compare_result <- compareCluster(
geneClusters = gene_clusters,
fun = 'enrichPathway',
organism = 'human',
pvalueCutoff = 0.05
)
dotplot(compare_result)
```
## Key Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| gene | required | Vector of Entrez IDs |
| organism | human | Species name |
| pvalueCutoff | 0.05 | P-value threshold |
| pAdjustMethod | BH | Adjustment method |
| universe | NULL | Background genes |
| minGSSize | 10 | Min genes per pathway |
| maxGSSize | 500 | Max genes per pathway |
| readable | FALSE | Convert to symbols |
## Supported Organisms
| Organism | Name | OrgDb |
|----------|------|-------|
| Human | human | org.Hs.eg.db |
| Mouse | mouse | org.Mm.eg.db |
| Rat | rat | org.Rn.eg.db |
| Zebrafish | zebrafish | org.Dr.eg.db |
| Fly | fly | org.Dm.eg.db |
| C. elegans | celegans | org.Ce.eg.db |
| Yeast | yeast | org.Sc.sgd.db |
## Interpretation Notes
- Reactome is very granular -- some pathways contain only 2-3 genes. Use `minGSSize = 10` to filter these out.
- The deep hierarchy means parent pathways will often appear alongside child pathways. Look for the most specific (deepest) enriched pathway.
- Always specify a background universe (all tested genes) to avoid inflated significance.
- Examine fold enrichment (GeneRatio / BgRatio), not just p-values.
- For non-human species, note that annotations are orthology-inferred and may not capture species-specific pathway biology.
## Related Skills
- go-enrichment - Gene Ontology enrichment
- kegg-pathways - KEGG pathway enrichment
- wikipathways - WikiPathways enrichment
- gsea - Gene Set Enrichment Analysis
- enrichment-visualization - Visualization functions
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.