bio-pathway-enrichment-visualization
$ npx mdskill add GPTomics/bioSkills/bio-pathway-enrichment-visualization
Visualize clusterProfiler enrichment results with publication-quality plots
- Generate clear, publication-ready figures from bioinformatics enrichment data
- Uses enrichplot R package with functions like dotplot, cnetplot, and gseaplot2
- Selects appropriate visualization type based on analysis output and user needs
- Returns high-resolution plots suitable for scientific manuscripts and presentations
---
name: bio-pathway-enrichment-visualization
description: Visualize enrichment results using enrichplot package functions. Use when creating publication-quality figures from clusterProfiler results. Covers dotplot, barplot, cnetplot, emapplot, gseaplot2, ridgeplot, and treeplot.
tool_type: r
primary_tool: enrichplot
---
## Version Compatibility
Reference examples tested with: ggplot2 3.5+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
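If code fails with errors such as `could not find function`, `unused argument`, or
`unable to find an inherited method`, inspect the installed package (`?function_name`,
`args(function_name)`) and adapt the example to match the actual API rather than retrying.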
# Enrichment Visualization
**"Create publication-quality plots from my enrichment analysis"** → Generate dotplots, gene-concept networks, enrichment maps, GSEA running score plots, and ridgeplots from clusterProfiler results.
- R: `dotplot()`, `cnetplot()`, `emapplot()`, `gseaplot2()` (enrichplot)
## Scope
This skill covers **enrichplot package functions** designed for clusterProfiler results:
- `dotplot()`, `barplot()` - Summary views
- `cnetplot()`, `emapplot()`, `treeplot()` - Network/hierarchical views
- `gseaplot2()`, `ridgeplot()` - GSEA-specific
- `goplot()`, `heatplot()`, `upsetplot()` - Specialized views
**For custom ggplot2 enrichment dotplots** (manual implementation), see `data-visualization/specialized-omics-plots`.
## Setup
**Goal:** Load required packages for visualizing enrichment analysis results.
**Approach:** Import clusterProfiler, enrichplot, and ggplot2 which provide the plotting functions for enrichment objects.
```r
library(clusterProfiler)
library(enrichplot)
library(ggplot2)
# Assume ego (enrichGO result), kk (enrichKEGG result), or gse (GSEA result) exists
```
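The later examples assume `ego` and `gse` already exist. As a minimal sketch of one way to create them, the block below uses the example `geneList` vector shipped with the DOSE package and the `org.Hs.eg.db` annotation package; both are assumptions not named in this skill, and the `|log2FC| > 2` threshold is purely illustrative. `kk` is left out because `enrichKEGG()` needs network access.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Example ranked fold-change vector from DOSE:
# names are Entrez IDs, values are sorted in decreasing order
data(geneList, package = 'DOSE')

# ORA input: genes passing an illustrative fold-change threshold
de_genes <- names(geneList)[abs(geneList) > 2]

# Over-representation analysis against GO Biological Process
ego <- enrichGO(gene = de_genes, OrgDb = org.Hs.eg.db, ont = 'BP',
                universe = names(geneList), pvalueCutoff = 0.05)

# GSEA on the full ranked vector
gse <- gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = 'BP',
             pvalueCutoff = 0.05)
```

With real data, replace `geneList` with a named, decreasing-sorted vector of your own statistics.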
## Dot Plot
**Goal:** Summarize enrichment results showing gene ratio, count, and significance in a single figure.
**Approach:** Use enrichplot `dotplot()`, which by default maps gene ratio to the x-axis, term to the y-axis, dot size to gene count, and color to adjusted p-value (`p.adjust`).
Most common visualization - shows gene ratio, count, and significance.
```r
dotplot(ego, showCategory = 20)
# Customize
dotplot(ego, showCategory = 15, font.size = 10, title = 'GO Enrichment') +
scale_color_gradient(low = 'red', high = 'blue')
# Save
pdf('go_dotplot.pdf', width = 10, height = 8)
dotplot(ego, showCategory = 20)
dev.off()
```
## Bar Plot
Shows enrichment count or gene ratio.
```r
barplot(ego, showCategory = 20)
# Customize
barplot(ego, showCategory = 15, x = 'GeneRatio', color = 'p.adjust')
```
## Gene-Concept Network (cnetplot)
**Goal:** Visualize which genes contribute to multiple enriched terms, revealing shared biology.
**Approach:** Build a bipartite network connecting enriched terms to their member genes, optionally colored by fold change.
Shows relationships between genes and enriched terms.
```r
# Basic cnetplot
cnetplot(ego)
# With fold change colors
cnetplot(ego, foldChange = gene_list)
# Circular layout
cnetplot(ego, circular = TRUE, colorEdge = TRUE)
# Label only gene nodes and scale the gene label text
cnetplot(ego, node_label = 'gene', cex_label_gene = 0.8)
```
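The `foldChange = gene_list` argument above expects a named numeric vector whose names match the gene IDs used in the enrichment result. A minimal sketch of building one from a differential expression table follows; `de_table` and its column names are hypothetical, and the values are made up for illustration.

```r
# Hypothetical differential expression table (IDs and values are illustrative)
de_table <- data.frame(
  entrez_id = c('4312', '8318', '10874', '55143'),
  log2FC = c(4.57, -1.20, 2.10, 0.35),
  stringsAsFactors = FALSE
)

# Named numeric vector: names are gene IDs, values are log2 fold changes
gene_list <- setNames(de_table$log2FC, de_table$entrez_id)

# Sorting in decreasing order also makes the vector usable as GSEA input
gene_list <- sort(gene_list, decreasing = TRUE)
gene_list
```

If the enrichment result uses gene symbols rather than Entrez IDs, the vector names need to be symbols as well, otherwise no node will pick up a color.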
## Enrichment Map (emapplot)
**Goal:** Identify clusters of related enriched terms by visualizing shared gene overlap.
**Approach:** Compute pairwise term similarity, then plot as a network where edges connect terms sharing genes.
Shows term-term relationships based on shared genes.
```r
# Requires pairwise_termsim first
ego_pt <- pairwise_termsim(ego)
emapplot(ego_pt)
# Customize
emapplot(ego_pt, showCategory = 30, cex_label_category = 0.6)
# Cluster by similarity
emapplot(ego_pt, group_category = TRUE, group_legend = TRUE)
```
### pairwise_termsim() Method Selection
```r
# Default: Jaccard Coefficient (works with any gene set type)
ego_pt <- pairwise_termsim(ego)
# For GO terms: Wang semantic similarity (more biologically meaningful)
# godata() comes from the GOSemSim package, which is not attached in Setup
ego_pt <- pairwise_termsim(ego, method = 'Wang', semData = GOSemSim::godata('org.Hs.eg.db', ont = 'BP'))
```
| Method | Type | When to Use |
|--------|------|-------------|
| JC (Jaccard) | Gene overlap | Default; works with KEGG, Reactome, any gene set |
| Wang | Graph-based | Best for GO; captures biological relationships independent of annotation version |
| Resnik/Lin/Jiang | IC-based | GO only; depends on annotation corpus (results change between database releases) |
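To make the JC row concrete, here is a base R sketch of the Jaccard coefficient between two terms' gene sets. The helper `jaccard()` and the gene symbols are illustrative; this shows the definition, not the package's internal code.

```r
# Jaccard coefficient: shared genes divided by all distinct genes in either set
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Illustrative gene sets for two enriched terms
term_a <- c('TP53', 'CDK1', 'CCNB1', 'BUB1')
term_b <- c('CDK1', 'CCNB1', 'PLK1')

jaccard(term_a, term_b)  # 2 shared genes / 5 distinct genes = 0.4
```

Because the score depends only on gene membership, it applies equally to KEGG, Reactome, or custom gene sets, which is why it serves as the default.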
## Tree Plot
Hierarchical clustering of enriched terms.
```r
ego_pt <- pairwise_termsim(ego)
treeplot(ego_pt)
# Show more categories
treeplot(ego_pt, showCategory = 30)
```
## Upset Plot
Show overlapping genes between terms.
```r
upsetplot(ego)
# Limit to specific number of terms
upsetplot(ego, n = 10)
```
## GSEA-Specific Plots
### Running Score Plot (gseaplot2)
```r
# Single gene set
gseaplot2(gse, geneSetID = 1, title = gse$Description[1])
# Multiple gene sets
gseaplot2(gse, geneSetID = 1:3)
# With subplots
gseaplot2(gse, geneSetID = 1, subplots = 1:3)
# By term ID
gseaplot2(gse, geneSetID = 'GO:0006955')
```
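The curve drawn by a running score plot can be understood from a small base R sketch of the weighted running-sum statistic used in GSEA. The function name `running_score()` and the toy input are hypothetical; this illustrates the idea under the usual definition and is not the code enrichplot runs.

```r
# ranked_stats: named numeric vector sorted in decreasing order
# gene_set: character vector of gene IDs belonging to the set
running_score <- function(ranked_stats, gene_set, exponent = 1) {
  in_set <- names(ranked_stats) %in% gene_set
  hit_weight <- abs(ranked_stats)^exponent * in_set  # only set members carry weight
  p_hit <- cumsum(hit_weight) / sum(hit_weight)      # cumulative weighted hits
  p_miss <- cumsum(!in_set) / sum(!in_set)           # cumulative fraction of misses
  unname(p_hit - p_miss)
}

# Toy ranked list: the two set members sit at the top
ranked_stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2)
rs <- running_score(ranked_stats, gene_set = c('g1', 'g2'))
rs                  # rises to 1 at position 2, then falls back to 0
which.max(abs(rs))  # 2: position of the maximum deviation
```

The enrichment score is the maximum deviation of this curve from zero; a peak near the left edge means the set's genes cluster at the top of the ranked list.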
### Ridge Plot
Distribution of fold changes in gene sets.
```r
ridgeplot(gse)
# Top n gene sets
ridgeplot(gse, showCategory = 15)
# Shrink y-axis labels so long term names fit
ridgeplot(gse, showCategory = 20) + theme(axis.text.y = element_text(size = 8))
```
**Reading ridge plots:**
- **Shifted right (positive values):** Gene set enriched among upregulated genes
- **Shifted left (negative values):** Gene set enriched among downregulated genes
- **Bimodal distribution:** Pathway contains both strongly up- and down-regulated genes; may indicate heterogeneous pathway with opposing components
- **Narrow peak:** Enrichment driven by a small cluster of similarly ranked genes
- **Broad distribution:** Many genes with varied rankings (more diffuse, less concentrated signal)
## GO-Specific Plot (goplot)
DAG structure of GO terms.
```r
# Only for GO enrichment results
goplot(ego)
# Specific ontology
goplot(ego_bp) # where ego_bp is enrichGO with ont='BP'
```
## Heatplot
Gene-concept heatmap.
```r
heatplot(ego, foldChange = gene_list)
# Customize
heatplot(ego, showCategory = 15, foldChange = gene_list)
```
## Compare Multiple Analyses
**Goal:** Visualize enrichment results side by side across multiple gene lists or conditions.
**Approach:** Use dotplot on compareCluster output, optionally faceting by cluster.
```r
# Compare clusters (from compareCluster)
dotplot(ck, showCategory = 10)
# Facet by cluster
dotplot(ck) + facet_grid(~Cluster)
```
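The `ck` object above is assumed to come from `compareCluster()`. As a sketch of one way to produce it, the block below uses the `gcSample` example list shipped with clusterProfiler and the `org.Hs.eg.db` annotation package; restricting to the first three clusters is only to keep the run short.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Example data: a named list of Entrez ID vectors, one per cluster
data(gcSample)

# Run GO enrichment separately for each cluster and collect the results
ck <- compareCluster(geneClusters = gcSample[1:3], fun = 'enrichGO',
                     OrgDb = org.Hs.eg.db, ont = 'BP')
```

With real data, pass a named list with one gene ID vector per condition; the list names become the cluster labels on the plot.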
## Customize ggplot2 Elements
**Goal:** Fine-tune enrichment plots with custom titles, themes, colors, and text sizes.
**Approach:** Chain ggplot2 modifiers onto enrichplot output, since most functions return ggplot2 objects.
Multi-panel output such as `gseaplot2()` is a composite of several plots, so a layer added with `+` may not reach every panel.
```r
p <- dotplot(ego, showCategory = 20)
# Add title
p + ggtitle('GO Biological Process Enrichment')
# Change theme
p + theme_minimal()
# Adjust text
p + theme(axis.text.y = element_text(size = 10))
# Change colors
p + scale_color_viridis_c()
```
## Save Plots
**Goal:** Export enrichment plots as publication-quality PDF or PNG files.
**Approach:** Use base R pdf/png device functions or ggplot2 ggsave to write plots to files.
```r
# PDF (vector, publication quality)
pdf('enrichment_plots.pdf', width = 10, height = 8)
dotplot(ego, showCategory = 20)
dev.off()
# PNG (raster)
png('dotplot.png', width = 800, height = 600, res = 100)
dotplot(ego, showCategory = 20)
dev.off()
# Using ggsave
p <- dotplot(ego)
ggsave('dotplot.pdf', p, width = 10, height = 8)
```
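For raster output at print resolution, `ggsave()` can set the size in inches and the resolution explicitly. The sketch below uses a stand-in ggplot object so it runs on its own; in practice `p` would be the enrichplot figure, and the 7 x 5 inch size and 300 dpi are illustrative values, since journals set their own requirements.

```r
library(ggplot2)

# Stand-in plot; replace with p <- dotplot(ego, showCategory = 20)
p <- ggplot(data.frame(x = 1:3, y = c(2, 5, 3)), aes(x, y)) + geom_point()

# Write to a temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), 'dotplot_300dpi.png')

# Width and height in inches at 300 dpi give a 2100 x 1500 pixel image
ggsave(out_file, plot = p, width = 7, height = 5, units = 'in', dpi = 300)
```

Passing the plot explicitly through `plot = p` avoids relying on whichever plot was displayed last.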
## Visualization Summary
| Function | Best For | Input Type |
|----------|----------|------------|
| dotplot | Overview of enrichment | ORA, GSEA |
| barplot | Simple counts/ratios | ORA |
| cnetplot | Gene-term relationships | ORA, GSEA |
| emapplot | Term clustering | ORA, GSEA |
| treeplot | Hierarchical grouping | ORA, GSEA |
| upsetplot | Term overlap | ORA, GSEA |
| gseaplot2 | Running enrichment score | GSEA |
| ridgeplot | Fold change distribution | GSEA |
| goplot | GO DAG structure | GO only |
| heatplot | Gene-concept matrix | ORA, GSEA |
## Choosing the Right Visualization
| Goal | Plot | Key Tip |
|------|------|---------|
| First overview of top enriched terms | dotplot | Best starting point; shows 3 dimensions (ratio, count, p-value) |
| Which genes drive multiple enriched terms | cnetplot | Limit to 5-10 terms; use `circular = TRUE` for crowded networks |
| Identify functional modules among terms | emapplot | Run `pairwise_termsim()` first; if everything connects to everything, results are redundant |
| GSEA: detailed single-pathway view | gseaplot2 | Check where genes cluster in the ranked list |
| GSEA: overview of all enriched sets | ridgeplot | Read direction (left/right shift) and shape (narrow vs broad) |
| Compare enrichment across conditions | dotplot on compareCluster | Use `facet_grid(~Cluster)` for side-by-side panels |
## Common Visualization Mistakes
- **Too many terms**: plots with more than 30 terms are unreadable. Set `showCategory` to a value between 15 and 20.
- **Not simplifying GO first**: showing 15 redundant GO terms (cell cycle, cell cycle process, mitotic cell cycle...) wastes space and misleads. Run `simplify()` before plotting.
- **Missing gene set size**: always show both the overlap count and the total pathway size. A 3/5 overlap (60%) is very different from 30/500 (6%).
- **Bar plots for GSEA**: `barplot()` is built for ORA results and shows count or gene ratio. GSEA results are better summarized by NES than by count or p-value, so put NES on the x-axis of a dotplot or use ridgeplot instead.
- **Skipping pairwise_termsim()**: emapplot and treeplot will fail or produce meaningless results without it.
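The gene set size point can be checked directly from a results table. The sketch below uses a hypothetical two-row data frame with the `Count` and `BgRatio` columns found in clusterProfiler ORA output (where `BgRatio` is set size over background size) and computes the fraction of each set covered by the overlap; the `coverage` column name is an illustrative choice.

```r
# Hypothetical rows mimicking columns of as.data.frame(ego)
res <- data.frame(
  Description = c('small set', 'large set'),
  Count = c(3, 30),                          # overlapping genes
  BgRatio = c('5/20000', '500/20000'),       # set size / background size
  stringsAsFactors = FALSE
)

# Extract the gene set size from the numerator of BgRatio
set_size <- as.numeric(sub('/.*', '', res$BgRatio))

# Fraction of each gene set covered by the overlap
res$coverage <- res$Count / set_size
res[, c('Description', 'Count', 'coverage')]  # coverage: 0.60 and 0.06
```

Reporting this fraction next to the count makes clear whether a hit reflects most of a small pathway or a sliver of a large one.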
## Related Skills
- go-enrichment - Generate GO enrichment results
- kegg-pathways - Generate KEGG enrichment results
- gsea - Generate GSEA results