bio-pathway-enrichment-visualization

$npx mdskill add GPTomics/bioSkills/bio-pathway-enrichment-visualization

Visualize clusterProfiler enrichment results with publication-quality plots

  • Generate clear, publication-ready figures from bioinformatics enrichment data
  • Uses enrichplot R package with functions like dotplot, cnetplot, and gseaplot2
  • Selects appropriate visualization type based on analysis output and user needs
  • Returns high-resolution plots suitable for scientific manuscripts and presentations
---
name: bio-pathway-enrichment-visualization
description: Visualize enrichment results using enrichplot package functions. Use when creating publication-quality figures from clusterProfiler results. Covers dotplot, barplot, cnetplot, emapplot, gseaplot2, ridgeplot, and treeplot.
tool_type: r
primary_tool: enrichplot
---

## Version Compatibility

Reference examples tested with: ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails with errors such as `could not find function`, `unused argument`, or
`unable to find an inherited method`, introspect the installed package and adapt the
example to match the actual API rather than retrying.
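
A minimal sketch of the version check described above, using only base R. The package list is an assumption based on the packages this skill loads; extend it as needed.

```r
# Packages this skill's examples load (assumed list; adjust to your setup)
pkgs <- c('clusterProfiler', 'enrichplot', 'ggplot2')

# Installed version as a string, or NA if the package is missing
installed <- vapply(pkgs, function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        as.character(packageVersion(pkg))
    } else {
        NA_character_
    }
}, character(1))

print(installed)

# Flag ggplot2 if it is missing or older than the tested 3.5 series
ggplot2_ok <- !is.na(installed[['ggplot2']]) &&
    package_version(installed[['ggplot2']]) >= '3.5.0'
ggplot2_ok
```

If a package shows `NA` or an unexpected version, check `?function_name` before relying on the argument names used in the examples.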

# Enrichment Visualization

**"Create publication-quality plots from my enrichment analysis"** → Generate dotplots, gene-concept networks, enrichment maps, GSEA running score plots, and ridgeplots from clusterProfiler results.
- R: `dotplot()`, `cnetplot()`, `emapplot()`, `gseaplot2()` (enrichplot)

## Scope

This skill covers **enrichplot package functions** designed for clusterProfiler results:
- `dotplot()`, `barplot()` - Summary views
- `cnetplot()`, `emapplot()`, `treeplot()` - Network/hierarchical views
- `gseaplot2()`, `ridgeplot()` - GSEA-specific
- `goplot()`, `heatplot()`, `upsetplot()` - Specialized views

**For custom ggplot2 enrichment dotplots** (manual implementation), see `data-visualization/specialized-omics-plots`.

## Setup

**Goal:** Load required packages for visualizing enrichment analysis results.

**Approach:** Import clusterProfiler, enrichplot, and ggplot2 which provide the plotting functions for enrichment objects.

```r
library(clusterProfiler)
library(enrichplot)
library(ggplot2)

# Assume ego (enrichGO result), kk (enrichKEGG result), or gse (GSEA result) exists
```
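
If no result objects are at hand, the sketch below builds `ego`, `gse`, and `gene_list` from the `geneList` example data shipped with DOSE. The fold-change cutoff of 2 and the choice of Biological Process are illustrative assumptions, not part of this skill, and `org.Hs.eg.db` must be installed.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Example data: named numeric vector of fold changes (names are Entrez IDs),
# sorted in decreasing order as GSEA requires
data(geneList, package = 'DOSE')
gene_list <- geneList

# ORA input: genes passing an illustrative fold-change cutoff
de_genes <- names(gene_list)[abs(gene_list) > 2]
ego <- enrichGO(de_genes, OrgDb = org.Hs.eg.db, ont = 'BP', readable = TRUE)

# GSEA input: the full ranked vector
gse <- gseGO(gene_list, OrgDb = org.Hs.eg.db, ont = 'BP')
```

For real analyses, generate these objects with the enrichment skills listed under Related Skills.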

## Dot Plot

**Goal:** Summarize enrichment results showing gene ratio, count, and significance in a single figure.

**Approach:** Use enrichplot `dotplot()`, which maps gene ratio to the x-axis, term to the y-axis, dot size to gene count, and color to adjusted p-value. It is the most common enrichment visualization.

```r
dotplot(ego, showCategory = 20)

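# Customize (recent enrichplot releases may map significance to fill rather
# than color; if the scale below has no effect, try scale_fill_gradient())
dotplot(ego, showCategory = 15, font.size = 10, title = 'GO Enrichment') +
    scale_color_gradient(low = 'red', high = 'blue')

# Save; print() makes the plot render inside scripts and functions too
pdf('go_dotplot.pdf', width = 10, height = 8)
print(dotplot(ego, showCategory = 20))
dev.off()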
```

## Bar Plot

Shows enrichment count or gene ratio.

```r
barplot(ego, showCategory = 20)

# Customize
barplot(ego, showCategory = 15, x = 'GeneRatio', color = 'p.adjust')
```

## Gene-Concept Network (cnetplot)

**Goal:** Visualize which genes contribute to multiple enriched terms, revealing shared biology.

**Approach:** Build a bipartite network connecting enriched terms to their member genes, optionally colored by fold change.

Shows relationships between genes and enriched terms.

```r
# Basic cnetplot
cnetplot(ego)

# With fold change colors
cnetplot(ego, foldChange = gene_list)

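# Circular layout (argument names for layout, edge color, and label size have
# changed across enrichplot releases; check ?cnetplot if these are rejected)
cnetplot(ego, circular = TRUE, colorEdge = TRUE)

# Label only genes and shrink the gene labels
cnetplot(ego, node_label = 'gene', cex_label_gene = 0.8)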
```

## Enrichment Map (emapplot)

**Goal:** Identify clusters of related enriched terms by visualizing shared gene overlap.

**Approach:** Compute pairwise term similarity, then plot as a network where edges connect terms sharing genes.

Shows term-term relationships based on shared genes.

```r
# Requires pairwise_termsim first
ego_pt <- pairwise_termsim(ego)
emapplot(ego_pt)

# Customize
emapplot(ego_pt, showCategory = 30, cex_label_category = 0.6)

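# Cluster by similarity (grouping arguments have been renamed across
# enrichplot releases; check ?emapplot if these are rejected)
emapplot(ego_pt, group_category = TRUE, group_legend = TRUE)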
```

### pairwise_termsim() Method Selection

```r
# Default: Jaccard Coefficient (works with any gene set type)
ego_pt <- pairwise_termsim(ego)

# For GO terms: Wang semantic similarity (more biologically meaningful)
# godata() comes from the GOSemSim package
go_data <- GOSemSim::godata('org.Hs.eg.db', ont = 'BP')
ego_pt <- pairwise_termsim(ego, method = 'Wang', semData = go_data)
```

| Method | Type | When to Use |
|--------|------|-------------|
| JC (Jaccard) | Gene overlap | Default; works with KEGG, Reactome, any gene set |
| Wang | Graph-based | Best for GO; uses the GO graph topology, so results do not depend on the annotation corpus |
| Resnik/Lin/Jiang | IC-based | GO only; depends on annotation corpus (results change between database releases) |

## Tree Plot

Hierarchical clustering of enriched terms.

```r
ego_pt <- pairwise_termsim(ego)
treeplot(ego_pt)

# Show more categories
treeplot(ego_pt, showCategory = 30)
```

## Upset Plot

Show overlapping genes between terms.

```r
upsetplot(ego)

# Limit to specific number of terms
upsetplot(ego, n = 10)
```

## GSEA-Specific Plots

### Running Score Plot (gseaplot2)

```r
# Single gene set
gseaplot2(gse, geneSetID = 1, title = gse$Description[1])

# Multiple gene sets
gseaplot2(gse, geneSetID = 1:3)

# Select panels: 1 = running score, 2 = gene positions, 3 = ranked metric (default 1:3)
gseaplot2(gse, geneSetID = 1, subplots = 1:2)

# By term ID
gseaplot2(gse, geneSetID = 'GO:0006955')
```

### Ridge Plot

Distribution of fold changes in gene sets.

```r
ridgeplot(gse)

# Top n gene sets
ridgeplot(gse, showCategory = 15)

# Shrink term labels when showing many gene sets
ridgeplot(gse, showCategory = 20) + theme(axis.text.y = element_text(size = 8))
```

**Reading ridge plots:**
- **Shifted right (positive values):** Gene set enriched among upregulated genes
- **Shifted left (negative values):** Gene set enriched among downregulated genes
- **Bimodal distribution:** Pathway contains both strongly up- and down-regulated genes; may indicate heterogeneous pathway with opposing components
- **Narrow peak:** Enrichment driven by a small cluster of similarly ranked genes
- **Broad distribution:** Many genes with varied rankings (more diffuse, less concentrated signal)

## GO-Specific Plot (goplot)

DAG structure of GO terms.

```r
# Only for GO enrichment results from a single ontology (not ont = 'ALL')
goplot(ego)

# For example, a Biological Process result
goplot(ego_bp)  # where ego_bp is enrichGO with ont='BP'
```

## Heatplot

Gene-concept heatmap.

```r
heatplot(ego, foldChange = gene_list)

# Customize
heatplot(ego, showCategory = 15, foldChange = gene_list)
```

## Compare Multiple Analyses

**Goal:** Visualize enrichment results side by side across multiple gene lists or conditions.

**Approach:** Use dotplot on compareCluster output, optionally faceting by cluster.

```r
# Compare clusters (from compareCluster)
dotplot(ck, showCategory = 10)

# Facet by cluster
dotplot(ck) + facet_grid(~Cluster)
```
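
`ck` above is assumed to be a `compareCluster()` result. A minimal sketch of how one might be built, using the `gcSample` example clusters shipped with clusterProfiler; restricting to three clusters and to GO Biological Process is an arbitrary choice for illustration, and `org.Hs.eg.db` must be installed.

```r
library(clusterProfiler)
library(enrichplot)
library(org.Hs.eg.db)

# gcSample: a named list of Entrez ID vectors, one per cluster
data(gcSample)

# Run the same ORA on each cluster and collect the results
ck <- compareCluster(geneClusters = gcSample[1:3], fun = 'enrichGO',
                     OrgDb = org.Hs.eg.db, ont = 'BP')

dotplot(ck, showCategory = 5)
```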

## Customize ggplot2 Elements

**Goal:** Fine-tune enrichment plots with custom titles, themes, colors, and text sizes.

**Approach:** Chain ggplot2 modifiers onto enrichplot output. Most enrichplot functions return ggplot2 objects; multi-panel output such as `gseaplot2()` is a composite of several plots and may not accept `+` modifiers in the same way.

```r
p <- dotplot(ego, showCategory = 20)

# Add title
p + ggtitle('GO Biological Process Enrichment')

# Change theme
p + theme_minimal()

# Adjust text
p + theme(axis.text.y = element_text(size = 10))

# Change colors (use scale_fill_viridis_c() if your enrichplot version maps significance to fill)
p + scale_color_viridis_c()
```

## Save Plots

**Goal:** Export enrichment plots as publication-quality PDF or PNG files.

**Approach:** Use base R pdf/png device functions or ggplot2 ggsave to write plots to files.

```r
# PDF (vector, publication quality); print() is needed inside scripts and functions
pdf('enrichment_plots.pdf', width = 10, height = 8)
print(dotplot(ego, showCategory = 20))
dev.off()

# PNG (raster); size in inches at 300 dpi for print
png('dotplot.png', width = 10, height = 8, units = 'in', res = 300)
print(dotplot(ego, showCategory = 20))
dev.off()

# Using ggsave
p <- dotplot(ego)
ggsave('dotplot.pdf', p, width = 10, height = 8)
```
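
When the same figure is needed in both formats, a small wrapper around `ggsave()` keeps the dimensions consistent. The `save_figure` helper below is hypothetical, and the stand-in scatter plot only exists so the sketch runs without enrichment results.

```r
library(ggplot2)

# Hypothetical helper: write the same plot as vector PDF and 300 dpi PNG
save_figure <- function(plot, stem, width = 10, height = 8, dpi = 300) {
    files <- paste0(stem, c('.pdf', '.png'))
    for (f in files) {
        # ggsave() picks the device from the file extension
        ggsave(f, plot, width = width, height = height, dpi = dpi)
    }
    invisible(files)
}

# Stand-in plot; in practice pass e.g. dotplot(ego, showCategory = 20)
p_demo <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
out <- save_figure(p_demo, file.path(tempdir(), 'demo_plot'))
file.exists(out)
```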

## Visualization Summary

| Function | Best For | Input Type |
|----------|----------|------------|
| dotplot | Overview of enrichment | ORA, GSEA |
| barplot | Simple counts/ratios | ORA |
| cnetplot | Gene-term relationships | ORA, GSEA |
| emapplot | Term clustering | ORA, GSEA |
| treeplot | Hierarchical grouping | ORA, GSEA |
| upsetplot | Term overlap | ORA, GSEA |
| gseaplot2 | Running enrichment score | GSEA |
| ridgeplot | Fold change distribution | GSEA |
| goplot | GO DAG structure | GO results only |
| heatplot | Gene-concept matrix | ORA, GSEA |

## Choosing the Right Visualization

| Goal | Plot | Key Tip |
|------|------|---------|
| First overview of top enriched terms | dotplot | Best starting point; shows 3 dimensions (ratio, count, adjusted p-value) |
| Which genes drive multiple enriched terms | cnetplot | Limit to 5-10 terms; use `circular = TRUE` for crowded networks |
| Identify functional modules among terms | emapplot | Run `pairwise_termsim()` first; if everything connects to everything, results are redundant |
| GSEA: detailed single-pathway view | gseaplot2 | Check where genes cluster in the ranked list |
| GSEA: overview of all enriched sets | ridgeplot | Read direction (left/right shift) and shape (narrow vs broad) |
| Compare enrichment across conditions | dotplot on compareCluster | Use `facet_grid(~Cluster)` for side-by-side panels |

## Common Visualization Mistakes

- **Too many terms**: plots with more than 30 terms are unreadable. Keep `showCategory` between 15 and 20.
- **Not simplifying GO first**: showing 15 redundant GO terms (cell cycle, cell cycle process, mitotic cell cycle...) wastes space and misleads. Run `simplify()` before plotting.
- **Missing gene set size**: always show both the overlap count and the total pathway size. A 3/5 overlap (60%) is very different from 30/500 (6%).
- **Bar plots for GSEA**: `barplot()` summarizes ORA results by count or gene ratio. GSEA results are better summarized by the normalized enrichment score (NES) and its direction; use `dotplot()` or `ridgeplot()` instead.
- **Skipping pairwise_termsim()**: `emapplot()` and `treeplot()` need the term similarity matrix it computes and will fail or produce meaningless results without it.
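
For the gene set size point, the overlap and pathway size can be recovered from the `GeneRatio` and `BgRatio` columns of `as.data.frame(ego)`, whose numerators are the overlap count and the gene set size respectively. A base R sketch on made-up rows (the data frame contents and the `first_num` helper are illustrative):

```r
# Hypothetical rows mimicking the GeneRatio / BgRatio columns of an ORA result
res <- data.frame(
    Description = c('small set', 'large set'),
    GeneRatio = c('3/120', '30/120'),       # overlap / annotated input genes
    BgRatio = c('5/18000', '500/18000'),    # set size / background genes
    stringsAsFactors = FALSE
)

# Numerator of an 'a/b' ratio string
first_num <- function(x) as.numeric(sub('/.*', '', x))

res$overlap <- first_num(res$GeneRatio)
res$set_size <- first_num(res$BgRatio)
res$set_coverage <- res$overlap / res$set_size

# Label that reports overlap and set size together
res$label <- sprintf('%s (%d/%d)', res$Description, res$overlap, res$set_size)
res[, c('label', 'set_coverage')]
```

Here the two rows give coverages of 0.6 and 0.06, matching the 3/5 versus 30/500 contrast above.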

## Related Skills

- go-enrichment - Generate GO enrichment results
- kegg-pathways - Generate KEGG enrichment results
- gsea - Generate GSEA results