bio-gene-regulatory-networks-multiomics-grn

$npx mdskill add GPTomics/bioSkills/bio-gene-regulatory-networks-multiomics-grn

Integrate multiomics data to infer enhancer-driven gene regulatory networks

  • Solves cis-regulatory GRN inference from paired scRNA-seq and ATAC-seq datasets.
  • Depends on SCENIC+, pycisTopic, scanpy, pandas, matplotlib, and Cell Ranger tools.
  • Decides eRegulon assembly by linking transcription factors to enhancers via chromatin accessibility patterns.
  • Delivers results as TF-enhancer-gene triplets representing regulatory relationships.
SKILL.md
.github/skills/bio-gene-regulatory-networks-multiomics-grnView on GitHub ↗
---
name: bio-gene-regulatory-networks-multiomics-grn
description: Build enhancer-driven gene regulatory networks by integrating single-cell RNA-seq and ATAC-seq data using SCENIC+ to identify eRegulons linking transcription factors to enhancers and target genes. Use when analyzing 10x multiome or paired scRNA+scATAC data to infer cis-regulatory GRNs.
tool_type: python
primary_tool: SCENIC+
---

## Version Compatibility

Reference examples tested with: Cell Ranger 8.0+, MACS3 3.0+, matplotlib 3.8+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Multiomics GRN Inference

**"Build an enhancer-driven gene regulatory network from my multiome data"** → Integrate scRNA-seq and scATAC-seq to identify eRegulons: transcription factor-enhancer-target gene triplets linking TF binding to chromatin accessibility and gene expression changes.
- Python: SCENIC+ pipeline with `scenicplus` for eRegulon assembly
- Python: `pycisTopic` for topic modeling of scATAC-seq regions

Build enhancer-driven gene regulatory networks from paired single-cell RNA-seq and ATAC-seq data. SCENIC+ extends SCENIC by linking TFs to their enhancers and target genes through eRegulons.

## SCENIC+ Overview

| Component | Tool | Purpose |
|-----------|------|---------|
| Topic modeling | cisTopic | Identify cis-regulatory topics from scATAC |
| Region-to-gene | SCENIC+ | Link accessible regions to target genes |
| TF-to-region | SCENIC+ | Connect TFs to their enhancer targets |
| eRegulon assembly | SCENIC+ | Combine into TF-enhancer-gene triplets |

## Input Preparation

### From 10x Multiome (CellRanger ARC)

```python
import scanpy as sc
import pycisTopic
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

# Load RNA from CellRanger ARC output
adata_rna = sc.read_10x_h5('filtered_feature_bc_matrix.h5', gex_only=True)
adata_rna.var_names_make_unique()
sc.pp.filter_cells(adata_rna, min_genes=200)
sc.pp.filter_genes(adata_rna, min_cells=3)

# Load ATAC fragments
fragments_file = 'atac_fragments.tsv.gz'
```

### Call Peaks with MACS3

```python
import subprocess

# Call peaks from fragments file for region universe
subprocess.run([
    'macs3', 'callpeak',
    '-t', 'atac_fragments.tsv.gz',
    '-f', 'BEDPE',
    '--nomodel', '--shift', '-75', '--extsize', '150',
    '-g', 'hs',
    '--keep-dup', 'all',
    '-n', 'multiome_peaks',
    '--outdir', 'macs3_output'
], check=True)
```

### Create cisTopic Object

```python
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments
from pycisTopic.lda_models import run_cgs_models

# Create binary accessibility matrix from fragments
path_to_regions = 'macs3_output/multiome_peaks_peaks.narrowPeak'
cistopic_obj = create_cistopic_object_from_fragments(
    path_to_fragments=fragments_file,
    path_to_regions=path_to_regions,
    path_to_blacklist='hg38-blacklist.v2.bed',
    n_cpu=8
)

# Run LDA topic modeling
# Test multiple topic numbers; select by model metrics
models = run_cgs_models(cistopic_obj, n_topics=[10, 20, 30, 40, 50],
                        n_cpu=8, n_iter=300, random_state=42)

# Select best model (highest coherence, lowest perplexity)
from pycisTopic.lda_models import evaluate_models
model = evaluate_models(models, return_model=True)
cistopic_obj.add_LDA_model(model)
```

## SCENIC+ Workflow

**Goal:** Assemble enhancer-driven gene regulatory networks (eRegulons) linking transcription factors to their target enhancers and downstream genes from paired scRNA+scATAC data.

**Approach:** Create a SCENICPLUS object from preprocessed scRNA-seq AnnData and cisTopic ATAC object, then run the complete pipeline which performs motif enrichment, region-to-gene linking, and eRegulon assembly.

```python
import scenicplus
from scenicplus.scenicplus_class import SCENICPLUS
from scenicplus.wrappers.run_scenicplus import run_scenicplus

# Prepare SCENIC+ object
scplus_obj = SCENICPLUS(
    adata_rna,           # scRNA-seq AnnData
    cistopic_obj,        # cisTopic object with topics
    menr=None            # will be computed
)

# Run complete SCENIC+ pipeline
# This performs: motif enrichment, region-to-gene linking, eRegulon assembly
run_scenicplus(
    scplus_obj,
    variable=['GeneExpressionLevel'],
    species='hsapiens',
    assembly='hg38',
    tf_file='allTFs_hg38.txt',
    save_path='scenicplus_output/',
    biomart_host='http://www.ensembl.org',
    upstream=[1000, 150000],    # search space upstream of TSS
    downstream=[1000, 150000],  # search space downstream of TSS
    calculate_TF_eGRN_correlation=True,
    calculate_DEGs_DARs=True,
    export_to_loom_file=True,
    export_to_UCSC_file=True,
    n_cpu=8
)
```

## eRegulon Interpretation

**Goal:** Summarize the discovered eRegulons to identify the most active transcription factors and distinguish activating from repressive regulation.

**Approach:** Parse the eRegulon metadata table to count regions and target genes per TF, then separate activating (+) and repressive (-) eRegulons by their name suffix.

```python
import pandas as pd

# eRegulons link TFs -> enhancers -> target genes
eregulons = scplus_obj.uns['eRegulon_metadata']
print(f'Found {eregulons["Region_signature_name"].nunique()} eRegulons')

# Summarize eRegulons
ereg_summary = eregulons.groupby('TF').agg(
    n_regions=('Region', 'nunique'),
    n_genes=('Gene', 'nunique')
).sort_values('n_genes', ascending=False)
print(ereg_summary.head(20))

# Activating vs repressive eRegulons
# (+) suffix = activating (TF expression positively correlated with targets)
# (-) suffix = repressive (TF expression negatively correlated with targets)
activating = eregulons[eregulons['Region_signature_name'].str.contains(r'\(\+\)')]
repressive = eregulons[eregulons['Region_signature_name'].str.contains(r'\(-\)')]
print(f'Activating: {activating["TF"].nunique()}, Repressive: {repressive["TF"].nunique()}')
```

### eRegulon Activity Scoring

```python
from scenicplus.eregulon_enrichment import score_eRegulons

# Score eRegulon activity per cell (like AUCell but for eRegulons)
score_eRegulons(scplus_obj, ranking_key='eRegulon_AUC', enrichment_type='region')
score_eRegulons(scplus_obj, ranking_key='eRegulon_AUC_gene', enrichment_type='gene')

# eRegulon AUC matrix
ereg_auc = scplus_obj.uns['eRegulon_AUC']['Gene_based'].copy()
```

## Visualization

```python
import matplotlib.pyplot as plt
from scenicplus.plotting.dotplot import heatmap_dotplot

# eRegulon activity heatmap by cell type
heatmap_dotplot(
    scplus_obj,
    size_matrix=scplus_obj.uns['eRegulon_AUC']['Gene_based'],
    color_matrix=scplus_obj.uns['eRegulon_AUC']['Region_based'],
    group_variable='cell_type',
    save='eregulon_dotplot.pdf'
)

# Network visualization of top eRegulon
from scenicplus.plotting.grn_plot import plot_eRegulon

plot_eRegulon(scplus_obj, tf_name='PAX5', save='pax5_eregulon.pdf')
```

## FigR Alternative

FigR infers TF-gene regulatory interactions from paired scRNA+scATAC without topic modeling.

```r
library(FigR)
library(Seurat)
library(Signac)

# Load Seurat object with RNA and ATAC assays (from Signac)
seurat_obj <- readRDS('multiome_seurat.rds')

# Run FigR
# Step 1: compute peak-gene correlations
peak_gene_cors <- runGenePeakcorr(
    ATAC.se = seurat_obj@assays$ATAC,
    RNAmat = seurat_obj@assays$RNA@data,
    genome = 'hg38',
    nCores = 8,
    p.cut = 0.05  # significance threshold for peak-gene links
)

# Step 2: infer TF-gene regulation scores (DORC-based)
fig_results <- runFigR(
    ATAC.se = seurat_obj@assays$ATAC,
    dorcTab = peak_gene_cors,
    genome = 'hg38',
    dorcMat = getDORCScores(seurat_obj@assays$ATAC, peak_gene_cors),
    rnaMat = seurat_obj@assays$RNA@data,
    nCores = 8
)

# Top regulatory TF-gene links
top_links <- fig_results[order(abs(fig_results$Score), decreasing = TRUE), ]
head(top_links, 20)
```

## Resource Requirements

| Dataset Size | RAM | Time | Notes |
|-------------|-----|------|-------|
| 5K cells | 16 GB | 2-4 hours | Feasible on laptop |
| 20K cells | 64 GB | 8-12 hours | Standard server |
| 50K+ cells | 128 GB+ | 24+ hours | Subsample or use HPC |

## Related Skills

- scenic-regulons - RNA-only regulon inference with pySCENIC
- single-cell/scatac-analysis - scATAC-seq preprocessing with Signac/ArchR
- atac-seq/atac-peak-calling - Peak calling for region universe
- chip-seq/motif-analysis - Motif enrichment for TF identification
More from GPTomics/bioSkills