bio-gene-regulatory-networks-differential-networks
$
npx mdskill add GPTomics/bioSkills/bio-gene-regulatory-networks-differential-networksDetect rewired gene correlations between biological conditions.
- Identifies gained, lost, and reversed gene-gene relationships.
- Integrates DiffCorr, scipy, statsmodels, and numpy libraries.
- Executes Fisher z-transform to test correlation significance.
- Outputs differential network maps with statistical confidence.
SKILL.md
.github/skills/bio-gene-regulatory-networks-differential-networksView on GitHub ↗
---
name: bio-gene-regulatory-networks-differential-networks
description: Compare gene regulatory and co-expression networks between biological conditions to identify rewired regulatory relationships using DiffCorr. Detects gained, lost, and reversed gene-gene correlations between conditions. Use when comparing co-expression networks between disease vs control, treatment conditions, or developmental stages.
tool_type: r
primary_tool: DiffCorr
---
## Version Compatibility
Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pandas 2.2+, scipy 1.12+, statsmodels 0.14+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Differential Networks
**"Compare gene co-expression networks between my disease and control groups"** → Test whether gene-gene correlations differ significantly between two conditions using Fisher's z-transform, identifying gained, lost, and reversed regulatory relationships.
- R: `DiffCorr::comp.2.cc.fdr()` for differential correlation analysis
- Python: custom Fisher z-test with `scipy.stats` and `statsmodels` for FDR correction
Compare co-expression and regulatory networks between biological conditions to identify rewired gene-gene relationships.
## DiffCorr Workflow
DiffCorr uses Fisher's z-transform to test whether the correlation between two genes differs significantly between two conditions.
### Required Libraries
```r
library(DiffCorr)
library(igraph)
```
### Input Preparation
```r
# Separate expression matrices by condition (genes as columns, samples as rows)
expr_all <- read.csv('normalized_counts.csv', row.names = 1)
sample_info <- read.csv('sample_info.csv', row.names = 1)
expr_condition1 <- t(expr_all[, sample_info$condition == 'control'])
expr_condition2 <- t(expr_all[, sample_info$condition == 'disease'])
# Filter to variable genes (top 2000-5000 most variable)
gene_vars <- apply(expr_all, 1, var)
top_genes <- names(sort(gene_vars, decreasing = TRUE))[1:3000]
expr_condition1 <- expr_condition1[, top_genes]
expr_condition2 <- expr_condition2[, top_genes]
```
### Run DiffCorr Analysis
```r
# Compute correlation matrices
cor1 <- cor(expr_condition1, method = 'pearson')
cor2 <- cor(expr_condition2, method = 'pearson')
n1 <- nrow(expr_condition1)
n2 <- nrow(expr_condition2)
# Fisher z-transform test for differential correlation
# Compares correlation coefficients between two groups
result <- comp.2.cc.fdr(
output.file = 'diffcorr_results.txt',
data1 = expr_condition1,
data2 = expr_condition2,
threshold = 0.05 # FDR threshold
)
```
### Parse and Classify Results
```r
diffcorr <- read.delim('diffcorr_results.txt')
# Classify differential correlations
classify_edge <- function(cor1, cor2, threshold = 0.3) {
if (abs(cor1) < threshold & abs(cor2) >= threshold) return('gained')
if (abs(cor1) >= threshold & abs(cor2) < threshold) return('lost')
if (cor1 > threshold & cor2 < -threshold) return('reversed')
if (cor1 < -threshold & cor2 > threshold) return('reversed')
return('unchanged')
}
diffcorr$edge_type <- mapply(classify_edge, diffcorr$cor1, diffcorr$cor2)
table(diffcorr$edge_type)
# Focus on significant rewired edges
significant <- diffcorr[diffcorr$p.adj < 0.05, ]
rewired <- significant[significant$edge_type != 'unchanged', ]
print(paste('Significant rewired edges:', nrow(rewired)))
```
### Identify Rewired Hub Genes
```r
# Genes involved in most differential correlations
gene_rewiring <- c(rewired$gene1, rewired$gene2)
rewiring_counts <- sort(table(gene_rewiring), decreasing = TRUE)
top_rewired_genes <- head(rewiring_counts, 20)
print(top_rewired_genes)
# These genes have the most condition-specific regulatory changes
```
## DGCA (Alternative)
DGCA (Differential Gene Correlation Analysis) was archived from CRAN in May 2024. It is still available via GitHub.
```r
# Install from GitHub (archived from CRAN 2024-05)
devtools::install_github('andymckenzie/DGCA')
library(DGCA)
# Run differential correlation analysis
dgca_results <- ddcorAll(
inputMat = expr_all,
design = sample_info$condition,
compare = c('control', 'disease'),
adjust = 'BH', # multiple testing correction
nPerm = 0, # 0 for analytical p-values (faster)
corrType = 'pearson'
)
# Filter significant results
sig_dgca <- dgca_results[dgca_results$pValDiff_adj < 0.05, ]
# Classify by correlation change category
# DGCA provides: +/+, +/-, -/+, -/-, +/0, 0/+, -/0, 0/-, 0/0
table(sig_dgca$Classes)
```
## Python NetworkX Approach
**Goal:** Identify gene pairs whose co-expression relationship changes significantly between two conditions using a Python-native workflow.
**Approach:** Compute per-condition Pearson correlation matrices, test each gene pair for differential correlation using Fisher's z-transform, apply BH FDR correction, then classify edges as gained, lost, or reversed based on correlation magnitude thresholds.
```python
import pandas as pd
import numpy as np
from scipy import stats
import networkx as nx
def build_correlation_network(expr_matrix, threshold=0.6):
'''Build network from correlation matrix, keeping edges above threshold.'''
cor_matrix = expr_matrix.corr(method='pearson')
genes = cor_matrix.columns.tolist()
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
for j in range(i + 1, len(genes)):
r = cor_matrix.iloc[i, j]
if abs(r) >= threshold:
G.add_edge(genes[i], genes[j], weight=r)
return G
def fisher_z_test(r1, n1, r2, n2):
'''Fisher z-transform test for difference between two correlations.'''
z1 = np.arctanh(np.clip(r1, -0.9999, 0.9999))
z2 = np.arctanh(np.clip(r2, -0.9999, 0.9999))
se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z_stat = (z1 - z2) / se
p_value = 2 * stats.norm.sf(abs(z_stat))
return z_stat, p_value
def differential_network_analysis(expr_cond1, expr_cond2, fdr_threshold=0.05):
'''Compare correlation networks between two conditions.'''
genes = expr_cond1.columns.tolist()
n1, n2 = len(expr_cond1), len(expr_cond2)
cor1 = expr_cond1.corr()
cor2 = expr_cond2.corr()
results = []
for i in range(len(genes)):
for j in range(i + 1, len(genes)):
r1, r2 = cor1.iloc[i, j], cor2.iloc[i, j]
z_stat, p_val = fisher_z_test(r1, n1, r2, n2)
results.append({
'gene1': genes[i], 'gene2': genes[j],
'cor_cond1': r1, 'cor_cond2': r2,
'z_stat': z_stat, 'p_value': p_val
})
df = pd.DataFrame(results)
# FDR correction
from statsmodels.stats.multitest import multipletests
_, df['p_adj'], _, _ = multipletests(df['p_value'], method='fdr_bh')
# Classify edges
def classify(row, threshold=0.3):
r1, r2 = row['cor_cond1'], row['cor_cond2']
if abs(r1) < threshold and abs(r2) >= threshold:
return 'gained'
if abs(r1) >= threshold and abs(r2) < threshold:
return 'lost'
if r1 > threshold and r2 < -threshold:
return 'reversed'
if r1 < -threshold and r2 > threshold:
return 'reversed'
return 'unchanged'
df['edge_type'] = df.apply(classify, axis=1)
return df
# Usage
expr_all = pd.read_csv('normalized_counts.csv', index_col=0).T
sample_info = pd.read_csv('sample_info.csv', index_col=0)
expr_ctrl = expr_all.loc[sample_info[sample_info['condition'] == 'control'].index]
expr_disease = expr_all.loc[sample_info[sample_info['condition'] == 'disease'].index]
# Filter to top variable genes
gene_vars = expr_all.var()
top_genes = gene_vars.nlargest(3000).index.tolist()
expr_ctrl, expr_disease = expr_ctrl[top_genes], expr_disease[top_genes]
diff_results = differential_network_analysis(expr_ctrl, expr_disease)
significant = diff_results[diff_results['p_adj'] < 0.05]
print(significant['edge_type'].value_counts())
```
### Visualize Differential Network
**Goal:** Render the rewired gene-gene connections as a color-coded network graph distinguishing gained, lost, and reversed edges.
**Approach:** Build a NetworkX graph from significant rewired edges, assign edge colors by change type, and draw with a spring layout.
```python
import matplotlib.pyplot as plt
rewired = significant[significant['edge_type'] != 'unchanged']
G_diff = nx.Graph()
color_map = {'gained': '#2ca02c', 'lost': '#d62728', 'reversed': '#9467bd'}
for _, row in rewired.iterrows():
G_diff.add_edge(row['gene1'], row['gene2'],
color=color_map[row['edge_type']], weight=abs(row['z_stat']))
edge_colors = [G_diff[u][v]['color'] for u, v in G_diff.edges()]
fig, ax = plt.subplots(figsize=(12, 12))
pos = nx.spring_layout(G_diff, k=2, seed=42)
nx.draw_networkx(G_diff, pos, edge_color=edge_colors, node_size=50,
font_size=6, width=0.5, alpha=0.7, ax=ax)
# Legend
import matplotlib.patches as mpatches
legend_handles = [mpatches.Patch(color=c, label=l) for l, c in color_map.items()]
ax.legend(handles=legend_handles, loc='upper left')
ax.set_title('Differential co-expression network')
plt.savefig('differential_network.pdf', bbox_inches='tight')
```
## Statistical Considerations
| Consideration | Guideline | Rationale |
|---------------|-----------|-----------|
| Minimum samples per group | 20 recommended, 15 floor | Correlation estimates unstable below 15 |
| Multiple testing | BH FDR correction | Thousands of gene pairs tested |
| Correlation method | Pearson (default) | Spearman for non-linear; Pearson is standard |
| Gene filtering | Top 2000-5000 variable | Reduces test burden, focuses on informative genes |
| Effect size filter | abs(delta_r) > 0.3 | Avoid reporting trivially different correlations |
## Related Skills
- coexpression-networks - Build WGCNA networks for individual conditions
- scenic-regulons - TF regulon inference from scRNA-seq
- differential-expression/deseq2-basics - DE analysis to complement network rewiring
- temporal-genomics/temporal-grn - Time-delayed regulatory inference from temporal data
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.