bio-gene-regulatory-networks-differential-networks

$npx mdskill add GPTomics/bioSkills/bio-gene-regulatory-networks-differential-networks

Detect rewired gene correlations between biological conditions.

  • Identifies gained, lost, and reversed gene-gene relationships.
  • Integrates DiffCorr, scipy, statsmodels, and numpy libraries.
  • Executes Fisher z-transform to test correlation significance.
  • Outputs differential network maps with statistical confidence.
SKILL.md
.github/skills/bio-gene-regulatory-networks-differential-networksView on GitHub ↗
---
name: bio-gene-regulatory-networks-differential-networks
description: Compare gene regulatory and co-expression networks between biological conditions to identify rewired regulatory relationships using DiffCorr. Detects gained, lost, and reversed gene-gene correlations between conditions. Use when comparing co-expression networks between disease vs control, treatment conditions, or developmental stages.
tool_type: r
primary_tool: DiffCorr
---

## Version Compatibility

Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pandas 2.2+, scipy 1.12+, statsmodels 0.14+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Differential Networks

**"Compare gene co-expression networks between my disease and control groups"** → Test whether gene-gene correlations differ significantly between two conditions using Fisher's z-transform, identifying gained, lost, and reversed regulatory relationships.
- R: `DiffCorr::comp.2.cc.fdr()` for differential correlation analysis
- Python: custom Fisher z-test with `scipy.stats` and `statsmodels` for FDR correction

Compare co-expression and regulatory networks between biological conditions to identify rewired gene-gene relationships.

## DiffCorr Workflow

DiffCorr uses Fisher's z-transform to test whether the correlation between two genes differs significantly between two conditions.

### Required Libraries

```r
library(DiffCorr)
library(igraph)
```

### Input Preparation

```r
# Separate expression matrices by condition (genes as columns, samples as rows)
expr_all <- read.csv('normalized_counts.csv', row.names = 1)
sample_info <- read.csv('sample_info.csv', row.names = 1)

expr_condition1 <- t(expr_all[, sample_info$condition == 'control'])
expr_condition2 <- t(expr_all[, sample_info$condition == 'disease'])

# Filter to variable genes (top 2000-5000 most variable)
gene_vars <- apply(expr_all, 1, var)
top_genes <- names(sort(gene_vars, decreasing = TRUE))[1:3000]
expr_condition1 <- expr_condition1[, top_genes]
expr_condition2 <- expr_condition2[, top_genes]
```

### Run DiffCorr Analysis

```r
# Compute correlation matrices
cor1 <- cor(expr_condition1, method = 'pearson')
cor2 <- cor(expr_condition2, method = 'pearson')

n1 <- nrow(expr_condition1)
n2 <- nrow(expr_condition2)

# Fisher z-transform test for differential correlation
# Compares correlation coefficients between two groups
result <- comp.2.cc.fdr(
    output.file = 'diffcorr_results.txt',
    data1 = expr_condition1,
    data2 = expr_condition2,
    threshold = 0.05  # FDR threshold
)
```

### Parse and Classify Results

```r
diffcorr <- read.delim('diffcorr_results.txt')

# Classify differential correlations
classify_edge <- function(cor1, cor2, threshold = 0.3) {
    if (abs(cor1) < threshold & abs(cor2) >= threshold) return('gained')
    if (abs(cor1) >= threshold & abs(cor2) < threshold) return('lost')
    if (cor1 > threshold & cor2 < -threshold) return('reversed')
    if (cor1 < -threshold & cor2 > threshold) return('reversed')
    return('unchanged')
}

diffcorr$edge_type <- mapply(classify_edge, diffcorr$cor1, diffcorr$cor2)
table(diffcorr$edge_type)

# Focus on significant rewired edges
significant <- diffcorr[diffcorr$p.adj < 0.05, ]
rewired <- significant[significant$edge_type != 'unchanged', ]
print(paste('Significant rewired edges:', nrow(rewired)))
```

### Identify Rewired Hub Genes

```r
# Genes involved in most differential correlations
gene_rewiring <- c(rewired$gene1, rewired$gene2)
rewiring_counts <- sort(table(gene_rewiring), decreasing = TRUE)
top_rewired_genes <- head(rewiring_counts, 20)
print(top_rewired_genes)

# These genes have the most condition-specific regulatory changes
```

## DGCA (Alternative)

DGCA (Differential Gene Correlation Analysis) was archived from CRAN in May 2024. It is still available via GitHub.

```r
# Install from GitHub (archived from CRAN 2024-05)
devtools::install_github('andymckenzie/DGCA')
library(DGCA)

# Run differential correlation analysis
dgca_results <- ddcorAll(
    inputMat = expr_all,
    design = sample_info$condition,
    compare = c('control', 'disease'),
    adjust = 'BH',         # multiple testing correction
    nPerm = 0,             # 0 for analytical p-values (faster)
    corrType = 'pearson'
)

# Filter significant results
sig_dgca <- dgca_results[dgca_results$pValDiff_adj < 0.05, ]

# Classify by correlation change category
# DGCA provides: +/+, +/-, -/+, -/-, +/0, 0/+, -/0, 0/-, 0/0
table(sig_dgca$Classes)
```

## Python NetworkX Approach

**Goal:** Identify gene pairs whose co-expression relationship changes significantly between two conditions using a Python-native workflow.

**Approach:** Compute per-condition Pearson correlation matrices, test each gene pair for differential correlation using Fisher's z-transform, apply BH FDR correction, then classify edges as gained, lost, or reversed based on correlation magnitude thresholds.

```python
import pandas as pd
import numpy as np
from scipy import stats
import networkx as nx

def build_correlation_network(expr_matrix, threshold=0.6):
    '''Build network from correlation matrix, keeping edges above threshold.'''
    cor_matrix = expr_matrix.corr(method='pearson')
    genes = cor_matrix.columns.tolist()
    G = nx.Graph()
    G.add_nodes_from(genes)

    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = cor_matrix.iloc[i, j]
            if abs(r) >= threshold:
                G.add_edge(genes[i], genes[j], weight=r)
    return G


def fisher_z_test(r1, n1, r2, n2):
    '''Fisher z-transform test for difference between two correlations.'''
    z1 = np.arctanh(np.clip(r1, -0.9999, 0.9999))
    z2 = np.arctanh(np.clip(r2, -0.9999, 0.9999))
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_stat = (z1 - z2) / se
    p_value = 2 * stats.norm.sf(abs(z_stat))
    return z_stat, p_value


def differential_network_analysis(expr_cond1, expr_cond2, fdr_threshold=0.05):
    '''Compare correlation networks between two conditions.'''
    genes = expr_cond1.columns.tolist()
    n1, n2 = len(expr_cond1), len(expr_cond2)
    cor1 = expr_cond1.corr()
    cor2 = expr_cond2.corr()

    results = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r1, r2 = cor1.iloc[i, j], cor2.iloc[i, j]
            z_stat, p_val = fisher_z_test(r1, n1, r2, n2)
            results.append({
                'gene1': genes[i], 'gene2': genes[j],
                'cor_cond1': r1, 'cor_cond2': r2,
                'z_stat': z_stat, 'p_value': p_val
            })

    df = pd.DataFrame(results)

    # FDR correction
    from statsmodels.stats.multitest import multipletests
    _, df['p_adj'], _, _ = multipletests(df['p_value'], method='fdr_bh')

    # Classify edges
    def classify(row, threshold=0.3):
        r1, r2 = row['cor_cond1'], row['cor_cond2']
        if abs(r1) < threshold and abs(r2) >= threshold:
            return 'gained'
        if abs(r1) >= threshold and abs(r2) < threshold:
            return 'lost'
        if r1 > threshold and r2 < -threshold:
            return 'reversed'
        if r1 < -threshold and r2 > threshold:
            return 'reversed'
        return 'unchanged'

    df['edge_type'] = df.apply(classify, axis=1)
    return df


# Usage
expr_all = pd.read_csv('normalized_counts.csv', index_col=0).T
sample_info = pd.read_csv('sample_info.csv', index_col=0)

expr_ctrl = expr_all.loc[sample_info[sample_info['condition'] == 'control'].index]
expr_disease = expr_all.loc[sample_info[sample_info['condition'] == 'disease'].index]

# Filter to top variable genes
gene_vars = expr_all.var()
top_genes = gene_vars.nlargest(3000).index.tolist()
expr_ctrl, expr_disease = expr_ctrl[top_genes], expr_disease[top_genes]

diff_results = differential_network_analysis(expr_ctrl, expr_disease)
significant = diff_results[diff_results['p_adj'] < 0.05]
print(significant['edge_type'].value_counts())
```

### Visualize Differential Network

**Goal:** Render the rewired gene-gene connections as a color-coded network graph distinguishing gained, lost, and reversed edges.

**Approach:** Build a NetworkX graph from significant rewired edges, assign edge colors by change type, and draw with a spring layout.

```python
import matplotlib.pyplot as plt

rewired = significant[significant['edge_type'] != 'unchanged']

G_diff = nx.Graph()
color_map = {'gained': '#2ca02c', 'lost': '#d62728', 'reversed': '#9467bd'}

for _, row in rewired.iterrows():
    G_diff.add_edge(row['gene1'], row['gene2'],
                    color=color_map[row['edge_type']], weight=abs(row['z_stat']))

edge_colors = [G_diff[u][v]['color'] for u, v in G_diff.edges()]

fig, ax = plt.subplots(figsize=(12, 12))
pos = nx.spring_layout(G_diff, k=2, seed=42)
nx.draw_networkx(G_diff, pos, edge_color=edge_colors, node_size=50,
                 font_size=6, width=0.5, alpha=0.7, ax=ax)

# Legend
import matplotlib.patches as mpatches
legend_handles = [mpatches.Patch(color=c, label=l) for l, c in color_map.items()]
ax.legend(handles=legend_handles, loc='upper left')
ax.set_title('Differential co-expression network')
plt.savefig('differential_network.pdf', bbox_inches='tight')
```

## Statistical Considerations

| Consideration | Guideline | Rationale |
|---------------|-----------|-----------|
| Minimum samples per group | 20 recommended, 15 floor | Correlation estimates unstable below 15 |
| Multiple testing | BH FDR correction | Thousands of gene pairs tested |
| Correlation method | Pearson (default) | Spearman for non-linear; Pearson is standard |
| Gene filtering | Top 2000-5000 variable | Reduces test burden, focuses on informative genes |
| Effect size filter | abs(delta_r) > 0.3 | Avoid reporting trivially different correlations |

## Related Skills

- coexpression-networks - Build WGCNA networks for individual conditions
- scenic-regulons - TF regulon inference from scRNA-seq
- differential-expression/deseq2-basics - DE analysis to complement network rewiring
- temporal-genomics/temporal-grn - Time-delayed regulatory inference from temporal data
More from GPTomics/bioSkills