bio-spatial-transcriptomics-spatial-statistics

$npx mdskill add GPTomics/bioSkills/bio-spatial-transcriptomics-spatial-statistics

Computes spatial statistics for transcriptomics data using Squidpy

  • Identifies spatially variable genes through autocorrelation and co-occurrence analysis
  • Relies on Squidpy, Scanpy, and spatial neighbor graphs for statistical computation
  • Calculates Moran's I, Geary's C, and neighborhood enrichment to detect spatial patterns
  • Returns statistical results for downstream analysis or visualization
SKILL.md
.github/skills/bio-spatial-transcriptomics-spatial-statisticsView on GitHub ↗
---
name: bio-spatial-transcriptomics-spatial-statistics
description: Compute spatial statistics for spatial transcriptomics data using Squidpy. Calculate Moran's I, Geary's C, spatial autocorrelation, co-occurrence analysis, and neighborhood enrichment. Use when computing spatial autocorrelation or co-occurrence statistics.
tool_type: python
primary_tool: squidpy
---

## Version Compatibility

Reference examples tested with: numpy 1.26+, pandas 2.2+, scanpy 1.10+, scipy 1.12+, squidpy 1.3+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Spatial Statistics

Compute spatial statistics and identify spatially variable features.

## Required Imports

```python
import squidpy as sq
import scanpy as sc
import pandas as pd
import numpy as np
```

## Compute Spatial Autocorrelation (Moran's I)

**Goal:** Identify genes whose expression is spatially autocorrelated across tissue.

**Approach:** Build a spatial neighbor graph, then compute Moran's I statistic per gene to measure clustering of similar values.

**"Find spatially variable genes"** -> Compute Moran's I autocorrelation on the spatial neighbor graph to rank genes by spatial patterning.

```python
# Requires spatial neighbors
sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=6)

# Compute Moran's I for all genes (can be slow)
sq.gr.spatial_autocorr(adata, mode='moran')

# Or for specific genes
sq.gr.spatial_autocorr(adata, mode='moran', genes=['GENE1', 'GENE2', 'GENE3'])

# Results stored in adata.uns['moranI']
moran_results = adata.uns['moranI']
print(moran_results.head(20))
```

## Interpret Moran's I

```python
# Moran's I ranges from -1 to 1
# I > 0: positive spatial autocorrelation (similar values cluster)
# I = 0: random spatial distribution
# I < 0: negative spatial autocorrelation (dissimilar values cluster)

# Get significantly spatially variable genes
svg = moran_results[moran_results['pval_norm'] < 0.05].sort_values('I', ascending=False)
print(f'Found {len(svg)} spatially variable genes (p < 0.05)')
print('\nTop 10 spatially variable genes:')
print(svg.head(10)[['I', 'pval_norm']])
```

## Compute Geary's C

```python
# Alternative spatial autocorrelation measure
sq.gr.spatial_autocorr(adata, mode='geary')

# Results in adata.uns['gearyC']
geary_results = adata.uns['gearyC']
# C < 1: positive spatial autocorrelation
# C = 1: random
# C > 1: negative spatial autocorrelation
```

## Co-occurrence Analysis

**Goal:** Determine whether cell types co-localize or segregate in tissue space.

**Approach:** Compute pairwise co-occurrence probabilities at multiple distance intervals using Squidpy.

```python
# Analyze co-localization of cell types/clusters
# First, ensure you have cluster labels
sc.pp.neighbors(adata)
sc.tl.leiden(adata)

# Compute co-occurrence
sq.gr.co_occurrence(adata, cluster_key='leiden')

# Results in adata.uns['leiden_co_occurrence']
# Visualize co-occurrence
sq.pl.co_occurrence(adata, cluster_key='leiden')
```

## Interpret Co-occurrence

```python
co_occ = adata.uns['leiden_co_occurrence']
occ_matrix = co_occ['occ']  # Occurrence matrix
interval = co_occ['interval']  # Distance intervals

# occ_matrix[i, j, k] = occurrence of cluster j around cluster i at distance interval k
print(f'Occurrence matrix shape: {occ_matrix.shape}')
print(f'Distance intervals: {interval}')
```

## Neighborhood Enrichment

**Goal:** Test whether cell type clusters are enriched or depleted in each other's spatial neighborhoods.

**Approach:** Run permutation-based neighborhood enrichment test, yielding z-scores for each cluster pair.

```python
# Test if clusters are enriched in each other's neighborhoods
sq.gr.nhood_enrichment(adata, cluster_key='leiden')

# Results in adata.uns['leiden_nhood_enrichment']
# zscore > 0: clusters co-localize more than expected
# zscore < 0: clusters avoid each other

# Visualize
sq.pl.nhood_enrichment(adata, cluster_key='leiden')
```

## Extract Enrichment Z-scores

```python
enrichment = adata.uns['leiden_nhood_enrichment']
zscore = enrichment['zscore']
clusters = adata.obs['leiden'].cat.categories

# Convert to DataFrame
zscore_df = pd.DataFrame(zscore, index=clusters, columns=clusters)
print('Neighborhood enrichment z-scores:')
print(zscore_df)
```

## Ripley's Statistics

```python
# Ripley's K/L function for point pattern analysis (single-cell resolution data)
sq.gr.ripley(adata, cluster_key='leiden', mode='L')

# Results in adata.uns['leiden_ripley']
sq.pl.ripley(adata, cluster_key='leiden')
```

## Centrality Scores

```python
# Compute centrality of each cell type
sq.gr.centrality_scores(adata, cluster_key='leiden')

# Results in adata.uns['leiden_centrality_scores']
centrality = adata.uns['leiden_centrality_scores']
print(centrality)
```

## Interaction Matrix

```python
# Build interaction matrix between clusters
sq.gr.interaction_matrix(adata, cluster_key='leiden')

# Results in adata.uns['leiden_interactions']
interactions = adata.uns['leiden_interactions']
print(interactions)
```

## Custom Spatial Statistic

```python
from scipy.stats import pearsonr

def spatial_correlation(adata, gene1, gene2):
    '''Compute spatial correlation between two genes'''
    expr1 = adata[:, gene1].X.toarray().flatten()
    expr2 = adata[:, gene2].X.toarray().flatten()
    r, p = pearsonr(expr1, expr2)
    return r, p

r, p = spatial_correlation(adata, 'GENE1', 'GENE2')
print(f'Spatial correlation: r={r:.3f}, p={p:.2e}')
```

## Local Moran's I (LISA)

**Goal:** Identify local hotspots and coldspots of gene expression in tissue space.

**Approach:** Compute Local Indicators of Spatial Association (LISA) using PySAL, which assigns each spot to a cluster-outlier quadrant (HH, HL, LH, LL).

```python
from esda.moran import Moran_Local
from libpysal.weights import KNN

# Build weights matrix
coords = adata.obsm['spatial']
w = KNN.from_array(coords, k=6)
w.transform = 'r'

# Compute local Moran's I for a gene
gene_expr = adata[:, 'GENE1'].X.toarray().flatten()
lisa = Moran_Local(gene_expr, w)

# Add to adata
adata.obs['GENE1_lisa'] = lisa.Is
adata.obs['GENE1_lisa_q'] = lisa.q  # Quadrant (HH, HL, LH, LL)
```

## Batch Spatial Statistics

**Goal:** Efficiently compute spatial autocorrelation for a large set of genes.

**Approach:** Subset to highly variable genes before running Moran's I to reduce computation time.

```python
# Compute Moran's I for top variable genes only
hvg = adata.var_names[adata.var['highly_variable']][:500]
sq.gr.spatial_autocorr(adata, mode='moran', genes=hvg)

results = adata.uns['moranI']
significant = results[results['pval_norm'] < 0.01]
print(f'{len(significant)} genes with significant spatial autocorrelation')
```

## Related Skills

- spatial-neighbors - Build spatial graphs (prerequisite)
- spatial-domains - Identify spatial domains
- spatial-visualization - Visualize spatial statistics
More from GPTomics/bioSkills