bio-ecological-genomics-community-ecology
$
npx mdskill add GPTomics/bioSkills/bio-ecological-genomics-community-ecologyTest environmental gradients structure species communities with constrained ordination.
- Links species composition to explanatory variables using CCA and RDA methods.
- Depends on R vegan package and indicspecies for statistical analysis.
- Executes variance partitioning to isolate variation among predictor groups.
- Delivers indicator taxa results via multipatt analysis for habitat identification.
SKILL.md
.github/skills/bio-ecological-genomics-community-ecologyView on GitHub ↗
---
name: bio-ecological-genomics-community-ecology
description: Analyzes community composition using constrained ordination (CCA, RDA, db-RDA), variance partitioning (varpart), indicator species analysis (indicspecies multipatt), and distance-based environmental gradient methods with vegan. Links species composition to environmental explanatory variables. Use when testing how environmental gradients structure species communities, identifying habitat indicator taxa, or partitioning explained variation among predictors. Not for basic unconstrained ordination and PERMANOVA (see microbiome/diversity-analysis).
tool_type: r
primary_tool: vegan
---
## Version Compatibility
Reference examples tested with: ggplot2 3.5+, vegan 2.6+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Community Ecology
**"Test how environmental variables structure my species communities"** → Link species composition to environmental gradients using constrained ordination (CCA, RDA, db-RDA), partition explained variation among predictor groups with varpart, and identify habitat indicator taxa with multipatt.
- R: `vegan::cca()`, `vegan::rda()`, `vegan::dbrda()` for constrained ordination
- R: `indicspecies::multipatt()` for indicator species analysis
Analyzes species-environment relationships using constrained ordination, variance partitioning, and indicator species analysis.
## Canonical Correspondence Analysis (CCA)
CCA models unimodal species responses along environmental gradients. Appropriate when species have optima along gradients (bell-shaped response curves):
```r
library(vegan)
# species_matrix: sites x species abundance matrix
# env_data: sites x environmental variables data frame
cca_result <- cca(species_matrix ~ temperature + precipitation + soil_pH + elevation,
data = env_data)
# Global permutation test (999 permutations by default)
anova(cca_result, permutations = 999)
# Per-axis significance
anova(cca_result, by = 'axis', permutations = 999)
# Per-term (marginal) significance
anova(cca_result, by = 'margin', permutations = 999)
# Proportion of variance explained
# Total inertia = constrained + unconstrained
summary(cca_result)$tot.chi
summary(cca_result)$constr.chi
prop_explained <- summary(cca_result)$constr.chi / summary(cca_result)$tot.chi
# Triplot: sites, species, and environmental vectors
plot(cca_result, display = c('sites', 'species', 'bp'), scaling = 2)
```
## Redundancy Analysis (RDA)
RDA models linear species responses. Preferred when species respond linearly to gradients (short gradients, <= 3 SD):
```r
# Hellinger transformation recommended for abundance data with RDA
# Reduces the weight of rare species and double-zero problem
species_hell <- decostand(species_matrix, method = 'hellinger')
rda_result <- rda(species_hell ~ temperature + precipitation + soil_pH + elevation,
data = env_data)
# Adjusted R-squared (corrected for number of predictors)
RsquareAdj(rda_result)
# Permutation test
anova(rda_result, permutations = 999)
anova(rda_result, by = 'margin', permutations = 999)
```
## Forward Selection with ordiR2step
Selects the most parsimonious set of environmental predictors:
```r
# Null model (intercept only)
rda_null <- rda(species_hell ~ 1, data = env_data)
# Full model (all predictors)
rda_full <- rda(species_hell ~ ., data = env_data)
# Forward selection using adjusted R-squared criterion
# Stops when adding variables no longer improves adjusted R^2
# or when p > 0.05 (permutation test)
rda_sel <- ordiR2step(rda_null, scope = formula(rda_full),
direction = 'forward', permutations = 999)
# Check VIF for collinearity in selected model
# VIF > 10: indicates problematic multicollinearity; remove or combine variables
vif.cca(rda_sel)
```
## Distance-Based RDA (db-RDA)
Extends RDA to any distance metric using principal coordinates:
```r
# Bray-Curtis distance for abundance data
bray_dist <- vegdist(species_matrix, method = 'bray')
# capscale (legacy) or dbrda (modern)
dbrda_result <- dbrda(bray_dist ~ temperature + precipitation + soil_pH,
data = env_data)
# Negative eigenvalues handled automatically (add correction if needed)
dbrda_result <- dbrda(bray_dist ~ temperature + precipitation + soil_pH,
data = env_data, add = 'lingoes')
anova(dbrda_result, permutations = 999)
anova(dbrda_result, by = 'margin', permutations = 999)
```
## Variance Partitioning
**Goal:** Quantify the unique and shared contributions of environmental vs spatial predictors to community composition.
**Approach:** Run varpart with the species matrix and predictor groups to decompose variation into unique and shared fractions, then test each unique fraction's significance with partial RDA conditioned on the other group.
Quantifies the unique and shared contributions of 2-4 sets of explanatory variables:
```r
# Partition variance between spatial and environmental predictors
# spatial_data: PCNM/MEM spatial eigenvectors or coordinates
# env_data: environmental variables
vp <- varpart(species_hell, env_data, spatial_data)
# Fractions: [a] = unique environment, [b] = shared, [c] = unique spatial, [d] = residual
vp
# Venn diagram of explained variation
plot(vp, digits = 2, bg = c('skyblue', 'tomato'))
# Test significance of unique fractions (testable fractions only)
# [a] unique environment
anova(rda(species_hell ~ . + Condition(as.matrix(spatial_data)), data = env_data),
permutations = 999)
# [c] unique spatial
anova(rda(species_hell ~ . + Condition(as.matrix(env_data)), data = as.data.frame(spatial_data)),
permutations = 999)
```
### Three-Way Partition
```r
# Environment, space, and time
vp3 <- varpart(species_hell, env_data, spatial_data, temporal_data)
plot(vp3, digits = 2, bg = c('skyblue', 'tomato', 'gold'))
```
## Indicator Species Analysis
Identifies species significantly associated with site groups using the IndVal index:
```r
library(indicspecies)
# site_groups: factor or vector of group assignments per site
# multipatt tests association of each species with group combinations
# func='IndVal.g': group-equalized indicator value (handles unbalanced groups)
# duleg=TRUE: only tests individual groups, not combinations
mp <- multipatt(species_matrix, site_groups, func = 'IndVal.g',
duleg = TRUE, control = how(nperm = 999))
summary(mp)
# Extract significant indicators
# p.value < 0.05: standard significance threshold
sig_indicators <- mp$sign[mp$sign$p.value < 0.05, ]
sig_indicators <- sig_indicators[order(sig_indicators$p.value), ]
sig_indicators
```
### Point-Biserial Correlation
```r
# Alternative: r.g function for point-biserial correlation coefficients
# More interpretable effect sizes than IndVal
pb <- multipatt(species_matrix, site_groups, func = 'r.g',
duleg = TRUE, control = how(nperm = 999))
summary(pb)
```
## CCA vs RDA Decision
| Criterion | Use CCA | Use RDA |
|-----------|---------|---------|
| Gradient length | > 3 SD (DCA axis 1) | <= 3 SD (DCA axis 1) |
| Species response | Unimodal (bell curves) | Linear |
| Data transformation | Raw abundances OK | Hellinger recommended |
| Chi-square distance | Appropriate | Not appropriate |
Check gradient length with DCA:
```r
dca <- decorana(species_matrix)
# Axis 1 length > 3 SD: unimodal methods (CCA)
# Axis 1 length <= 3 SD: linear methods (RDA)
dca
```
## Publication-Quality Triplot
**Goal:** Create a journal-ready CCA triplot showing sites, species, and environmental vectors in a single ordination figure.
**Approach:** Extract site, species, and biplot scores from the CCA result, combine into data frames, and overlay points, text labels, and arrow segments in ggplot2.
```r
library(ggplot2)
# Extract scores for ggplot
site_scores <- scores(cca_result, display = 'sites', scaling = 2)
species_scores <- scores(cca_result, display = 'species', scaling = 2)
env_scores <- scores(cca_result, display = 'bp', scaling = 2)
site_df <- as.data.frame(site_scores)
site_df$group <- env_data$habitat_type
sp_df <- as.data.frame(species_scores)
env_df <- as.data.frame(env_scores)
env_df$label <- rownames(env_df)
ggplot() +
geom_point(data = site_df, aes(x = CCA1, y = CCA2, color = group), size = 3) +
geom_text(data = sp_df, aes(x = CCA1, y = CCA2, label = rownames(sp_df)),
size = 2, color = 'grey50', alpha = 0.7) +
geom_segment(data = env_df, aes(x = 0, y = 0, xend = CCA1, yend = CCA2),
arrow = arrow(length = unit(0.2, 'cm')), color = 'red') +
geom_text(data = env_df, aes(x = CCA1 * 1.1, y = CCA2 * 1.1, label = label),
color = 'red', size = 3.5) +
theme_bw() +
labs(title = 'CCA Triplot')
```
## Related Skills
- biodiversity-metrics - Alpha and beta diversity metrics
- edna-metabarcoding - Generate community data from eDNA
- landscape-genomics - Genotype-environment associations
- microbiome/diversity-analysis - Unconstrained ordination alternative
- data-visualization/ggplot2-fundamentals - Custom ordination plots
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.