bio-tcr-bcr-analysis-immcantation-analysis
$
npx mdskill add GPTomics/bioSkills/bio-tcr-bcr-analysis-immcantation-analysisAnalyze B cell repertoires for somatic hypermutation and clonal evolution.
- Enables study of affinity maturation and germinal center dynamics.
- Depends on Immcantation suite including alakazam, shazam, and scoper.
- Introspects R package APIs to adapt code to installed versions.
- Delivers phylogenetic trees and mutation metrics via R plotting functions.
SKILL.md
.github/skills/bio-tcr-bcr-analysis-immcantation-analysisView on GitHub ↗
---
name: bio-tcr-bcr-analysis-immcantation-analysis
description: Analyze BCR repertoires for somatic hypermutation, clonal lineages, and B cell phylogenetics using the Immcantation framework. Use when studying B cell affinity maturation, germinal center dynamics, or antibody evolution.
tool_type: r
primary_tool: alakazam
---
## Version Compatibility
Reference examples tested with: MiXCR 4.6+, ggplot2 3.5+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Immcantation Analysis
**"Analyze B cell repertoire evolution and clonal lineages"** → Study somatic hypermutation, build B cell phylogenies, and track affinity maturation using the Immcantation framework for BCR repertoire analysis.
- R: `alakazam::plotMutability()`, `dowser::buildPhylipLineage()`, `scoper::spectralClones()`
Requires Immcantation suite: alakazam 1.3+, shazam 1.2+, scoper 1.3+, dowser 2.0+, tigger 1.1+.
## Load and Format Data
**Goal:** Import AIRR-formatted repertoire data into the Immcantation framework for downstream analysis.
**Approach:** Read Change-O/AIRR tab-delimited files into R data frames with required V(D)J annotation columns.
```r
library(alakazam)
library(shazam)
library(dplyr)
# Load AIRR-formatted data (from MiXCR, IMGT/HighV-QUEST, etc.)
db <- readChangeoDb('clones_airr.tsv')
# Required columns:
# sequence_id, sequence, v_call, d_call, j_call, junction, junction_aa
```
## Clonal Clustering
**Goal:** Group B cell sequences into clonal lineages based on junction sequence similarity.
**Approach:** Apply hierarchical clustering on nucleotide distance of junction regions with a threshold-based cutoff.
```r
library(scoper)
# Assign clones based on junction similarity
# Threshold typically 0.15-0.2 (15-20% nucleotide distance)
db <- hierarchicalClones(
db,
threshold = 0.15,
method = 'nt',
linkage = 'single'
)
# Count clones
clone_sizes <- countClones(db, groups = 'sample_id')
```
## Somatic Hypermutation Analysis
**Goal:** Quantify somatic hypermutation rates across replacement and silent categories for each clone.
**Approach:** Compare observed sequences to germline alignments using the S5F targeting model to count and classify mutations.
```r
# Calculate mutation frequencies
db <- observedMutations(
db,
sequenceColumn = 'sequence_alignment',
germlineColumn = 'germline_alignment_d_mask',
regionDefinition = IMGT_V,
mutationDefinition = MUTATION_SCHEMES$S5F
)
# Mutation frequency columns added:
# mu_count_seq_r, mu_count_seq_s (replacement/silent mutations)
# mu_freq_seq_r, mu_freq_seq_s (frequencies)
# Summarize by clone
mutation_summary <- db %>%
group_by(clone_id) %>%
summarize(
mean_mu = mean(mu_freq_seq_r, na.rm = TRUE),
n_sequences = n()
)
```
## Selection Analysis
**Goal:** Test whether observed replacement/silent mutation ratios deviate from neutral expectation, indicating positive or negative selection.
**Approach:** Estimate BASELINe selection strength (sigma) by comparing observed R/S ratios to a null model of somatic hypermutation targeting.
```r
library(shazam)
# Test for selection pressure
# Compares observed R/S ratio to expected under neutrality
baseline <- estimateBaseline(
db,
sequenceColumn = 'sequence_alignment',
germlineColumn = 'germline_alignment_d_mask',
testStatistic = 'focused',
regionDefinition = IMGT_V,
nproc = 4
)
# Summarize selection
selection <- summarizeBaseline(baseline, returnType = 'df')
# Positive sigma = positive selection (beneficial mutations retained)
# Negative sigma = negative selection (deleterious mutations removed)
```
## Build Clonal Lineage Trees
**Goal:** Reconstruct phylogenetic lineage trees for each B cell clone to visualize affinity maturation pathways.
**Approach:** Build maximum parsimony trees from clonal sequence alignments using PHYLIP's dnapars algorithm via dowser.
```r
library(dowser)
# Build lineage trees for each clone
# Requires multiple sequences per clone
clones_multi <- db %>%
group_by(clone_id) %>%
filter(n() >= 3) %>%
ungroup()
# Build trees using maximum parsimony
trees <- buildPhylipLineage(
clones_multi,
phylip_exec = 'dnapars',
rm_temp = TRUE
)
# Plot a tree
plotTrees(trees[[1]])
```
## Germline Inference
**Goal:** Discover novel V gene alleles and correct V gene assignments using individual-level genotyping.
**Approach:** Infer novel alleles from mutation patterns with TIgGER, build a personalized genotype, and reassign allele calls.
```r
library(tigger)
# Infer novel V gene alleles
novel <- findNovelAlleles(
db,
germline_db = 'IMGT_Human_IGHV.fasta',
nproc = 4
)
# Genotype the individual
genotype <- inferGenotype(db, germline_db = 'IMGT_Human_IGHV.fasta')
# Correct V gene calls
db <- reassignAlleles(db, genotype)
```
## Visualization
**Goal:** Generate summary plots of mutation frequencies and V gene usage across samples.
**Approach:** Plot mutation frequency distributions with ggplot2 histograms and V gene usage bar charts via alakazam helpers.
```r
# Plot mutation frequency distribution
library(ggplot2)
ggplot(db, aes(x = mu_freq_seq_r)) +
geom_histogram(bins = 50) +
facet_wrap(~ sample_id) +
labs(x = 'Replacement Mutation Frequency', y = 'Count')
# Plot V gene usage
v_usage <- countGenes(db, gene = 'v_call', groups = 'sample_id')
plotGeneUsage(v_usage, gene = 'v_call')
```
## Related Skills
- mixcr-analysis - Generate input clonotype data
- vdjtools-analysis - Diversity metrics (TCR-focused)
- phylogenetics/tree-io - General tree concepts
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.