bio-single-cell-perturb-seq
$
npx mdskill add GPTomics/bioSkills/bio-single-cell-perturb-seqAnalyzes Perturb-seq and CROP-seq data to identify gene function using single-cell RNA-seq
- Solves the task of linking guide RNA perturbations to transcriptional changes in single cells
- Uses Pertpy, Scanpy, MAGeCK, and Pandas for data processing and analysis
- Classifies perturbations and prioritizes gene functions using Mixscape and Augur
- Delivers annotated single-cell data with gene function insights and phenotypic links
SKILL.md
.github/skills/bio-single-cell-perturb-seqView on GitHub ↗
---
name: bio-single-cell-perturb-seq
description: Analyze Perturb-seq and CROP-seq CRISPR screening data integrated with scRNA-seq. Use when identifying gene function through pooled genetic perturbations in single cells.
tool_type: python
primary_tool: Pertpy
---
## Version Compatibility
Reference examples tested with: MAGeCK 0.5+, pandas 2.2+, pertpy 0.7+, scanpy 1.10+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Perturb-seq Analysis
**"Analyze my Perturb-seq CRISPR screen"** → Link guide RNA assignments to transcriptional phenotypes in pooled CRISPR screens with single-cell readout to identify gene function.
- Python: `pertpy.tl.Mixscape(adata)` for perturbation classification, `pertpy.tl.Augur` for prioritization
## Load and Annotate Perturbations
```python
import scanpy as sc
import pertpy as pt
adata = sc.read_h5ad('perturb_seq.h5ad')
# Guide assignments typically stored in obs
# Format: cell barcode -> guide identity -> target gene
adata.obs['guide_id'] = guide_assignments['guide_id']
adata.obs['target_gene'] = guide_assignments['target_gene']
# Mark non-targeting controls
adata.obs['is_control'] = adata.obs['target_gene'] == 'non-targeting'
```
## Pertpy Analysis
```python
# Initialize perturbation analysis
ps = pt.tl.PerturbationSpace(adata)
# Differential expression per perturbation vs control
de = pt.tl.PseudobulkDE(adata)
de.fit(
groupby='target_gene',
control='non-targeting',
n_threads=8
)
results = de.results()
# Filter significant genes
sig_results = results[results['pval_adj'] < 0.05]
# Perturbation signatures (effect sizes)
ps = pt.tl.PerturbationSignature(adata)
ps.compute(groupby='target_gene', control='non-targeting')
# Get signature matrix
signatures = ps.get_signature_matrix()
```
## Perturbation Embedding
```python
# Compute perturbation-level embeddings
pt.tl.perturbation_embedding(adata, groupby='target_gene', method='mean')
# Cluster perturbations by phenotype
pt.tl.cluster_perturbations(adata, resolution=0.5)
# Find functionally related perturbations
pt.pl.perturbation_heatmap(adata, groupby='perturbation_cluster')
```
## Mixscape (Seurat v5)
**Goal:** Classify cells in a CRISPR screen as successfully perturbed or escaped based on their transcriptional response relative to non-targeting controls.
**Approach:** Compute per-cell perturbation signatures against non-targeting controls using PCA-projected differences, then run Mixscape mixture model classification to separate knockout-responsive cells from escapees.
```r
library(Seurat)
library(SeuratObject)
# Load Perturb-seq data
seurat <- Read10X('filtered_feature_bc_matrix/')
seurat <- CreateSeuratObject(seurat)
# Add perturbation metadata
seurat <- AddMetaData(seurat, metadata = perturbation_calls)
# Standard preprocessing
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat)
seurat <- RunUMAP(seurat, dims = 1:30)
# Mixscape: Classify perturbed vs non-perturbed cells
seurat <- CalcPerturbSig(
seurat,
assay = 'RNA',
slot = 'data',
new.assay.name = 'PRTB',
gd.class = 'gene',
nt.cell.class = 'NT',
num.neighbors = 20,
reduction = 'pca',
ndims = 15
)
# Run Mixscape classification
seurat <- RunMixscape(
seurat,
assay = 'PRTB',
slot = 'scale.data',
labels = 'gene',
nt.class.name = 'NT',
min.de.genes = 5,
iter.num = 10,
de.assay = 'RNA',
prtb.type = 'KO'
)
# View classification results
table(seurat$mixscape_class.global)
```
## Mixscape Visualization
```r
# UMAP colored by perturbation
DimPlot(seurat, reduction = 'umap', group.by = 'mixscape_class', label = TRUE)
# Perturbation score distribution
VlnPlot(seurat, features = 'mixscape_class_p_ko', group.by = 'gene')
# DE genes for each perturbation
MixscapeHeatmap(seurat, ident.1 = 'TP53', ident.2 = 'NT', balanced = TRUE)
# LDA projection
seurat <- MixscapeLDA(seurat, labels = 'gene', nt.class.name = 'NT')
LDAPlot(seurat)
```
## Guide Assignment from CRISPR Feature Barcode
```python
import pandas as pd
# From Cell Ranger output (CRISPR Guide Capture)
guides = pd.read_csv('crispr_analysis/protospacer_calls_per_cell.csv')
# Clean up guide calls
guides['cell_barcode'] = guides['cell_barcode'].str.replace('-1', '')
guides = guides[guides['num_features'] == 1] # Single guide per cell
# Merge with expression data
adata.obs = adata.obs.merge(
guides[['cell_barcode', 'feature_call', 'target_gene']],
left_index=True,
right_on='cell_barcode',
how='left'
)
```
## Guide Quality Control
```python
# Check guide representation
guide_counts = adata.obs['target_gene'].value_counts()
print(f'Guides per target: {guide_counts.mean():.1f}')
print(f'Cells per guide: {adata.obs.groupby("guide_id").size().mean():.1f}')
# Filter low-representation guides
# Standard: keep guides with >= 100 cells
min_cells = 100
valid_guides = guide_counts[guide_counts >= min_cells].index
adata = adata[adata.obs['target_gene'].isin(valid_guides)]
# Check for guide bias
sc.pl.violin(adata, keys='n_genes_by_counts', groupby='target_gene', rotation=90)
```
## Multi-Guide Analysis
```python
# Cells with multiple guides (MOI > 1)
multi_guide = adata.obs[adata.obs['num_guides'] > 1]
print(f'Multi-guide cells: {len(multi_guide) / len(adata):.1%}')
# Options:
# 1. Remove multi-guide cells
adata = adata[adata.obs['num_guides'] == 1]
# 2. Keep only cells where guides target same gene
# 3. Analyze combinatorial effects
```
## Pseudobulk Differential Expression
```python
# Aggregate to pseudobulk for robust DE
from pertpy.tools import PseudobulkDE
pb = PseudobulkDE(adata)
pb.fit(
groupby='target_gene',
control='non-targeting',
method='deseq2', # or 'edger', 'wilcoxon'
min_cells=50
)
# Get results for specific perturbation
tp53_de = pb.results('TP53')
sig_genes = tp53_de[tp53_de['padj'] < 0.05].sort_values('log2FoldChange')
```
## Pathway Enrichment
```python
import decoupler as dc
# Get DE genes per perturbation
de_results = pb.results()
# Run pathway enrichment
dc.run_ora(
mat=de_results,
net=dc.get_resource('MSigDB'),
source='geneset',
target='gene'
)
# Visualize top pathways
dc.plot_barplot(de_results, 'TP53', top_n=20)
```
## Screen QC Metrics
| Metric | Good | Acceptable | Poor |
|--------|------|------------|------|
| Cells per guide | >200 | 100-200 | <100 |
| Guide detection rate | >90% | 80-90% | <80% |
| Non-targeting cells | 5-15% | 15-25% | >25% |
| Mixscape KO fraction | >50% | 30-50% | <30% |
## Related Skills
- single-cell/preprocessing - scRNA-seq preprocessing
- single-cell/markers-annotation - DE and marker gene concepts
- single-cell/batch-integration - Multi-sample integration
- crispr-screens/mageck-analysis - Bulk screen analysis
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.