bio-small-rna-seq-differential-mirna
$
npx mdskill add GPTomics/bioSkills/bio-small-rna-seq-differential-mirnaAnalyzes differential miRNA expression using DESeq2 or edgeR with small RNA-specific adjustments
- Identifies miRNAs with significant expression changes between experimental conditions
- Relies on R packages DESeq2, edgeR, and tools like miRge3 or miRDeep2 for count data
- Applies statistical models tailored for small RNA-seq data normalization and variance
- Returns lists of differentially expressed miRNAs with associated p-values and fold changes
SKILL.md
.github/skills/bio-small-rna-seq-differential-mirnaView on GitHub ↗
---
name: bio-small-rna-seq-differential-mirna
description: Perform differential expression analysis of miRNAs between conditions using DESeq2 or edgeR with small RNA-specific considerations. Use when identifying miRNAs that change between treatment groups, disease states, or developmental stages.
tool_type: r
primary_tool: DESeq2
---
## Version Compatibility
Reference examples tested with: DESeq2 1.42+, edgeR 4.0+, ggplot2 3.5+, scanpy 1.10+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Differential miRNA Expression
**"Find differentially expressed miRNAs between my conditions"** → Perform statistical testing on miRNA count matrices to identify miRNAs with significant expression changes, accounting for small RNA-specific normalization considerations.
- R: `DESeq2::DESeq()` or `edgeR::glmQLFTest()` on miRNA count data
## Load miRNA Count Data
```r
library(DESeq2)
# Load miRge3 or miRDeep2 counts
counts <- read.csv('miR.Counts.csv', row.names = 1)
# Create sample metadata
coldata <- data.frame(
sample = colnames(counts),
condition = factor(c('control', 'control', 'treated', 'treated')),
row.names = colnames(counts)
)
```
## DESeq2 Analysis
**Goal:** Identify miRNAs with significant expression changes between experimental conditions, accounting for small RNA-specific normalization.
**Approach:** Create a DESeqDataSet from miRNA counts, filter low-expressed miRNAs using a lower threshold than mRNA (10 reads total), run the DESeq2 pipeline, and extract results with BH-corrected p-values.
```r
# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(
countData = round(counts), # DESeq2 requires integers
colData = coldata,
design = ~ condition
)
# Filter low-expressed miRNAs
# miRNAs typically have fewer total counts than mRNAs
# Keep miRNAs with at least 10 reads across samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
# Run DESeq2
dds <- DESeq(dds)
# Get results
res <- results(dds, contrast = c('condition', 'treated', 'control'))
res <- res[order(res$padj), ]
```
## Apply Shrinkage for Effect Sizes
```r
# apeglm shrinkage for more accurate log2 fold changes
# Particularly important for low-count miRNAs
library(apeglm)
res_shrunk <- lfcShrink(
dds,
coef = 'condition_treated_vs_control',
type = 'apeglm'
)
```
## Filter Significant miRNAs
```r
# Standard thresholds for miRNA DE
# padj < 0.05: FDR-corrected significance
# |log2FC| > 1: 2-fold change minimum
sig <- subset(res_shrunk, padj < 0.05 & abs(log2FoldChange) > 1)
sig <- sig[order(sig$padj), ]
# Separate up and down-regulated
up <- subset(sig, log2FoldChange > 0)
down <- subset(sig, log2FoldChange < 0)
cat('Upregulated:', nrow(up), '\n')
cat('Downregulated:', nrow(down), '\n')
```
## edgeR Alternative
```r
library(edgeR)
# Create DGEList
dge <- DGEList(counts = counts, group = coldata$condition)
# Filter low expression
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]
# Normalize
dge <- calcNormFactors(dge)
# Design matrix
design <- model.matrix(~ condition, data = coldata)
# Estimate dispersion
dge <- estimateDisp(dge, design)
# Fit model and test
fit <- glmQLFit(dge, design)
qlf <- glmQLFTest(fit, coef = 2)
# Get results
res_edger <- topTags(qlf, n = Inf)$table
```
## Visualization
```r
library(ggplot2)
library(EnhancedVolcano)
# Volcano plot
EnhancedVolcano(
res_shrunk,
lab = rownames(res_shrunk),
x = 'log2FoldChange',
y = 'padj',
pCutoff = 0.05,
FCcutoff = 1,
title = 'Differential miRNA Expression'
)
# MA plot
plotMA(res_shrunk, ylim = c(-4, 4))
```
## Heatmap of DE miRNAs
```r
library(pheatmap)
# Get normalized counts
vsd <- vst(dds, blind = FALSE)
# Select significant miRNAs
sig_mirnas <- rownames(sig)
mat <- assay(vsd)[sig_mirnas, ]
# Z-score scale rows
mat_scaled <- t(scale(t(mat)))
pheatmap(
mat_scaled,
annotation_col = coldata['condition'],
cluster_rows = TRUE,
cluster_cols = TRUE,
show_rownames = nrow(mat) < 50
)
```
## Export Results
```r
# Full results with normalized counts
res_df <- as.data.frame(res_shrunk)
res_df$miRNA <- rownames(res_df)
res_df$baseMean_norm <- rowMeans(counts(dds, normalized = TRUE)[rownames(res_df), ])
write.csv(res_df, 'DE_miRNAs_full.csv', row.names = FALSE)
# Significant only
write.csv(as.data.frame(sig), 'DE_miRNAs_significant.csv')
```
## Related Skills
- mirge3-analysis - Get miRNA counts
- mirdeep2-analysis - Alternative quantification
- target-prediction - Predict targets of DE miRNAs
- differential-expression/deseq2-basics - General DE analysis concepts
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.