bio-workflows-smrna-pipeline

Name: bio-workflows-smrna-pipeline
Author: GPTomics/bioSkills

$npx mdskill add GPTomics/bioSkills/bio-workflows-smrna-pipeline

Analyzes small RNA-seq data from FASTQ files to identify differential miRNA expression

Processes miRNA, piRNA, and other small RNA sequencing data for differential expression analysis
Uses cutadapt for trimming, miRDeep2 for quantification, and DESeq2 for differential analysis
Automates adapter trimming, genome alignment, quantification, and target prediction workflows
Delivers results through structured output files and integrates with downstream analysis tools

SKILL.md

.github/skills/bio-workflows-smrna-pipelineView on GitHub ↗

---
name: bio-workflows-smrna-pipeline
description: End-to-end small RNA-seq analysis from FASTQ to differential miRNA expression. Use when analyzing miRNA, piRNA, or other small RNA sequencing data.
tool_type: mixed
primary_tool: miRDeep2
---

## Version Compatibility

Reference examples tested with: DESeq2 1.42+, cutadapt 4.4+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Small RNA-seq Pipeline

**"Analyze my small RNA-seq data from FASTQ to differential miRNAs"** → Orchestrate adapter trimming (cutadapt), miRNA quantification (miRDeep2/miRge3), novel miRNA discovery, differential expression (DESeq2), and target prediction (miRanda).

## Pipeline Overview

```
FASTQ → cutadapt trim → miRDeep2 → Quantification → DESeq2 → Target prediction
```

## Step 1: Preprocessing

```bash
# Adapter trimming and size selection
cutadapt -a TGGAATTCTCGGGTGCCAAGG \
    --minimum-length 18 --maximum-length 30 \
    -o trimmed.fastq.gz reads.fastq.gz
```

## Step 2: miRDeep2 Analysis

```bash
# Align to genome
mapper.pl trimmed.fastq.gz -e -h -i -j -l 18 \
    -m -p genome_index -s reads_collapsed.fa \
    -t reads_collapsed_vs_genome.arf

# miRNA quantification and novel prediction
miRDeep2.pl reads_collapsed.fa genome.fa \
    reads_collapsed_vs_genome.arf \
    mature_ref.fa none hairpin_ref.fa
```

## Step 3: Differential Expression

```r
library(DESeq2)
counts <- read.csv('mirna_counts.csv', row.names = 1)
dds <- DESeqDataSetFromMatrix(counts, colData, ~condition)
dds <- DESeq(dds)
results <- results(dds)
```

## Step 4: Target Prediction

```bash
# miRanda for target prediction
miranda mature_mirnas.fa target_3utrs.fa -out targets.txt
```

## QC Checkpoints

1. **After trimming**: Size distribution should peak at 21-23nt
2. **After alignment**: >70% mapping rate expected
3. **After DE**: Check volcano plot and PCA

## Related Skills

- small-rna-seq/mirdeep2-analysis - Detailed miRDeep2
- small-rna-seq/differential-mirna - DE analysis
- small-rna-seq/target-prediction - Target analysis