---
name: bio-ribo-seq-ribosome-periodicity
description: Validate Ribo-seq data quality by checking 3-nucleotide periodicity and calculating P-site offsets. Use when assessing library quality or determining read offsets for downstream analysis.
tool_type: python
primary_tool: Plastid
---
## Version Compatibility
Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pysam 0.22+, scipy 1.12+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Ribosome Periodicity Analysis
**"Check if my Ribo-seq data shows triplet periodicity"** → Validate Ribo-seq library quality by verifying 3-nucleotide translocation patterns and calculating P-site offsets from metagene profiles.
- Python: `plastid` for P-site offset calculation and metagene analysis
## 3-Nucleotide Periodicity
**Goal:** Verify that Ribo-seq reads exhibit the expected 3-nucleotide translocation pattern characteristic of active translation.
**Approach:** Load P-site mapped reads and compute metagene profiles around start codons to check for triplet periodicity.
Ribosomes move 3 nucleotides per codon. Good Ribo-seq data shows strong periodicity:
```python
from plastid import BAMGenomeArray, FivePrimeMapFactory, GenomicSegment
import numpy as np
import matplotlib.pyplot as plt
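# Load aligned reads, mapped to their 5' ends (no offset applied yet);
# FivePrimeMapFactory(offset=12) would map reads at an assumed 12-nt P-site offset
alignments = BAMGenomeArray('riboseq.bam', mapping=FivePrimeMapFactory())
# A metagene profile around start codons (see the sections below)
# should then show strong 3-nt periodicity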
```
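One simple way to put a number on "strong periodicity" is the fraction of coding-region counts that fall in each reading frame. The following is a minimal sketch, assuming `cds_counts` is a per-nucleotide vector of P-site counts whose first element sits on the first base of the start codon. `frame_fractions` is an illustrative helper, not part of plastid.

```python
import numpy as np

def frame_fractions(cds_counts):
    """Return the fraction of counts in frames 0, 1 and 2 of a CDS-aligned vector."""
    counts = np.asarray(cds_counts, dtype=float)
    usable = len(counts) - len(counts) % 3      # drop a trailing partial codon
    per_frame = counts[:usable].reshape(-1, 3).sum(axis=0)
    total = per_frame.sum()
    if total == 0:
        return np.zeros(3)                      # no signal: avoid dividing by zero
    return per_frame / total

# Synthetic example (not real data): 20 codons with most signal in frame 0
example = np.tile([8, 1, 1], 20)
print(frame_fractions(example))
```

In the synthetic example frame 0 holds 80% of the counts. Reads with no frame preference would give roughly one third in each frame.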
## Calculate P-site Offset
**Goal:** Determine the optimal P-site offset from the 5' end of ribosome footprints for accurate codon-level positioning.
**Approach:** Run metagene analysis around annotated start codons and identify the offset that aligns the signal peak with the AUG position.
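```python
import numpy as np
from plastid import BAMGenomeArray, FivePrimeMapFactory, GTF2_TranscriptAssembler

# The P-site offset varies by read length
# Typically 12-15 nt from 5' end for 28-30 nt reads
def determine_psite_offset(bam_path, annotation_file, upstream=50, downstream=100):
    '''Build a 5'-end metagene profile around annotated start codons.

    Returns summed counts for positions -upstream..downstream-1 relative
    to the first nucleotide of the start codon.
    '''
    # Load reads, mapped to their 5' ends
    alignments = BAMGenomeArray(bam_path, mapping=FivePrimeMapFactory())

    # Accumulate the profile directly with Transcript.get_counts
    # (verify these calls against your installed plastid version)
    metagene_data = np.zeros(upstream + downstream)
    for transcript in GTF2_TranscriptAssembler(annotation_file):
        start = transcript.cds_start  # transcript-relative; None if non-coding
        if start is None or start < upstream:
            continue
        # Per-nucleotide counts along the spliced transcript, 5' to 3'
        counts = transcript.get_counts(alignments)
        if len(counts) < start + downstream:
            continue
        metagene_data += counts[start - upstream:start + downstream]

    # The peak upstream of position 0 marks initiating ribosomes;
    # its distance from the start codon is the P-site offset
    return metagene_data
```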
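Reading the offset from the profile can be sketched as follows. This assumes a 5'-end metagene laid out as above (index `i` corresponds to position `i - upstream`) and that initiating ribosomes produce the tallest peak somewhere between -20 and -5. Both the helper name `offset_from_metagene` and the search window are assumptions for illustration.

```python
import numpy as np

def offset_from_metagene(metagene_counts, upstream=50, search=(-20, -5)):
    """Estimate the P-site offset from a 5'-end metagene profile.

    metagene_counts[i] is the summed 5'-end count at position i - upstream
    relative to the first base of the start codon.
    """
    counts = np.asarray(metagene_counts, dtype=float)
    lo, hi = search  # inclusive window expected to contain the initiation peak
    window = counts[upstream + lo:upstream + hi + 1]
    peak_position = lo + int(np.argmax(window))
    return -peak_position  # distance from the 5' end to the P-site

# Synthetic profile (not real data) with a peak 12 nt upstream of the start codon
profile = np.ones(150)
profile[50 - 12] = 40
print(offset_from_metagene(profile))  # 12
```

Running this separately on profiles built from each read length gives a per-length offset table for the library.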
## Metagene Plots
**Goal:** Visualize the metagene profile around start codons with frame-colored bars and a periodicity power spectrum.
**Approach:** Plot read counts by reading frame and compute FFT to confirm a dominant period of 3 nucleotides.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import fft, fftfreq

def plot_metagene(metagene_data, offset=12):
    '''Plot metagene profile around start codon (expects 150 positions)'''
    metagene_data = np.asarray(metagene_data, dtype=float)
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # 5'-end positions (-50..99) shifted by the offset to P-site positions;
    # pass offset=0 if the profile is already P-site mapped
    positions = np.arange(-50, 100) + offset

    # Plot by frame (0, 1, 2 relative to the start codon)
    for frame in range(3):
        in_frame = positions % 3 == frame
        axes[0].bar(positions[in_frame], metagene_data[in_frame],
                    alpha=0.7, label=f'Frame {frame}')
    axes[0].axvline(0, color='red', linestyle='--', label='Start')
    axes[0].set_xlabel('Position relative to start codon')
    axes[0].set_ylabel('Normalized counts')
    axes[0].legend()  # called after axvline so 'Start' is listed

    # Periodicity: power spectrum of the profile
    power = np.abs(fft(metagene_data)) ** 2
    freq = fftfreq(len(metagene_data))
    half = len(freq) // 2  # positive frequencies only, skipping the zero term
    axes[1].plot(1 / freq[1:half], power[1:half])
    axes[1].set_xlabel('Period (nt)')
    axes[1].set_ylabel('Power')
    axes[1].axvline(3, color='red', linestyle='--')

    plt.tight_layout()
    plt.savefig('periodicity.pdf')
    plt.close(fig)
```
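The spectrum can also be reduced to a single number for automated checks. The sketch below compares the power at the 3-nt period with the mean power at all other non-zero frequencies. `period3_power_ratio` is an illustrative helper, and the ratio is one possible score rather than an established standard.

```python
import numpy as np

def period3_power_ratio(counts):
    """Power at the 3-nt period divided by mean power at other non-zero frequencies."""
    counts = np.asarray(counts, dtype=float)
    usable = len(counts) - len(counts) % 3  # multiple of 3 puts 1/3 on an exact FFT bin
    if usable < 6:
        raise ValueError("need at least 6 positions")
    signal = counts[:usable] - counts[:usable].mean()
    power = np.abs(np.fft.rfft(signal)) ** 2
    k = usable // 3                          # bin for frequency 1/3 cycles per nt
    if power[k] == 0:
        return 0.0                           # no period-3 component at all
    others = np.delete(power[1:], k - 1)     # drop the zero term and the period-3 bin
    background = others.mean()
    return float(power[k] / background) if background > 0 else float("inf")

# Synthetic signals (not real data)
periodic = np.tile([8, 1, 1], 50)
noise = np.random.default_rng(0).poisson(5, 150)
print(period3_power_ratio(periodic), period3_power_ratio(noise))
```

A strongly periodic profile gives a very large ratio, while unstructured counts stay near the background level.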
## Assess by Read Length
**Goal:** Evaluate 3-nucleotide periodicity strength for each read length to identify the most informative footprint sizes.
**Approach:** Group reads by query length, compute periodicity score per group, and retain lengths with strong triplet signal.
```python
from collections import defaultdict
import pysam

def periodicity_by_length(bam_path, annotation_file, calculate_periodicity,
                          min_reads=1000):
    '''Calculate periodicity score for each read length.

    calculate_periodicity is a caller-supplied function
    (reads, annotation_file) -> score; this skill does not define one.
    '''
    # Group primary alignments by read length
    reads_by_length = defaultdict(list)
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for read in bam:
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            reads_by_length[read.query_length].append(read)

    # Calculate periodicity for each length
    # Good lengths show strong 3-nt periodicity
    results = {}
    for length, reads in sorted(reads_by_length.items()):
        if len(reads) > min_reads:  # Need sufficient reads
            results[length] = calculate_periodicity(reads, annotation_file)
    return results
```
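The final step of the approach, retaining lengths with strong triplet signal, might look like the sketch below. It assumes the scores are frame-0 fractions (about 0.33 for reads with no frame preference). The helper name `select_periodic_lengths` and the 0.5 threshold are hypothetical choices, and the example scores are invented for illustration.

```python
def select_periodic_lengths(scores, min_score=0.5):
    """Return read lengths whose score meets the threshold, best first."""
    kept = {length: s for length, s in scores.items() if s >= min_score}
    return sorted(kept, key=kept.get, reverse=True)

# Invented scores keyed by read length (not real data)
scores = {27: 0.41, 28: 0.72, 29: 0.80, 30: 0.66, 34: 0.35}
print(select_periodic_lengths(scores))  # [29, 28, 30]
```

The threshold should be tuned to the scoring function actually used. A spectral ratio, for example, lives on a different scale than a frame fraction.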
## P-site Offset Table
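Common P-site offsets by read length (5' end mapping). Treat these as starting values and confirm them against each library's own metagene profile, since offsets vary with protocol and organism: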
| Read Length | P-site Offset |
|-------------|---------------|
| 28 nt | 12 |
| 29 nt | 12 |
| 30 nt | 13 |
| 31 nt | 13 |
| 32 nt | 14 |
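Applying the table to individual alignments could be sketched as below. It assumes ungapped alignments described by 0-based, half-open coordinates, so that read length equals `end - start`. `PSITE_OFFSETS` and `psite_position` are illustrative names.

```python
# Offsets copied from the table above; verify them against your own data
PSITE_OFFSETS = {28: 12, 29: 12, 30: 13, 31: 13, 32: 14}

def psite_position(start, end, strand, offsets=PSITE_OFFSETS):
    """Return the 0-based genomic P-site coordinate of an alignment, or None.

    start, end: 0-based half-open alignment coordinates; strand: '+' or '-'.
    Assumes an ungapped alignment so that read length = end - start.
    """
    offset = offsets.get(end - start)
    if offset is None:
        return None              # read length not covered by the table
    if strand == '+':
        return start + offset    # count inward from the leftmost base
    return end - 1 - offset      # minus strand: the 5' end is the rightmost base

print(psite_position(100, 128, '+'))  # 112
print(psite_position(100, 128, '-'))  # 115
```

Reads whose length is missing from the table return `None` so that they can be dropped rather than assigned a guessed offset.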
## Validate with RiboCode
**Goal:** Run an automated periodicity and ORF detection pipeline as an independent validation of data quality.
**Approach:** Execute RiboCode's one-step command, which internally assesses periodicity and generates diagnostic plots.
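```bash
# RiboCode includes periodicity analysis
# -r is expected to be a transcriptome-aligned BAM
# (e.g. STAR's Aligned.toTranscriptome.out.bam) and -o an output name;
# confirm both with `RiboCode_onestep --help`
RiboCode_onestep \
    -g annotation.gtf \
    -r riboseq.bam \
    -f genome.fa \
    -o output_dir
# Check output for periodicity plots
```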
## Related Skills
- riboseq-preprocessing - Generate aligned BAM
- orf-detection - Uses P-site offsets
- translation-efficiency - Requires proper positioning