bio-atac-seq-nucleosome-positioning
$
npx mdskill add GPTomics/bioSkills/bio-atac-seq-nucleosome-positioningMap nucleosome positions from ATAC-seq fragment patterns
- Characterize nucleosome organization at promoters and enhancers.
- Depends on NucleoATAC, ATACseqQC, DANPOS3, or scprinter tools.
- Analyzes Tn5 cut fragments to determine center positions and occupancy.
- Outputs V-plots for visualizing chromatin structure comparisons.
SKILL.md
.github/skills/bio-atac-seq-nucleosome-positioningView on GitHub ↗
---
name: bio-atac-seq-nucleosome-positioning
description: Map nucleosome center positions, occupancy, and fuzziness from ATAC-seq fragment-size patterns using NucleoATAC, ATACseqQC, DANPOS3, or scprinter. Use when characterizing nucleosome organization at promoters and enhancers, calling +1/-1 nucleosomes flanking NFRs, generating V-plots for chromatin structure visualization, or comparing nucleosome positioning between conditions.
tool_type: mixed
primary_tool: NucleoATAC
---
## Version Compatibility
Reference examples tested with: NucleoATAC 0.3.4+, ATACseqQC 1.26+, DANPOS 3.1+, samtools 1.19+, pysam 0.22+, pyBigWig 0.3+, BSgenome.Hsapiens.UCSC.hg38 1.4+, TxDb.Hsapiens.UCSC.hg38.knownGene 3.18+.
NucleoATAC is unmaintained since 2018 but remains the canonical ATAC-specific nucleosome caller; ATACseqQC, DANPOS3, and scprinter are actively developed alternatives. Verify versions before use:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws unexpected errors, introspect the installed package and adapt rather than retrying.
# Nucleosome Positioning
**"Where are the nucleosomes in my ATAC-seq data?"** -> Use fragment-size classes (Tn5 cuts twice through naked DNA generating short fragments; once on each side of a single nucleosome generating ~147+linker fragments) to call nucleosome centers, occupancy scores, and the spacing pattern around regulatory elements.
- CLI: `nucleoatac run --bed regions.bed --bam sample.bam --fasta genome.fa`
- R: `ATACseqQC::splitGAlignmentsByCut()` -> fragment classes; `factorFootprints()` -> per-TF flanking nuc analysis
- CLI: `danpos3 dpos sample.bam` (alternative; supports MNase, ATAC, DNase)
- Python: `scprinter` for multi-scale nucleosome inference
## Nucleosome Physics for ATAC
A nucleosome wraps ~147 bp DNA in 1.65 turns. Adjacent nucleosomes are separated by 20-50 bp linker; mean **nucleosome repeat length (NRL)** is species-dependent:
| Cell type / organism | NRL | Notes |
|----------------------|-----|-------|
| Yeast S. cerevisiae | 165 bp | Tightly packed; less linker |
| Drosophila S2 | 175-185 bp | |
| Mouse ES cells | 188-196 bp | |
| Human HEK293 / K562 | 196-200 bp | Standard somatic |
| Human cortical neurons | 211 bp | Longer linker |
| Sperm chromatin | 240-250 bp | Tight packaging via protamines |
| Active gene bodies | -10 bp shorter than genome avg | Active transcription disrupts |
NRL determines fragment-size peak positions. ATAC mono-nucleosome peak is at NRL (NOT 147 bp -- that's the protected length; ATAC fragments span the full nucleosome+linker). Di-nuc is at 2x NRL minus a small overlap.
## Fragment-Size Classes (Buenrostro 2013, refined)
| Class | Fragment range | Origin | Use |
|-------|---------------|--------|-----|
| Sub-nucleosomal / NFR | < 100 bp | Two Tn5 cuts in naked accessible DNA | TF binding, footprinting |
| Mono-nucleosomal | 180-247 bp | Tn5 cuts on each side of one nucleosome | Nucleosome positioning |
| Di-nucleosomal | 315-473 bp | Tn5 cuts span two nucleosomes | Phasing, NRL estimation |
| Tri-nucleosomal | 558-615 bp | Three nucleosomes | Heterochromatin / phasing |
| > 700 bp | Rare | Often artefact (chimeric); discard | -- |
Mono-nucleosome window 180-247 bp is the Buenrostro 2013 convention; ATACseqQC uses 180-250. Adjust for the target organism's NRL.
## V-Plot Interpretation
V-plots (fragment-size vs position) are diagnostic. X-axis is position relative to a feature (TSS, motif center); Y-axis is fragment size. Aggregate density forms characteristic patterns:
| Pattern | Visual | Meaning |
|---------|--------|---------|
| V (apex at center, low size at center, increasing flanks) | Classic V | TF or NFR at center, flanking nucleosomes |
| W (two V's flanking center) | W-shape | NFR at center plus +1 / -1 nucleosomes |
| Inverted V (peak at center) | Mountain | Fragment fully enclosed at feature; e.g. nucleosome-bound TF |
| Flat band at 200 bp | Horizontal line | Constitutive nucleosome (no positioning relative to feature) |
| 10.4 bp helical phasing on V | Sub-peaks at 50, 60, 70, 80 bp size | Tn5 helical preference visible; high-quality library |
V-plots are the primary diagnostic for whether nucleosome-positioning analysis will succeed. Flat-band patterns mean no positioning information; classic V/W patterns mean positioning is recoverable.
## Algorithmic Taxonomy
| Tool | Method | Resolution | Strength | Fails when |
|------|--------|-----------|----------|------------|
| NucleoATAC | Cross-correlation with idealized V-plot template; per-base occupancy + nucleosome calls | Single-bp | ATAC-specific; provides occupancy + fuzziness | Unmaintained since 2018; pegs Python 2/3.6; struggles on chromatin without clear NRL |
| ATACseqQC | Fragment-size split + Tn5-shifted GAlignments + V-plot from BAM | Region-level | R/Bioconductor; integrates with TxDb / motif analysis | No per-base nucleosome calls; visualization-focused |
| DANPOS3 | Smoothing + peak call on cleavage signal; tested on MNase, ATAC, DNase | ~50 bp | Robust differential mode (`dpeak`); MNase legacy; broadly maintained | Designed for MNase-Seq; ATAC adaptation needs careful parameter tuning |
| scprinter | CNN multi-scale; resolves co-occurring TF + nucleosome footprints | Single-bp | Modern; single-cell aware; multi-scale | Newer; benchmarks evolving; GPU recommended |
| custom (pysam V-plot) | Fragment counting + 2D density | Region-level | Maximally flexible; reproducible | Requires manual calling logic; slow |
Methodology evolves; verify against current Schep 2015 (NucleoATAC), Chen 2013 (DANPOS), Bao 2024 (scprinter) before locking pipelines.
## +1 Nucleosome Calling
The +1 nucleosome (first nucleosome downstream of TSS, immediately bordering the NFR) is the most-studied positioning feature. Its position relative to TSS determines transcription initiation kinetics.
**Canonical +1 position:** +50 to +60 bp from TSS in metazoa; -100 to -120 bp from TATA in yeast; varies by gene type (Pol II vs Pol III, housekeeping vs developmental).
**Calling strategy:**
```bash
# 1. Define gene-body intervals
bedtools slop -i genes.bed -g chrom.sizes -l 200 -r 1000 > gene_bodies.bed
# 2. Run NucleoATAC
nucleoatac run --bed gene_bodies.bed --bam sample.dedup.bam --fasta genome.fa \
--out tss_nuc/ --cores 8
# 3. The first nucleosome downstream of each TSS in nucpos.bed is +1
```
A failure to detect a clear +1 peak in aggregate V-plot suggests TSS annotation is wrong or library is over-transposed.
## Per-Tool Failure Modes
### NucleoATAC -- Region size and depth dependence
**Trigger:** Short region BED (< 1 kb per region); shallow library (< 25M nuclear reads).
**Mechanism:** NucleoATAC fits an idealized V-plot template per region. Short regions provide too few fragments for stable correlation; shallow data provides noisy templates.
**Symptom:** No nucleosome calls in shallow regions; "occupancy" track is flat at zero.
**Fix:** Use regions >= 500 bp; merge adjacent peaks via bedtools to ensure region size; require >= 30M nuclear reads.
### NucleoATAC -- Maintenance status
**Trigger:** Installing NucleoATAC in 2025+.
**Mechanism:** Last release 2018; pegs Python 3.6 in some installs; depends on outdated NumPy API.
**Fix:** Use a dedicated conda env (`conda create -n nucleoatac python=3.7 numpy=1.18 scipy=1.5 pysam`); accept it works but is no longer updated. Consider scprinter or DANPOS3 alternatives for new projects.
### ATACseqQC factorFootprints -- Asymmetric nucleosome flanks
**Trigger:** Pioneer-factor binding sites where one face is on a nucleosome.
**Mechanism:** factorFootprints assumes symmetric flanking nucleosomes. Pioneer TFs (FOXA1, GATA) only have nucleosome on one side -> asymmetric output.
**Symptom:** Single shoulder in flanking signal; unbalanced V-plot.
**Fix:** Treat asymmetry as biological signal, not artefact. For pioneers, use stranded analysis.
### DANPOS dpos with default parameters -- ATAC mismatch
**Trigger:** Running `danpos3 dpos` with MNase defaults on ATAC.
**Mechanism:** DANPOS3's smoothing window and peak-calling defaults are tuned for MNase signal (smoother coverage). ATAC's sharper signal requires `--smooth_width 80 --width 145` or similar; otherwise calls are over-smoothed.
**Fix:** Use ATAC-tuned parameters. See DANPOS docs for ATAC-specific recipe; or use NucleoATAC instead.
### Mono-nucleosome filter window mis-set
**Trigger:** Using strict 147 bp filter for mono-nuc fraction; using 100-180 bp instead of 180-247.
**Mechanism:** Mono-nuc fragments are 180-247 bp because they span the nucleosome AND a linker. Filtering tighter excludes the legitimate signal.
**Symptom:** Mono-nuc count is much lower than expected (< 30% of NFR count).
**Fix:** Use Buenrostro 2013 windows: NFR < 100, mono 180-247, di 315-473.
## Decision Tree by Goal
| Goal | Recommended workflow |
|------|---------------------|
| Per-base nucleosome occupancy track | NucleoATAC (with caveat about maintenance); or scprinter |
| V-plot at TSS or motif center | ATACseqQC vPlot |
| Differential nucleosome positioning between conditions | DANPOS3 dpeak |
| +1 nucleosome calling at all genes | NucleoATAC + post-process to first nuc downstream of TSS |
| Single-cell nucleosome positioning | scprinter |
| Quick fragment-size QC plot | ATACseqQC fragSizeDist |
| NRL estimation | Custom Fourier / autocorrelation on fragment-end coverage |
| Nucleosome-aware peak calling | MACS3 hmmratac (peak-calling skill) |
## Estimating NRL from Fragment-Size Distribution
```python
import numpy as np, pysam
from scipy.signal import find_peaks
bam = pysam.AlignmentFile('sample.bam', 'rb')
frag_lengths = [abs(r.template_length) for r in bam.fetch()
if r.is_proper_pair and r.is_read1 and 0 < abs(r.template_length) < 1500]
hist, edges = np.histogram(frag_lengths, bins=300, range=(0, 1500))
peaks, _ = find_peaks(hist, distance=50, prominence=hist.max() * 0.05)
peak_positions = edges[peaks] + (edges[1] - edges[0]) / 2
# Mono peak should be ~NRL; di peak ~2*NRL
mono = peak_positions[(peak_positions > 150) & (peak_positions < 250)][0]
print(f'Estimated NRL: {mono:.0f} bp')
```
NRL inferred this way is approximate; for precision use autocorrelation on cumulative cleavage coverage instead.
## V-Plot in Python
```python
import numpy as np, pysam, matplotlib.pyplot as plt
def vplot(bam_path, regions_bed, max_size=600, flank=1000):
bam = pysam.AlignmentFile(bam_path, 'rb')
grid = np.zeros((max_size, 2 * flank))
for line in open(regions_bed):
chrom, start, *_ = line.strip().split('\t')
center = int(start)
for r in bam.fetch(chrom, max(0, center - flank), center + flank):
if not r.is_proper_pair or not r.is_read1: continue
size = abs(r.template_length)
if size <= 0 or size >= max_size: continue
frag_center = r.reference_start + size // 2
x = frag_center - center + flank
if 0 <= x < 2 * flank:
grid[size, x] += 1
return grid
g = vplot('sample.bam', 'tss.bed')
plt.imshow(g, aspect='auto', origin='lower', cmap='magma',
extent=[-1000, 1000, 0, 600])
plt.xlabel('Distance from feature (bp)')
plt.ylabel('Fragment size (bp)')
plt.savefig('vplot.png', dpi=200, bbox_inches='tight')
```
V-plot quality is the most useful diagnostic before nucleosome calling. Classic V at TSS = positioning info recoverable; flat band = not.
## Differential Nucleosome Positioning (DANPOS3 dpeak)
```bash
# Compare control vs treatment nucleosome positions
danpos3 dpeak \
-b condition2.bam:condition1.bam \
-c control.bam \
-o danpos_diff/ \
--paired 1 \
--width 145 --smooth_width 80
```
DANPOS reports four event types: shifted nucleosomes, gained, lost, fuzziness change. ENCODE has no official threshold; require >= 30 bp shift and FDR < 0.05 for nucleosome shift calls.
**Full ATAC-tuned DANPOS3 recipe:**
```bash
# --width 145: nucleosome footprint width
# --smooth_width 80: ATAC-specific (MNase default 60 too narrow)
# -jd 145: min distance between adjacent nuc calls (single-dash short flag)
# --pheight 0.95: peak height fraction for calling
# --frsz 200: fragment size used (mono-nuc)
danpos3 dpos sample.bam \
--paired 1 \
--width 145 \
--smooth_width 80 \
-jd 145 \
--pheight 0.95 \
--frsz 200 \
--out danpos_out/
```
Verify exact flags with `danpos3 dpos --help`; DANPOS3 documentation has been spotty and flag names can drift across releases.
Adapted from DANPOS3 docs for ATAC; `--smooth_width 80` widens the smoothing kernel to match ATAC's sharper signal vs MNase's broader cleavage. `-jd 145` (single-dash short, alternative `--distance 145`) enforces nucleosome spacing >= 145 bp (one nucleosome footprint).
## Histone Variant Detection from Fragment Size
**Trigger:** Suspected H2A.Z- or H3.3-containing nucleosomes; differential nucleosome composition between conditions.
**Mechanism:** H2A.Z replacement of H2A produces nucleosomes with weaker DNA-histone interaction (lower thermal stability); fragment-size shifts ~10-15 bp shorter than canonical H2A nucleosomes (Voong 2017 supplementary). H3.3 replacement is more subtle but H3.3-H2A.Z double-variant nucleosomes are particularly destabilized at active promoters.
**Detection:** Aggregate fragment-size distribution at H2A.Z ChIP-seq peaks vs H3K4me3-only peaks; the H2A.Z population shows mean fragment size ~10 bp shorter. ATAC alone CANNOT definitively call H2A.Z; H2A.Z ChIP-seq is needed for ground truth. ATAC fragment-size analysis is a hypothesis generator.
```python
# Per-region fragment-size mean as H2A.Z indicator
def region_frag_size(bam, region):
sizes = [abs(r.template_length) for r in bam.fetch(*region)
if r.is_proper_pair and r.is_read1 and 100 < abs(r.template_length) < 300]
return np.mean(sizes) if sizes else np.nan
# Compare H2A.Z-positive vs H2A.Z-negative TSSs
```
## Long-Read Single-Molecule Chromatin (Fiber-seq, NanoNOMe)
**Alternative to short-read ATAC for nucleosome positioning:**
| Method | Tech | Resolution | Strength |
|--------|------|-----------|----------|
| Fiber-seq (Stergachis 2020) | PacBio HiFi + DNA methylation footprinting | Per-molecule single-bp | Reads continuous chromatin fiber up to 20 kb; resolves haplotype-specific positioning |
| NanoNOMe (Lee 2020) | Nanopore + GpC methyltransferase | Per-molecule single-bp | Same single-molecule but cheaper than PacBio |
| MOSE (Dong 2024) | Nanopore + methylation | Improved Nanopore | Newer; benchmarks emerging |
Fiber-seq can detect nucleosome occupancy directly per single chromatin molecule (no aggregation needed). Resolves cell-cycle-dependent and stochastic positioning that bulk ATAC averages out. Preferred for fine-structure analysis of regulatory elements.
For most labs, short-read ATAC + NucleoATAC remains primary; Fiber-seq is special-purpose when single-molecule resolution is essential.
## Nucleosome Fuzziness
Fuzziness measures how sharply positioned a nucleosome is across cells. Defined as the standard deviation of per-cell nucleosome center positions.
| Fuzziness range | Interpretation |
|-----------------|----------------|
| < 20 bp | Sharply positioned (rare in metazoa; common at +1 in yeast) |
| 20-50 bp | Standard well-positioned |
| 50-100 bp | Fuzzy; constitutive but non-stable |
| > 100 bp | Effectively unpositioned |
NucleoATAC reports per-nucleosome fuzziness as `nuc_size`; values around the NRL are expected.
## Common Errors
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| `nucleoatac run` ImportError on numpy | Python 3.6 incompatibility | Use dedicated conda env with pinned versions |
| Empty .nucpos.bed output | Region BED too short or library too shallow | Verify region size >= 500 bp; depth >= 30M |
| V-plot shows horizontal band, no V | No positioning info; library over-transposed or wrong feature center | Check feature BED; verify TSS positions are correct |
| Mono-nuc count very low | Wrong fragment-size window (used 100-180 instead of 180-247) | Use Buenrostro windows |
| factorFootprints asymmetric | Pioneer TF; this is biological | Treat as signal, not artefact |
| DANPOS calls many shifts | MNase parameters used on ATAC | Tune `--smooth_width 80 --width 145` for ATAC |
| splitGAlignmentsByCut error in ATACseqQC | BAM is single-end | Mono-nuc analysis requires paired-end |
| +1 nucleosome not visible at TSS aggregate | TSS list mixes coding + non-coding strands; or wrong genome build | Restrict to protein-coding TSSs in matched build |
## References
- Schep AN et al 2015 Genome Res 25:1757 (NucleoATAC)
- Chen K et al 2013 Genome Res 23:341 (DANPOS)
- Buenrostro JD et al 2013 Nat Methods 10:1213 (ATAC fragment-size classes)
- Ou J et al 2018 BMC Genomics 19:169 (ATACseqQC)
- Bao Y et al 2024 bioRxiv (scprinter)
- Mavrich TN et al 2008 Nature 453:358 (+1 nucleosome positioning)
- Voong LN et al 2017 Cell 169:1422 (high-resolution nucleosome mapping)
- Teif VB et al 2012 Nat Struct Mol Biol 19:1185 (NRL variation across cell types)
## Related Skills
- atac-seq/atac-qc - Fragment-size periodicity QC
- atac-seq/atac-peak-calling - Nucleosome-aware MACS3 hmmratac
- atac-seq/footprinting - Per-TF flanking nucleosome analysis
- atac-seq/single-cell-atac - scprinter for sc nucleosome positioning
- chip-seq/peak-annotation - Annotate nucleosome positions to genes
- alignment-files/bam-statistics - Insert-size statistics upstream
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.