bio-hi-c-analysis-hic-data-io
$ npx mdskill add GPTomics/bioSkills/bio-hi-c-analysis-hic-data-io
Process Hi-C contact matrices in cooler format efficiently.
- Converts .hic files to .cool format for analysis workflows.
- Depends on cooler, numpy, pandas, and scanpy libraries.
- Executes format conversion and matrix loading via CLI or Python.
- Outputs parsed data structures and exportable matrix subsets.
---
name: bio-hi-c-analysis-hic-data-io
description: Load, convert, and manipulate Hi-C contact matrices using cooler format. Read .cool/.mcool files, convert from .hic format, access matrix data, and export to different formats. Use when loading or converting Hi-C contact matrices.
tool_type: mixed
primary_tool: cooler
---
## Version Compatibility
Reference examples tested with: cooler 0.9+, numpy 1.26+, pandas 2.2+, scanpy 1.10+, scipy 1.12+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
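The version check described above can be scripted. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `version_tuple`, `meets_minimum`) are hypothetical, and the numeric comparison is a simplification rather than a full PEP 440 comparison.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.9.3' into (0, 9, 3); stops at the first non-numeric component."""
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(package, minimum):
    """True if `package` is installed and its version is at least `minimum`."""
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


# Minimum versions copied from the "tested with" line above
tested_with = [('cooler', '0.9'), ('numpy', '1.26'), ('pandas', '2.2'),
               ('scanpy', '1.10'), ('scipy', '1.12')]
for name, minimum in tested_with:
    print(f'{name}: installed={installed_version(name)}, ok={meets_minimum(name, minimum)}')
```

A package reported as `ok=False` is either missing or older than the tested version, which is the cue to introspect its API before reusing the examples.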
# Hi-C Data I/O
**"Load my Hi-C contact matrix"** → Read .cool/.mcool/.hic files into Python, access contact pixels, convert between formats, and export subsets.
- Python: `cooler.Cooler('file.mcool::resolutions/10000')`
- CLI: `cooler load`, `hic2cool convert`
Load and manipulate Hi-C contact matrices in cooler format.
## Required Imports
```python
import cooler
import numpy as np
import pandas as pd
```
## Load a Cooler File
```python
# Load a .cool file
clr = cooler.Cooler('matrix.cool')
# Basic info
print(f'Chromosomes: {clr.chromnames}')
print(f'Bin size: {clr.binsize}')
print(f'Number of bins: {clr.info["nbins"]}')
print(f'Sum of counts: {clr.info["sum"]}')
```
## Load Multi-Resolution Cooler (.mcool)
```python
# List available resolutions
resolutions = cooler.fileops.list_coolers('matrix.mcool')
print(f'Available resolutions: {resolutions}')
# Load specific resolution
clr = cooler.Cooler('matrix.mcool::resolutions/10000')
print(f'Loaded at {clr.binsize}bp resolution')
```
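When the exact resolution is not known in advance, the group paths returned by `list_coolers` can be parsed to pick the closest stored bin size. This is a small sketch with hypothetical helper names (`available_resolutions`, `mcool_uri`); it assumes the paths look like `/resolutions/<binsize>`, as in a standard .mcool file.

```python
def available_resolutions(group_paths):
    """Extract integer bin sizes from paths such as '/resolutions/10000'."""
    sizes = []
    for path in group_paths:
        name = path.rstrip('/').rsplit('/', 1)[-1]
        if name.isdigit():
            sizes.append(int(name))
    return sorted(sizes)


def mcool_uri(path, group_paths, target):
    """Build a cooler URI for the stored resolution closest to `target`."""
    sizes = available_resolutions(group_paths)
    if not sizes:
        raise ValueError('no /resolutions/<binsize> groups found')
    best = min(sizes, key=lambda size: abs(size - target))
    return f'{path}::resolutions/{best}'


# Example input shaped like the output of cooler.fileops.list_coolers
groups = ['/resolutions/1000', '/resolutions/10000', '/resolutions/100000']
print(mcool_uri('matrix.mcool', groups, 25000))
```

The returned string can be passed directly to `cooler.Cooler(...)`.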
## Access Bin Information
```python
# Get bin table (genomic coordinates)
bins = clr.bins()[:]
print(bins.head())
# Columns: chrom, start, end, weight (if balanced)
# Get bins for a chromosome
chr1_bins = clr.bins().fetch('chr1')
print(f'chr1 has {len(chr1_bins)} bins')
```
## Access Pixel (Contact) Information
```python
# Get all contacts as a DataFrame (loads every pixel; can be very large at high resolution)
pixels = clr.pixels()[:]
print(pixels.head())
# Columns: bin1_id, bin2_id, count
# Get pixels whose first bin (bin1_id) lies in a region
region_pixels = clr.pixels().fetch('chr1:0-10000000')
```
## Extract Contact Matrix
```python
# Get matrix for a chromosome
matrix = clr.matrix(balance=True).fetch('chr1')
print(f'Matrix shape: {matrix.shape}')
# Get matrix for a region
region_matrix = clr.matrix(balance=True).fetch('chr1:50000000-60000000')
# Get raw (unbalanced) matrix
raw_matrix = clr.matrix(balance=False).fetch('chr1')
# Sparse matrix for memory efficiency (returns a scipy.sparse.coo_matrix)
sparse_matrix = clr.matrix(balance=True, sparse=True).fetch('chr1')
```
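Before fetching a dense matrix, it can help to estimate its size and decide whether `sparse=True` is the safer choice. The sketch below assumes fixed-width bins and 8-byte floats for balanced values; the helper names (`bins_in_region`, `dense_bytes`) are hypothetical.

```python
def bins_in_region(start, end, binsize):
    """Number of fixed-width bins overlapping the half-open interval [start, end)."""
    if end <= start:
        raise ValueError('end must be greater than start')
    first_bin = start // binsize
    last_bin_exclusive = -(-end // binsize)  # ceiling division without floats
    return last_bin_exclusive - first_bin


def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array with `itemsize` bytes per value."""
    return n_rows * n_cols * itemsize


# Region from the example above, at 10 kb resolution
n = bins_in_region(50_000_000, 60_000_000, 10_000)
print(f'{n} x {n} bins, about {dense_bytes(n, n) / 1e6:.0f} MB dense')
```

For whole chromosomes at fine resolution the dense estimate grows quadratically with the number of bins, which is where the sparse fetch becomes the more practical option.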
## Extract Submatrix (Two Regions)
```python
# Get contacts between two regions
region1 = 'chr1:50000000-60000000'
region2 = 'chr1:70000000-80000000'
submatrix = clr.matrix(balance=True).fetch(region1, region2)
print(f'Submatrix shape: {submatrix.shape}')
# Inter-chromosomal contacts
inter_matrix = clr.matrix(balance=True).fetch('chr1', 'chr2')
```
## Convert from .hic to Cooler
```bash
# Using hic2cool CLI
hic2cool convert input.hic output.mcool -r 0 # All resolutions
# Specific resolution
hic2cool convert input.hic output.cool -r 10000
```
```python
# Python alternative using hic2cool
import hic2cool
hic2cool.hic2cool_convert('input.hic', 'output.mcool', resolution=0)
```
## Convert from Text Formats
```python
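# From a pairs file to cooler
# First create bins
import bioframe
chromsizes = bioframe.fetch_chromsizes('hg38')
bins = cooler.binnify(chromsizes, binsize=10000)
# create_cooler does not read pairs itself: it needs a pixel table
# (bin1_id, bin2_id, count) sorted by bin1_id, then bin2_id.
# 'pixels.tsv' is a placeholder for such a pre-aggregated, headerless file.
pixels = pd.read_csv('pixels.tsv', sep='\t', names=['bin1_id', 'bin2_id', 'count'])
cooler.create_cooler(
    'output.cool',
    bins,
    pixels,
    dtypes={'count': int},
)
# For read-level pairs, let cooler cload bin and aggregate them (shell command):
# cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 chromsizes.txt:10000 pairs.txt output.cool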
```
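To make the aggregation step concrete, here is a toy sketch of how read-level pairs become upper-triangle pixels. It is an illustration, not how `cooler cload` is implemented; the function names are hypothetical, positions are assumed to be 0-based, and the chromosome sizes are invented for the example.

```python
from collections import Counter


def chrom_offsets(chromsizes, binsize):
    """Map each chromosome to the id of its first bin, in the given order."""
    offsets, next_id = {}, 0
    for chrom, length in chromsizes:
        offsets[chrom] = next_id
        next_id += -(-length // binsize)  # ceiling division
    return offsets


def pairs_to_pixels(pairs, chromsizes, binsize):
    """Aggregate (chrom1, pos1, chrom2, pos2) pairs into sorted upper-triangle pixels."""
    offsets = chrom_offsets(chromsizes, binsize)
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        bin1 = offsets[chrom1] + pos1 // binsize
        bin2 = offsets[chrom2] + pos2 // binsize
        if bin1 > bin2:  # keep bin1_id <= bin2_id
            bin1, bin2 = bin2, bin1
        counts[(bin1, bin2)] += 1
    return [(b1, b2, n) for (b1, b2), n in sorted(counts.items())]


# Invented toy genome: chrA has bins 0-2, chrB has bins 3-4 at binsize 10
toy_sizes = [('chrA', 25), ('chrB', 12)]
toy_pairs = [('chrA', 3, 'chrA', 14), ('chrA', 18, 'chrA', 2),
             ('chrA', 21, 'chrB', 5), ('chrB', 11, 'chrB', 0)]
print(pairs_to_pixels(toy_pairs, toy_sizes, 10))
```

The resulting tuples have the same layout as the `bin1_id`, `bin2_id`, `count` columns that `create_cooler` expects.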
## Create Cooler from Matrix
**Goal:** Convert an in-memory numpy contact matrix into a cooler file for use with cooltools and other Hi-C analysis tools.
**Approach:** Define genomic bins from chromosome sizes, convert the upper-triangle matrix entries into a pixel DataFrame of (bin1_id, bin2_id, count) tuples, and write to a new cooler file.
```python
import cooler
import numpy as np
import pandas as pd
import bioframe
# Create bins
chromsizes = bioframe.fetch_chromsizes('hg38')
bins = cooler.binnify(chromsizes, binsize=10000)
n_bins = len(bins)
# `matrix` must be an (n_bins, n_bins) array of contact counts; supply your own.
# A dense genome-wide matrix at 10 kb will not fit in memory, so test with a
# coarser bin size or a single chromosome, e.g.:
# matrix = np.random.poisson(1, (n_bins, n_bins))
# Keep the upper triangle (including the diagonal)
upper = np.triu(matrix)
# Convert positive entries to pixels; np.nonzero walks the array row by row,
# so the pixels come out sorted by (bin1_id, bin2_id)
bin1_id, bin2_id = np.nonzero(upper > 0)
pixels_df = pd.DataFrame({
    'bin1_id': bin1_id,
    'bin2_id': bin2_id,
    'count': upper[bin1_id, bin2_id],
})
# Create cooler
cooler.create_cooler('new.cool', bins, pixels_df)
```
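A quick way to check the matrix-to-pixel conversion before writing a file is a round trip on a small matrix. This sketch uses only numpy; the function names are hypothetical, the toy matrix is invented, and the symmetry check is an added safeguard rather than something cooler requires of the caller.

```python
import numpy as np


def dense_to_pixels(matrix):
    """Return (bin1_id, bin2_id, count) arrays for nonzero upper-triangle entries."""
    matrix = np.asarray(matrix)
    if matrix.shape[0] != matrix.shape[1] or not np.array_equal(matrix, matrix.T):
        raise ValueError('expected a square, symmetric matrix')
    upper = np.triu(matrix)
    bin1_id, bin2_id = np.nonzero(upper)
    return bin1_id, bin2_id, upper[bin1_id, bin2_id]


def pixels_to_dense(bin1_id, bin2_id, count, n_bins):
    """Rebuild the symmetric dense matrix from upper-triangle pixels."""
    dense = np.zeros((n_bins, n_bins), dtype=count.dtype)
    dense[bin1_id, bin2_id] = count
    dense[bin2_id, bin1_id] = count  # mirror into the lower triangle
    return dense


toy = np.array([[5, 2, 0, 1],
                [2, 4, 3, 0],
                [0, 3, 6, 0],
                [1, 0, 0, 0]])
b1, b2, n = dense_to_pixels(toy)
print(np.array_equal(pixels_to_dense(b1, b2, n, 4), toy))
```

If the round trip does not reproduce the input, the matrix was probably not symmetric or the bin indices were shifted, and the resulting cooler would be wrong as well.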
## Merge Cooler Files
```python
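# Merge multiple cooler files (all inputs must share the same bins)
# mergebuf: maximum number of pixels processed at a time; pass it explicitly
cooler.merge_coolers('merged.cool', ['sample1.cool', 'sample2.cool'], mergebuf=20_000_000)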
```
## Coarsen Resolution
```python
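# Create lower resolution from high resolution
# chunksize: number of pixels processed per batch; pass it explicitly
cooler.coarsen_cooler('hires.cool', 'lowres.cool', factor=10, chunksize=10_000_000)  # 10x coarser
# Or use zoomify for multiple resolutions (each must be a multiple of the base bin size)
cooler.zoomify_cooler('input.cool', 'output.mcool', resolutions=[10000, 50000, 100000, 500000], chunksize=10_000_000)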
```
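Conceptually, coarsening sums the counts of all fine pixels that fall into the same coarse pixel. The sketch below illustrates this on a toy pixel list; it is not cooler's implementation, the function name is hypothetical, and it assumes a single chromosome so that bin ids can simply be divided by the factor.

```python
from collections import Counter


def coarsen_pixels(pixels, factor):
    """Sum (bin1_id, bin2_id, count) pixels into bins `factor` times wider.

    Assumes all bins belong to one chromosome, so bin_id // factor is the coarse bin.
    """
    counts = Counter()
    for bin1, bin2, count in pixels:
        counts[(bin1 // factor, bin2 // factor)] += count
    return [(b1, b2, n) for (b1, b2), n in sorted(counts.items())]


# Invented upper-triangle pixels over 4 fine bins
fine = [(0, 0, 5), (0, 1, 2), (0, 3, 1), (1, 1, 4), (1, 2, 3), (2, 2, 6)]
coarse = coarsen_pixels(fine, 2)
print(coarse)
print(sum(n for _, _, n in fine) == sum(n for _, _, n in coarse))
```

Because every fine pixel lands in exactly one coarse pixel, the total count is preserved, which is a handy sanity check when comparing `clr.info["sum"]` before and after coarsening.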
## Export to Other Formats
```python
# Export matrix to numpy
matrix = clr.matrix(balance=True).fetch('chr1')
np.save('chr1_matrix.npy', matrix)
# Export to text
np.savetxt('chr1_matrix.txt', matrix, delimiter='\t')
# Export pixels to CSV
pixels = clr.pixels()[:]
pixels.to_csv('pixels.csv', index=False)
```
## Dump to Pairs Format
```bash
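# Using cooler dump; --join adds genomic coordinates to each pixel
# (the output is binned contacts, not read-level pairs)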
cooler dump -t pixels --join matrix.cool > pairs.txt
# Dump bins
cooler dump -t bins matrix.cool > bins.txt
```
## Access Metadata
```python
# Get all metadata
print(clr.info)
# Specific metadata
print(f'Genome assembly: {clr.info.get("genome-assembly", "Unknown")}')
print(f'Creation date: {clr.info.get("creation-date", "Unknown")}')
# Check if balanced
if 'weight' in clr.bins().columns:
    print('Matrix has balancing weights')
```
## List Cooler Contents
```python
# For mcool
coolers = cooler.fileops.list_coolers('multi.mcool')
print(f'Available: {coolers}')
# Check if valid cooler
is_valid = cooler.fileops.is_cooler('file.cool')
print(f'Valid cooler: {is_valid}')
```
## Related Skills
- matrix-operations - Balance and normalize matrices
- hic-visualization - Visualize contact matrices
- contact-pairs - Process raw Hi-C pairs