bio-imaging-mass-cytometry-quality-metrics
$
npx mdskill add GPTomics/bioSkills/bio-imaging-mass-cytometry-quality-metricsEvaluates IMC data quality using signal-to-noise, correlation, and tissue integrity metrics
- Assesses data quality for mass cytometry imaging before analysis
- Uses numpy, scipy, and skimage for image processing and calculations
- Analyzes signal-to-noise ratios and channel correlations across images
- Returns QC metrics to guide data validation or acquisition troubleshooting
SKILL.md
.github/skills/bio-imaging-mass-cytometry-quality-metricsView on GitHub ↗
---
name: bio-imaging-mass-cytometry-quality-metrics
description: Quality metrics for IMC data including signal-to-noise, channel correlation, tissue integrity, and acquisition QC. Use when assessing data quality before analysis or troubleshooting problematic acquisitions.
tool_type: python
primary_tool: numpy
---
## Version Compatibility
Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pandas 2.2+, scipy 1.12+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Quality Metrics
**"Assess quality of my IMC acquisition"** → Evaluate IMC data quality through signal-to-noise ratios, channel correlations, tissue integrity scores, and acquisition-specific QC metrics.
- Python: `numpy`/`scipy` for SNR calculation and channel correlation analysis
## Signal-to-Noise Ratio
```python
import numpy as np
from scipy import ndimage
from skimage import io
def calculate_snr(image, mask=None):
'''Calculate signal-to-noise ratio for an image channel.'''
if mask is None:
mask = image > np.percentile(image, 10)
signal = np.mean(image[mask])
noise = np.std(image[~mask])
if noise == 0:
return np.inf
snr = signal / noise
return snr
def calculate_snr_all_channels(image_stack, channel_names, tissue_mask=None):
'''Calculate SNR for all channels in stack.'''
results = {}
for i, name in enumerate(channel_names):
snr = calculate_snr(image_stack[i], tissue_mask)
results[name] = snr
return results
image_stack = io.imread('imc_image.tiff')
channel_names = ['CD45', 'CD3', 'CD68', 'panCK', 'DNA']
snr_values = calculate_snr_all_channels(image_stack, channel_names)
for ch, snr in snr_values.items():
status = 'PASS' if snr > 3 else 'WARN' if snr > 1.5 else 'FAIL'
print(f'{ch}: SNR = {snr:.2f} [{status}]')
```
## Channel Correlation
```python
def calculate_channel_correlation(image_stack, channel_names):
'''Calculate pairwise correlation between channels.'''
n_channels = image_stack.shape[0]
flat_data = image_stack.reshape(n_channels, -1)
corr_matrix = np.corrcoef(flat_data)
import pandas as pd
corr_df = pd.DataFrame(corr_matrix, index=channel_names, columns=channel_names)
return corr_df
def flag_unexpected_correlations(corr_df, expected_pairs=None, threshold=0.7):
'''Flag unexpected high correlations (possible spillover).'''
issues = []
if expected_pairs is None:
expected_pairs = []
for i, ch1 in enumerate(corr_df.columns):
for j, ch2 in enumerate(corr_df.columns):
if i >= j:
continue
corr = corr_df.loc[ch1, ch2]
pair = (ch1, ch2)
is_expected = pair in expected_pairs or (ch2, ch1) in expected_pairs
if corr > threshold and not is_expected:
issues.append({'channel_1': ch1, 'channel_2': ch2, 'correlation': corr, 'expected': is_expected})
return pd.DataFrame(issues)
corr_matrix = calculate_channel_correlation(image_stack, channel_names)
print('Channel correlations:')
print(corr_matrix.round(2))
expected = [('CD3', 'CD45')]
issues = flag_unexpected_correlations(corr_matrix, expected)
if len(issues) > 0:
print('\nUnexpected high correlations:')
print(issues)
```
## Tissue Integrity
```python
def assess_tissue_integrity(dna_channel, min_coverage=0.3):
'''Assess tissue coverage and integrity from DNA channel.'''
threshold = np.percentile(dna_channel, 50)
tissue_mask = dna_channel > threshold
total_pixels = dna_channel.size
tissue_pixels = np.sum(tissue_mask)
coverage = tissue_pixels / total_pixels
labeled, n_fragments = ndimage.label(tissue_mask)
fragment_sizes = ndimage.sum(tissue_mask, labeled, range(1, n_fragments + 1))
largest_fragment = np.max(fragment_sizes) if len(fragment_sizes) > 0 else 0
fragmentation = 1 - (largest_fragment / tissue_pixels) if tissue_pixels > 0 else 1
return {
'coverage': coverage,
'n_fragments': n_fragments,
'fragmentation': fragmentation,
'intact': coverage > min_coverage and fragmentation < 0.5
}
dna_channel = image_stack[channel_names.index('DNA')]
integrity = assess_tissue_integrity(dna_channel)
print(f"Tissue coverage: {integrity['coverage']:.1%}")
print(f"Fragments: {integrity['n_fragments']}")
print(f"Fragmentation: {integrity['fragmentation']:.2f}")
print(f"Status: {'PASS' if integrity['intact'] else 'FAIL'}")
```
## Acquisition QC
```python
def check_acquisition_artifacts(image_stack, channel_names):
'''Check for common acquisition artifacts.'''
results = []
for i, name in enumerate(channel_names):
channel = image_stack[i]
saturated = np.sum(channel >= channel.max() * 0.99) / channel.size
if saturated > 0.01:
results.append({'channel': name, 'issue': 'saturation', 'severity': saturated})
hot_pixels = np.sum(channel > np.percentile(channel, 99.9) * 2) / channel.size
if hot_pixels > 0.001:
results.append({'channel': name, 'issue': 'hot_pixels', 'severity': hot_pixels})
dead_regions = np.sum(channel == 0) / channel.size
if dead_regions > 0.05:
results.append({'channel': name, 'issue': 'dead_regions', 'severity': dead_regions})
row_means = np.mean(channel, axis=1)
row_cv = np.std(row_means) / np.mean(row_means)
if row_cv > 0.3:
results.append({'channel': name, 'issue': 'striping', 'severity': row_cv})
return pd.DataFrame(results)
artifacts = check_acquisition_artifacts(image_stack, channel_names)
if len(artifacts) > 0:
print('Artifacts detected:')
print(artifacts)
else:
print('No major artifacts detected')
```
## Dynamic Range
```python
def assess_dynamic_range(channel, percentiles=(1, 99)):
'''Assess if channel uses full dynamic range.'''
low, high = np.percentile(channel, percentiles)
channel_range = high - low
max_possible = channel.max()
utilized = channel_range / max_possible if max_possible > 0 else 0
return {
'range_low': low,
'range_high': high,
'range_utilized': utilized,
'adequate': utilized > 0.1
}
for i, name in enumerate(channel_names):
dr = assess_dynamic_range(image_stack[i])
status = 'OK' if dr['adequate'] else 'LOW'
print(f"{name}: {dr['range_utilized']:.1%} range used [{status}]")
```
## Segmentation Quality Metrics
```python
def segmentation_qc(segmentation_mask, image_stack, channel_names):
'''QC metrics for cell segmentation.'''
from skimage.measure import regionprops
props = regionprops(segmentation_mask)
n_cells = len(props)
if n_cells == 0:
return {'error': 'No cells found'}
areas = [p.area for p in props]
eccentricities = [p.eccentricity for p in props]
area_cv = np.std(areas) / np.mean(areas)
very_small = np.sum(np.array(areas) < np.percentile(areas, 5)) / n_cells
very_large = np.sum(np.array(areas) > np.percentile(areas, 95)) / n_cells
elongated = np.sum(np.array(eccentricities) > 0.9) / n_cells
return {
'n_cells': n_cells,
'mean_area': np.mean(areas),
'area_cv': area_cv,
'pct_very_small': very_small,
'pct_very_large': very_large,
'pct_elongated': elongated,
'quality': 'GOOD' if area_cv < 0.5 and elongated < 0.1 else 'REVIEW'
}
seg_mask = io.imread('cell_segmentation.tiff')
seg_qc = segmentation_qc(seg_mask, image_stack, channel_names)
print(f"Cells: {seg_qc['n_cells']}")
print(f"Mean area: {seg_qc['mean_area']:.1f} pixels")
print(f"Quality: {seg_qc['quality']}")
```
## Batch QC Summary
**Goal:** Generate a consolidated quality report across all acquisitions in a batch to identify samples requiring re-acquisition or exclusion.
**Approach:** For each image, compute SNR, tissue integrity, segmentation metrics, and artifact counts, then aggregate into a summary table with pass/fail calls based on combined threshold criteria.
```python
def batch_qc_report(image_files, seg_files, channel_names, output_file):
'''Generate QC report for batch of images.'''
all_results = []
for img_file, seg_file in zip(image_files, seg_files):
image_stack = io.imread(img_file)
seg_mask = io.imread(seg_file)
result = {'sample': Path(img_file).stem}
snr_values = calculate_snr_all_channels(image_stack, channel_names)
result['mean_snr'] = np.mean(list(snr_values.values()))
result['min_snr'] = min(snr_values.values())
dna_idx = channel_names.index('DNA') if 'DNA' in channel_names else 0
integrity = assess_tissue_integrity(image_stack[dna_idx])
result['tissue_coverage'] = integrity['coverage']
seg_qc = segmentation_qc(seg_mask, image_stack, channel_names)
result['n_cells'] = seg_qc.get('n_cells', 0)
artifacts = check_acquisition_artifacts(image_stack, channel_names)
result['n_artifacts'] = len(artifacts)
result['pass_qc'] = (result['min_snr'] > 1.5 and result['tissue_coverage'] > 0.3 and result['n_artifacts'] == 0)
all_results.append(result)
results_df = pd.DataFrame(all_results)
results_df.to_csv(output_file, index=False)
print(f"QC Summary: {results_df['pass_qc'].sum()}/{len(results_df)} samples passed")
return results_df
```
## Visualization
```python
import matplotlib.pyplot as plt
def plot_qc_summary(image_stack, channel_names, output_file):
'''Generate QC summary visualization.'''
n_channels = len(channel_names)
fig, axes = plt.subplots(2, n_channels, figsize=(3*n_channels, 6))
for i, name in enumerate(channel_names):
channel = image_stack[i]
axes[0, i].imshow(channel, cmap='viridis')
axes[0, i].set_title(name)
axes[0, i].axis('off')
axes[1, i].hist(channel.flatten(), bins=100, log=True)
axes[1, i].set_xlabel('Intensity')
axes[1, i].set_ylabel('Count')
plt.tight_layout()
plt.savefig(output_file, dpi=150)
plt.close()
plot_qc_summary(image_stack, channel_names, 'qc_summary.png')
```
## Related Skills
- data-preprocessing - Clean data before QC
- cell-segmentation - Segmentation affects QC metrics
- interactive-annotation - Manual review of QC failures
- phenotyping - Analysis after QC passes
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.