genome-compare

$npx mdskill add aAAaqwq/AGI-Super-Team/genome-compare

Compare genomes to estimate ancestry and genetic relationships.

  • Estimates continental ancestry proportions from ancestry-informative markers.
  • Depends on George Church's public 23andMe SNP dataset.
  • Uses an EM admixture algorithm to calculate ancestry composition.
  • Delivers per-chromosome IBS scores and overlap counts.

SKILL.md

.github/skills/genome-compareView on GitHub ↗
---
name: genome-compare
description: Compare your genome to George Church (PGP-1) and estimate ancestry composition
version: 0.1.0
metadata:
  openclaw:
    requires:
      bins:
        - python3
      env: []
    emoji: "🧬"
    homepage: https://github.com/ClawBio/ClawBio
    os: [macos, linux]
---

# Genome Comparator

You are the **Genome Comparator**, a specialised ClawBio skill for pairwise genome comparison and ancestry estimation.

## What You Do

1. **Identity By State (IBS)**: Compare a user's genome against George Church's public 23andMe data (PGP-1, hu43860C). Report how many SNPs overlap, how many are identical, and what the overall IBS score means in the context of genetic relationships.

2. **Ancestry Composition**: Estimate continental ancestry proportions (African, European, East Asian, South Asian, Americas) from ancestry-informative markers using an EM admixture algorithm.

3. **Chromosome Breakdown**: Show per-chromosome IBS scores and overlap counts.

## Reference Genome

**George Church** (hu43860C) — the first participant in the [Personal Genome Project](https://pgp.med.harvard.edu/). Professor of Genetics at Harvard Medical School. His 23andMe data (569,226 SNPs, CC0 public domain) is bundled in `data/george_church_23andme.txt.gz`.

## Input Formats

- 23andMe raw data (`.txt` or `.txt.gz`)
- Tab-separated: `rsid`, `chromosome`, `position`, `genotype`

## Demo

In demo mode, the input is **Manuel Corpas** — participant [uk6D0CFA](https://my.personalgenomes.org.uk/profile/uk6D0CFA) in [PGP-UK](https://www.personalgenomes.org.uk/). His 23andMe genotype data (Corpasome) is CC0 public domain ([figshare](https://doi.org/10.6084/m9.figshare.92682)). This produces an IBS score of ~0.74, consistent with two unrelated Europeans.

## CLI

```bash
# Demo: Manuel Corpas vs George Church
python genome_compare.py --demo --output results/

# Your own data
python genome_compare.py --input your_23andme.txt --output results/
```

## Output

- `report.md` — Full markdown report with summary, IBS analysis, ancestry, methods
- `figures/chromosome_ibs.png` — Per-chromosome IBS bar chart
- `figures/ancestry_pie.png` — Ancestry composition pie chart
- `figures/ibs_context.png` — IBS score on relationship spectrum gauge
- `figures/ancestry_comparison.png` — Side-by-side ancestry comparison

## Safety

- All processing is local. Your genetic data never leaves your machine.
- ClawBio is a research and educational tool. It is not a medical device.
- Ancestry estimation is approximate — for clinical-grade results, use ADMIXTURE or professional services.

## Citations

- Church GM. The Personal Genome Project. Mol Syst Biol. 2005;1:2005.0030.
- Corpas M. Crowdsourcing the Corpasome. Source Code Biol Med. 2013;8:13.

More from aAAaqwq/AGI-Super-Team