cosmic-database
$ npx mdskill add aipoch/medical-research-skills/cosmic-database
Download curated mutation datasets and query the Cancer Gene Census.
- Enables reproducible genomic analysis with packaged mutation resources.
- Integrates with COSMIC database and Cancer Gene Census APIs.
- Executes scripts/download_cosmic.py for structured data retrieval.
- Delivers consistent file-based results for validation steps.
---
name: cosmic-database
description: Access COSMIC to download mutation datasets, query Cancer Gene Census, and retrieve mutational signatures when your genomic analysis requires curated somatic mutation resources.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# COSMIC Database Skill
## When to Use
- Use this skill when you need to access COSMIC to download mutation datasets, query the Cancer Gene Census, or retrieve mutational signatures, and your genomic analysis requires curated somatic mutation resources in a reproducible workflow.
- Use this skill when an Evidence Insight task needs a packaged method instead of ad hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/download_cosmic.py` is the most direct path to complete the request.
- Use this skill when you need the `cosmic-database` package behavior rather than a generic answer.
## Key Features
- Scope-focused workflow: access COSMIC to download mutation datasets, query the Cancer Gene Census, and retrieve mutational signatures for genomic analyses that require curated somatic mutation resources.
- Packaged executable path(s): `scripts/download_cosmic.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260316/scientific-skills/Evidence Insight/cosmic-database"
python -m py_compile scripts/download_cosmic.py
python scripts/download_cosmic.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/download_cosmic.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/download_cosmic.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
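The input controls above can be sketched as a small pre-flight check. This is only an illustration: `DownloadRequest`, `validate_request`, and the accepted file extensions are hypothetical and are not part of `scripts/download_cosmic.py`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DownloadRequest:
    """Hypothetical container for the parameters to confirm before a run."""
    filepath: str                                   # COSMIC resource path to fetch
    output_dir: Path                                # where the downloaded file should land
    genes: list[str] = field(default_factory=list)  # optional scope filter


def validate_request(req: DownloadRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request looks usable."""
    problems = []
    if not req.filepath.strip():
        problems.append("filepath is empty")
    # Assumed set of extensions, based on the TSV/VCF formats this skill mentions.
    elif not req.filepath.endswith((".tsv", ".vcf", ".tsv.gz", ".vcf.gz")):
        problems.append(f"unexpected file type: {req.filepath}")
    if not req.output_dir.is_dir():
        problems.append(f"output directory does not exist: {req.output_dir}")
    if any(not gene.strip() for gene in req.genes):
        problems.append("gene filter contains an empty symbol")
    return problems
```

Running the check before any download keeps failures cheap: a non-empty result can be reported back to the user as the list of assumptions or inputs still to confirm.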
## 1. When to Use
Use this skill when you need COSMIC data for tasks such as:
- Downloading COSMIC mutation exports (TSV/VCF) for cohort or sample-level variant analysis.
- Retrieving Cancer Gene Census (CGC) gene lists for oncogene/tumor suppressor annotation and prioritization.
- Working with COSMIC mutational signatures (SBS/DBS/ID) for signature attribution or comparative studies.
- Accessing additional COSMIC genomics datasets (e.g., copy number, fusions, expression) for multi-omics integration.
- Building reproducible pipelines that programmatically fetch the latest COSMIC releases.
## 2. Key Features
- **Authenticated downloads** of COSMIC files (e.g., TSV/VCF; often GZIP-compressed).
- **Cancer Gene Census access** for curated cancer gene information.
- **Mutational signature retrieval** including **SBS**, **DBS**, and **ID** signatures.
- **Support for multiple COSMIC dataset types**, such as mutation, copy number, fusion, and expression resources.
- **Pandas-friendly workflow** for loading and filtering downloaded tables.
## 3. Dependencies
- Python **3.9+**
- `pandas` **>= 1.5**
- `requests` **>= 2.28**
External requirements:
- A registered COSMIC account at https://cancer.sanger.ac.uk/cosmic
- Valid COSMIC login credentials (email + password)
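Because the credentials are a real account login, one option is to keep them out of source files and read them from the environment. The sketch below assumes two variable names, `COSMIC_EMAIL` and `COSMIC_PASSWORD`, which are invented for this example; the skill does not define them.

```python
import os


def load_cosmic_credentials(env=os.environ):
    """Read COSMIC credentials from environment variables.

    COSMIC_EMAIL and COSMIC_PASSWORD are hypothetical names chosen for this
    sketch; they are not defined by the skill or by COSMIC.
    """
    email = env.get("COSMIC_EMAIL", "").strip()
    password = env.get("COSMIC_PASSWORD", "")
    missing = [
        name
        for name, value in (("COSMIC_EMAIL", email), ("COSMIC_PASSWORD", password))
        if not value
    ]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return email, password
```

The returned pair can then be passed as the `email` and `password` arguments of `download_cosmic_file`.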
## 4. Example Usage
The following example downloads a COSMIC file and loads it into a pandas DataFrame.
```python
import pandas as pd

from scripts.download_cosmic import download_cosmic_file

# 1) Download a COSMIC dataset (example path; adjust to your target release/build).
#    The credentials below are placeholders; replace them with your COSMIC login.
download_cosmic_file(
    email="user@email.com",
    password="pwd",
    filepath="GRCh38/cosmic/latest/CosmicMutantExport.tsv.gz",
)

# 2) Load the downloaded GZIP-compressed TSV
df = pd.read_csv(
    "CosmicMutantExport.tsv.gz",
    sep="\t",
    compression="gzip",
)

# 3) Example analysis: filter by gene symbol (column name depends on the dataset)
# df_gene = df[df["Gene name"] == "TP53"]
```
For dataset field definitions and COSMIC file specifics, see: `references/cosmic_data_reference.md`.
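To make the gene filter concrete without a COSMIC login, the following sketch writes a tiny GZIP-compressed TSV and filters it. The rows are invented, and the `Gene name` and `Primary site` column names are assumptions; check the actual header of the export you download.

```python
import gzip
import os
import tempfile

import pandas as pd

# Invented rows standing in for a COSMIC export; not real COSMIC data.
sample_tsv = (
    "Gene name\tSample name\tPrimary site\n"
    "TP53\tS1\tlung\n"
    "KRAS\tS2\tpancreas\n"
    "TP53\tS3\tbreast\n"
)

gene_col = "Gene name"  # assumed column name; it varies between datasets

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_export.tsv.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(sample_tsv)
    df = pd.read_csv(path, sep="\t", compression="gzip")

# Fail early with a clear message if the expected column is absent.
if gene_col not in df.columns:
    raise KeyError(f"{gene_col!r} not found; available: {list(df.columns)}")

df_tp53 = df[df[gene_col] == "TP53"]
print(df_tp53["Primary site"].tolist())  # ['lung', 'breast']
```

Checking for the column before filtering gives a readable error when a different dataset or release uses another header.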
## 5. Implementation Details
- **Authentication**: Downloads require COSMIC account credentials (email/password) and are performed via an authenticated HTTP session.
- **File targeting**: The `filepath` parameter specifies the COSMIC resource path (e.g., genome build such as `GRCh38`, release channel such as `latest`, and the target filename).
- **Data format**: Many COSMIC exports are distributed as **GZIP-compressed TSV** (and sometimes **VCF**). Use `pandas.read_csv(..., sep="\t", compression="gzip")` for TSV `.gz` files.
- **Typical workflow**:
  1. Download the desired COSMIC export.
  2. Load into a DataFrame (or parse VCF with an appropriate library if needed).
  3. Filter/aggregate by gene, tumor type, sample, or signature depending on the analysis goal.
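Full mutation exports can be large, so the aggregation step may not fit comfortably in memory. A minimal sketch of per-gene counting in chunks follows; `count_rows_per_gene` is a hypothetical helper, the `Gene name` column is an assumption, and the demonstration rows are invented.

```python
import gzip
import os
import tempfile
from collections import Counter

import pandas as pd


def count_rows_per_gene(path, gene_col="Gene name", chunksize=100_000):
    """Count rows per gene while reading the file in fixed-size chunks."""
    counts = Counter()
    with pd.read_csv(
        path,
        sep="\t",
        compression="gzip",
        usecols=[gene_col],   # read only the column being aggregated
        chunksize=chunksize,  # yields DataFrames of at most this many rows
    ) as reader:
        for chunk in reader:
            for gene, n in chunk[gene_col].value_counts().items():
                counts[gene] += int(n)
    return counts


# Offline demonstration on invented rows (not real COSMIC data).
demo_rows = [
    "Gene name\tPrimary site",
    "TP53\tlung",
    "KRAS\tpancreas",
    "TP53\tbreast",
    "EGFR\tlung",
    "TP53\tlarge_intestine",
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_export.tsv.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write("\n".join(demo_rows) + "\n")
    gene_counts = count_rows_per_gene(demo_path, chunksize=2)

print(gene_counts.most_common(1))  # [('TP53', 3)]
```

The small `chunksize=2` in the demonstration only forces several chunks; for real exports a much larger value is the more sensible starting point.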