experimental-data-analysis

$ npx mdskill add aipoch/medical-research-skills/experimental-data-analysis

Execute reproducible statistical tests on experimental datasets.

  • Cleans CSV data and generates standardized audit trails.
  • Integrates with Python statistical libraries for analysis.
  • Selects tests based on group count and data assumptions.
  • Outputs timestamped reports with effect sizes and confidence intervals.

SKILL.md

.github/skills/experimental-data-analysis
---
name: experimental-data-analysis
description: Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report).
- You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes.
- You need to compare 3+ groups (one-way) or multiple factors (multi-way) using ANOVA and post-hoc multiple comparisons.
- You must validate assumptions (normality, homogeneity of variance) and document them in a report.
- You need standardized run outputs (timestamped run directories) for traceability and auditing.

## Key Features

- Reproducible, run-based execution that writes all artifacts into `outputs/runs/<timestamp>/`.
- Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors).
- Descriptive statistics: means, standard deviations, confidence intervals, and grouped summary tables.
- Inferential testing:
  - t-tests (independent/paired) and non-parametric alternatives when assumptions fail.
  - ANOVA (one-way and multi-way) with post-hoc testing (e.g., Tukey).
- Reporting outputs: test statistics, p-values, effect sizes, tables, charts, and explicit assumption notes.
- Reference materials for method selection and reporting templates:
  - `references/stats-method-selection.md`
  - `references/reporting-template.md`
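
To make the data-preparation and descriptive-statistics features concrete, here is a minimal sketch of outlier flagging and a grouped summary table. It is not taken from the skill's scripts: the function names `flag_outliers_iqr` and `summarize`, the output column names, and the choice of the 1.5 × IQR rule and a t-based confidence interval are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def summarize(df: pd.DataFrame, group: str, outcome: str,
              level: float = 0.95) -> pd.DataFrame:
    """Per-group n, mean, sample SD, and a t-based CI for the mean."""
    rows = []
    for name, values in df.groupby(group)[outcome]:
        values = values.dropna()  # missing outcomes are dropped, not imputed
        n = len(values)
        mean = values.mean()
        sd = values.std(ddof=1)
        # Half-width of the CI: t quantile times the standard error.
        half = stats.t.ppf((1 + level) / 2, df=n - 1) * sd / np.sqrt(n)
        rows.append({group: name, "n": n, "mean": mean, "sd": sd,
                     "ci_low": mean - half, "ci_high": mean + half})
    return pd.DataFrame(rows)
```

Flagging rather than deleting keeps the exclusion decision visible, so it can be documented in the report.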

## Dependencies

- Python 3.10+
- pandas >= 2.0
- numpy >= 1.24
- scipy >= 1.10
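
A quick way to confirm that an environment satisfies these minimums is sketched below. The helper names (`version_tuple`, `meets_minimum`, `check_dependencies`) are hypothetical, and the version comparison is deliberately simplified: it compares only the leading numeric components and ignores pre-release suffixes.

```python
import sys
from importlib import metadata

MINIMUMS = {"pandas": "2.0", "numpy": "1.24", "scipy": "1.10"}


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '1.24.3' into (1, 24, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric version prefixes, e.g. '2.1.3' against '2.0'."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_dependencies(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Map each package to 'ok', 'too old (<version>)', or 'missing'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        status[name] = "ok" if ok else f"too old ({installed})"
    return status


if __name__ == "__main__":
    print("python", "ok" if sys.version_info >= (3, 10) else "too old")
    for package, state in check_dependencies().items():
        print(package, state)
```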

## Example Usage

The workflow is organized around run directories: initialize a new run, then run the analysis, which uses the latest run directory by default.

```bash
# 1) Initialize a new run directory with sample inputs/config
python scripts/init_run.py

# 2) Run analysis (uses the latest outputs/runs/<timestamp>/ by default)
python scripts/analyze_experiment.py
```

Expected directory conventions:

- A new run directory is created at: `outputs/runs/<timestamp>/`
- Configuration file location: `outputs/runs/<timestamp>/config.json`
- All intermediate and final artifacts (config, inputs, outputs, figures, tables) must be written inside the run directory.
- Writing outside the run directory is prohibited.
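
One way to enforce the "no writes outside the run directory" rule is to resolve every output path through a small guard. The sketch below is an illustration, not the skill's own code; `resolve_in_run` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_in_run(run_dir: Path, relative: str) -> Path:
    """Return the absolute path of `relative` inside `run_dir`.

    Raises ValueError if the result would land outside the run directory,
    which covers both '../' escapes and absolute paths.
    """
    root = run_dir.resolve()
    target = (root / relative).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"{relative!r} escapes run directory {root}")
    return target
```

A script would then call, for example, `resolve_in_run(run_dir, "tables/summary.csv")` before opening any file for writing.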

## Implementation Details

### Reproducible Run Management

- Before each execution, run:
  - `scripts/init_run.py` to create `outputs/runs/<timestamp>/` and populate initial inputs/config.
- Analysis scripts default to the latest run directory under `outputs/runs/` unless explicitly overridden (if supported by the script).

### Analysis Pipeline

1. **Data Preparation**
   - Handle missing values (e.g., drop, impute, or flag) according to the experimental design.
   - Detect and treat outliers (e.g., robust rules, domain thresholds), documenting any exclusions.
   - Identify variable roles:
     - Outcome variable(s): typically continuous measurements.
     - Grouping factors: categorical condition labels (treatment/control, timepoint, genotype, etc.).

2. **Descriptive Statistics**
   - Compute summary metrics per group:
     - Mean, standard deviation, and confidence intervals (commonly 95% CI).
   - Produce grouped summary tables suitable for reporting.

3. **Inferential Statistics**
   - **Two-group comparisons**
     - Use an independent t-test for separate groups.
     - Use a paired t-test for repeated measures / matched pairs.
     - If assumptions are violated, switch to an appropriate non-parametric alternative (e.g., Mann-Whitney U for independent groups, Wilcoxon signed-rank for paired data).
   - **Multi-group / multi-factor comparisons**
     - Use one-way ANOVA for a single factor with 3+ levels.
     - Use multi-way ANOVA when multiple factors are present.
   - **Multiple comparisons**
     - Apply post-hoc procedures (e.g., Tukey) after ANOVA when needed.
     - Define and document the multiple-comparison control strategy.

4. **Assumption Checks and Reporting Standards**
   - Validate and report:
     - Normality (per group or model residuals, as appropriate).
     - Homogeneity of variance.
   - Report, at minimum:
     - Test statistic, degrees of freedom (if applicable), p-value.
     - Effect size(s) and confidence intervals where applicable.
   - Retain analysis code and random seeds to ensure reproducibility.
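
The test-selection logic in steps 3 and 4 can be sketched for independent groups as shown below. This is one plausible reading of the pipeline, not the skill's actual implementation: the 0.05 threshold, the use of Shapiro-Wilk and Levene as the assumption checks, the fallback tests, and the names `assumptions_hold`, `cohens_d`, and `compare_groups` are assumptions. Multi-way ANOVA is not covered.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed threshold for assumption checks and post-hoc trigger


def assumptions_hold(groups, alpha=ALPHA):
    """Shapiro-Wilk per group (needs n >= 3) plus Levene across groups."""
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    equal_var = stats.levene(*groups).pvalue > alpha
    return bool(normal), bool(equal_var)


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1)
                  + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return float((np.mean(a) - np.mean(b)) / np.sqrt(pooled_var))


def compare_groups(groups, alpha=ALPHA):
    """Choose and run a test for independent groups; return a report dict."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    normal, equal_var = assumptions_hold(groups, alpha)
    report = {"normal": normal, "equal_var": equal_var}
    if len(groups) == 2:
        a, b = groups
        if normal:
            # equal_var=False gives Welch's t-test.
            res = stats.ttest_ind(a, b, equal_var=equal_var)
            report["test"] = "Student t" if equal_var else "Welch t"
        else:
            res = stats.mannwhitneyu(a, b, alternative="two-sided")
            report["test"] = "Mann-Whitney U"
        report["cohens_d"] = cohens_d(a, b)
    elif normal and equal_var:
        res = stats.f_oneway(*groups)
        report["test"] = "one-way ANOVA"
        if res.pvalue < alpha:
            # Pairwise p-values from Tukey's HSD (scipy >= 1.8).
            report["tukey_p"] = stats.tukey_hsd(*groups).pvalue.tolist()
    else:
        res = stats.kruskal(*groups)
        report["test"] = "Kruskal-Wallis"
    report["statistic"] = float(res.statistic)
    report["p_value"] = float(res.pvalue)
    return report
```

Paired designs would use `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon` instead. Recording `normal` and `equal_var` in the returned dictionary keeps the assumption checks alongside the result, so they can be copied into the report.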
