experimental-data-analysis

Name: experimental-data-analysis
Author: aipoch/medical-research-skills

$ npx mdskill add aipoch/medical-research-skills/experimental-data-analysis

Execute reproducible statistical tests on experimental datasets.

Clean CSV data and generate standardized audit trails.
Integrates with Python statistical libraries for analysis.
Selects tests based on group count and data assumptions.
Outputs timestamped reports with effect sizes and confidence intervals.

SKILL.md

.github/skills/experimental-data-analysis

---
name: experimental-data-analysis
description: Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report).
- You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes.
- You need to compare 3+ groups (one-way) or multiple factors (multi-way) using ANOVA and post-hoc multiple comparisons.
- You must validate assumptions (normality, homogeneity of variance) and document them in a report.
- You need standardized run outputs (timestamped run directories) for traceability and auditing.

## Key Features

- Reproducible, run-based execution that writes all artifacts into `outputs/runs/<timestamp>/`.
- Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors).
- Descriptive statistics: means, standard deviations, confidence intervals, and grouped summary tables.
- Inferential testing:
  - t-tests (independent/paired) and non-parametric alternatives when assumptions fail.
  - ANOVA (one-way and multi-way) with post-hoc testing (e.g., Tukey).
- Reporting outputs: test statistics, p-values, effect sizes, tables, charts, and explicit assumption notes.
- Reference materials for method selection and reporting templates:
  - `references/stats-method-selection.md`
  - `references/reporting-template.md`

## Dependencies

- Python 3.10+
- pandas >= 2.0
- numpy >= 1.24
- scipy >= 1.10

## Example Usage

The workflow is organized around run directories: initialize a new run, then run the analysis, which uses the latest run directory by default.

```bash
# 1) Initialize a new run directory with sample inputs/config
python scripts/init_run.py

# 2) Run analysis (uses the latest outputs/runs/<timestamp>/ by default)
python scripts/analyze_experiment.py
```

Expected directory conventions:

- A new run directory is created at: `outputs/runs/<timestamp>/`
- Configuration file location: `outputs/runs/<timestamp>/config.json`
- All intermediate and final artifacts (config, inputs, outputs, figures, tables) must be written inside the run directory.
- Writing outside the run directory is prohibited.
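
The "no writes outside the run directory" rule can also be checked in code before any file is written. The following is a minimal sketch of one way to do that; `resolve_in_run` is a hypothetical helper introduced here for illustration and is not part of the skill's scripts.

```python
from pathlib import Path


def resolve_in_run(run_dir: Path, relative: str) -> Path:
    """Return an absolute path inside run_dir, or raise if it would escape it.

    Hypothetical helper for illustration; not shipped with the skill.
    """
    root = run_dir.resolve()
    target = (root / relative).resolve()
    # Path.is_relative_to is available from Python 3.9 onward.
    if not target.is_relative_to(root):
        raise ValueError(f"{relative!r} resolves outside the run directory {root}")
    return target
```

Calling `resolve_in_run(run_dir, "tables/summary.csv")` returns a path under the run directory, while `resolve_in_run(run_dir, "../escape.csv")` raises `ValueError`.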

## Implementation Details

### Reproducible Run Management

- Before each execution, run `scripts/init_run.py` to create `outputs/runs/<timestamp>/` and populate the initial inputs and config.
- Analysis scripts default to the latest run directory under `outputs/runs/` unless explicitly overridden (if supported by the script).
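
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `scripts/init_run.py`: the timestamp format, the placeholder config, and the function names `create_run_dir` and `latest_run_dir` are all hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path

RUNS_ROOT = Path("outputs/runs")


def create_run_dir(runs_root: Path = RUNS_ROOT) -> Path:
    """Create outputs/runs/<timestamp>/ with a placeholder config.json."""
    # Assumed format: chosen so that sorting names also sorts chronologically.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = runs_root / stamp
    # exist_ok=False makes an accidental reuse of a run directory fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    # Placeholder content; the real config schema is not documented here.
    (run_dir / "config.json").write_text(json.dumps({"created": stamp}, indent=2))
    return run_dir


def latest_run_dir(runs_root: Path = RUNS_ROOT) -> Path:
    """Return the most recent run directory, judged by its sortable name."""
    runs = sorted(p for p in runs_root.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no run directories under {runs_root}")
    return runs[-1]
```

Relying on a sortable directory name rather than filesystem modification times keeps "latest" stable even if files inside an older run are touched later.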

### Analysis Pipeline

1. **Data Preparation**
   - Handle missing values (e.g., drop, impute, or flag) according to the experimental design.
   - Detect and treat outliers (e.g., robust rules, domain thresholds), documenting any exclusions.
   - Identify variable roles:
     - Outcome variable(s): typically continuous measurements.
     - Grouping factors: categorical condition labels (treatment/control, timepoint, genotype, etc.).

2. **Descriptive Statistics**
   - Compute summary metrics per group:
     - Mean, standard deviation, and confidence intervals (commonly 95% CI).
   - Produce grouped summary tables suitable for reporting.

3. **Inferential Statistics**
   - **Two-group comparisons**
     - Use an independent t-test for separate groups.
     - Use a paired t-test for repeated measures / matched pairs.
     - If assumptions are violated, switch to an appropriate non-parametric alternative (e.g., Mann-Whitney U for independent groups, Wilcoxon signed-rank for paired data).
   - **Multi-group / multi-factor comparisons**
     - Use one-way ANOVA for a single factor with 3+ levels.
     - Use multi-way ANOVA when multiple factors are present.
   - **Multiple comparisons**
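     - Apply post-hoc procedures (e.g., Tukey's HSD) after ANOVA when the omnibus test is significant and pairwise differences are of interest.
     - Define and document the multiple-comparison control strategy (e.g., family-wise error rate via Tukey's HSD or Bonferroni, or false discovery rate via Benjamini-Hochberg).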

4. **Assumption Checks and Reporting Standards**
   - Validate and report:
     - Normality (per group or model residuals, as appropriate), e.g., with a Shapiro-Wilk test.
     - Homogeneity of variance, e.g., with Levene's test.
   - Report, at minimum:
     - Test statistic, degrees of freedom (if applicable), p-value.
     - Effect size(s) and confidence intervals where applicable.
   - Retain analysis code and random seeds to ensure reproducibility.
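
The pipeline steps above can be sketched with pandas and SciPy for independent groups. This is a minimal illustration, not the logic of `scripts/analyze_experiment.py`: the function names, the 0.05 threshold, and the choice of Shapiro-Wilk and Levene as assumption checks are assumptions made for the example. Paired designs and multi-way ANOVA are not covered.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed threshold for both assumption checks and tests


def summarize(df, group_col, value_col, level=0.95):
    """Per-group n, mean, SD, and t-based confidence interval for the mean."""
    out = df.groupby(group_col)[value_col].agg(n="count", mean="mean", sd="std")
    half = stats.t.ppf((1 + level) / 2, out["n"] - 1) * out["sd"] / np.sqrt(out["n"])
    out["ci_low"] = out["mean"] - half
    out["ci_high"] = out["mean"] + half
    return out


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


def compare_groups(df, group_col, value_col, alpha=ALPHA):
    """Check assumptions, then pick a test based on group count."""
    groups = [g[value_col].dropna().to_numpy() for _, g in df.groupby(group_col)]
    if len(groups) < 2:
        raise ValueError("need at least two groups to compare")
    # Assumption checks (each group needs at least 3 observations for Shapiro-Wilk).
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    equal_var = stats.levene(*groups).pvalue > alpha
    result = {"normal": normal, "equal_var": equal_var}
    if len(groups) == 2:
        a, b = groups
        if normal:
            # equal_var=False gives Welch's t-test.
            res = stats.ttest_ind(a, b, equal_var=equal_var)
            result["test"] = "student_t" if equal_var else "welch_t"
        else:
            res = stats.mannwhitneyu(a, b, alternative="two-sided")
            result["test"] = "mann_whitney_u"
        result["cohens_d"] = float(cohens_d(a, b))
    elif normal and equal_var:
        res = stats.f_oneway(*groups)
        result["test"] = "one_way_anova"
        # Tukey's HSD post-hoc comparisons (scipy.stats.tukey_hsd, SciPy >= 1.8).
        result["posthoc"] = stats.tukey_hsd(*groups)
    else:
        res = stats.kruskal(*groups)
        result["test"] = "kruskal_wallis"
    result["statistic"] = float(res.statistic)
    result["pvalue"] = float(res.pvalue)
    return result
```

For a CSV with a `group` column and a `value` column (hypothetical names), `summarize(df, "group", "value")` gives the grouped summary table and `compare_groups(df, "group", "value")` returns the assumption flags, the selected test, its statistic, and its p-value. Recording the `normal` and `equal_var` flags in the report documents why a given test was chosen.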
