outcome-extraction-for-clinical-trials

$ npx mdskill add aipoch/medical-research-skills/outcome-extraction-for-clinical-trials

Extract structured clinical trial data for meta-analysis workflows.

  • Processes binary, continuous, and survival outcome measures from research papers.
  • Integrates PMID database lookup with real-time large language model extraction.
  • Selects extraction methods based on the need for reproducible, file-based results.
  • Delivers structured datasets ready for systematic review and statistical analysis.

---
name: outcome-extraction-for-clinical-trials
description: Clinical research outcome extraction for meta-analysis. Use when users need to extract outcome measures (binary, continuous, or survival data) from clinical research papers for systematic review and meta-analysis. Handles both database lookup by PMID and real-time LLM extraction.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# Clinical Outcome Extraction

Extract structured outcome data from clinical research papers for meta-analysis.

## When to Use

- Use this skill when you need to extract outcome measures (binary, continuous, or survival data) from clinical research papers for systematic review and meta-analysis, using either database lookup by PMID or real-time LLM extraction, in a reproducible workflow.
- Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/extract_pdf.py` is the most direct path to complete the request.
- Use this skill when you need the `outcome-extraction-for-clinical-trials` package behavior rather than a generic answer.

## Key Features

- Scope-focused workflow for extracting binary, continuous, and survival outcome measures from clinical research papers, supporting both database lookup by PMID and real-time LLM extraction.
- Packaged executable path(s): `scripts/extract_pdf.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.

## Dependencies

- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.

## Example Usage

```bash
cd "20260316/scientific-skills/Data Analytics/outcome-extraction-for-clinical-trials"
python -m py_compile scripts/extract_pdf.py
python scripts/extract_pdf.py --help
```

Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/extract_pdf.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.

## Implementation Details

See `## Workflow` below for related details.

- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/extract_pdf.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

## Workflow

1. **Input Processing**
   - User provides: full paper text + optional PMID
   - If PMID provided: query database first for existing results
   - If no PMID or no database match: proceed to LLM extraction

2. **Outcome Identification** (LLM)
   - Extract all outcome measures from the paper
   - Determine outcome types: binary, continuous, or survival
   - Identify measurement time points
   - Output JSON format with outcome classification

3. **Data Classification** (Code)
   - Separate outcomes into three categories:
     - `bi_outcomes`: Binary/dichotomous outcomes
     - `con_outcomes`: Continuous outcomes
     - `sur_outcomes`: Survival outcomes

4. **Data Extraction by Type**

### Binary Outcomes
Extract for each intervention group:
- Sample size (n)
- Number of events (event)

### Continuous Outcomes
Extract for each intervention group:
- Sample size (n)
- Mean (mean)
- Standard deviation (sd)

### Survival Outcomes
Extract for each intervention group:
- Sample size (n)
- Hazard ratio (HR)
- 95% Lower CI
- 95% Upper CI

5. **Output Formatting**
   - Combine all extracted data
   - Ensure consistent JSON structure
   - Convert values to strings
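
The code-side steps of this workflow (PMID lookup with fallback, classification, and string conversion) can be sketched as follows. This is a minimal illustration, not the contents of `scripts/extract_pdf.py`: the function names, the in-memory `db` dictionary standing in for the results database, the `llm_extract` callable, and the flat `outcome_type` field on each identified outcome are all assumptions made for the example.

```python
from typing import Any, Callable, Optional

# Hypothetical mapping from the LLM's outcome type label to the category key.
CATEGORY_FOR_TYPE = {
    "binary": "bi_outcomes",
    "continuous": "con_outcomes",
    "survival": "sur_outcomes",
}


def get_outcomes(
    paper_text: str,
    pmid: Optional[str],
    llm_extract: Callable[[str], list],
    db: dict,
) -> list:
    """Step 1: return stored results for a known PMID, else fall back to the LLM."""
    if pmid and pmid in db:
        return db[pmid]
    return llm_extract(paper_text)


def classify(outcomes: list) -> dict:
    """Step 3: split identified outcomes into the three categories."""
    buckets: dict = {"bi_outcomes": [], "con_outcomes": [], "sur_outcomes": []}
    for outcome in outcomes:
        label = str(outcome.get("outcome_type", "")).strip().lower()
        key = CATEGORY_FOR_TYPE.get(label)
        if key is not None:  # outcomes with an unrecognized type are skipped here
            buckets[key].append(outcome)
    return buckets


def stringify(value: Any) -> Any:
    """Step 5: convert every leaf value to a string; None becomes ""."""
    if isinstance(value, dict):
        return {k: stringify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify(v) for v in value]
    return "" if value is None else str(value)
```

In a real run, unrecognized outcome types would probably deserve a warning rather than a silent skip; the sketch drops them only to stay short.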

## Output Format

```json
[
  {
    "outcome_name": "PFS",
    "detection_time_point": "12 months",
    "groups": [
      {
        "group_name": "Treatment A",
        "sample_size": "100",
        "outcome_type": "Binary|Continuous|Survival",
        "data": [
          {"value_type": "Events|Mean|SD|HR|95%Lower CI|95%Upper CI", "value": "25"}
        ]
      }
    ]
  }
]
```
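
As an illustration of how a record in this shape might be built and checked, the sketch below pairs each outcome type with the value types listed in the Workflow section (events for binary; mean and SD for continuous; HR and the 95% CI bounds for survival). The helper names `make_group` and `validate` are hypothetical and are not part of the packaged script.

```python
# Value types expected per outcome type, taken from the Workflow section.
VALUE_TYPES = {
    "Binary": ["Events"],
    "Continuous": ["Mean", "SD"],
    "Survival": ["HR", "95%Lower CI", "95%Upper CI"],
}


def _as_str(value) -> str:
    """Render a value as a string; missing values become ""."""
    return "" if value is None else str(value)


def make_group(group_name: str, sample_size, outcome_type: str, values: dict) -> dict:
    """Build one entry of `groups`, emitting every expected value type."""
    data = [
        {"value_type": vt, "value": _as_str(values.get(vt))}
        for vt in VALUE_TYPES[outcome_type]
    ]
    return {
        "group_name": group_name,
        "sample_size": _as_str(sample_size),
        "outcome_type": outcome_type,
        "data": data,
    }


def validate(records: list) -> list:
    """Return a list of problems found; an empty list means the shape is fine."""
    problems = []
    for i, rec in enumerate(records):
        for key in ("outcome_name", "detection_time_point", "groups"):
            if key not in rec:
                problems.append(f"record {i}: missing '{key}'")
        for g in rec.get("groups", []):
            otype = g.get("outcome_type")
            if otype not in VALUE_TYPES:
                problems.append(f"record {i}: unknown outcome_type {otype!r}")
                continue
            for d in g.get("data", []):
                if d.get("value_type") not in VALUE_TYPES[otype]:
                    problems.append(
                        f"record {i}: {d.get('value_type')!r} not valid for {otype}"
                    )
                if not isinstance(d.get("value"), str):
                    problems.append(f"record {i}: value is not a string")
    return problems
```

For example, `make_group("Treatment A", 100, "Binary", {"Events": 25})` yields a group whose `data` list holds a single `Events` entry with the value `"25"`, matching the structure shown above.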

**See `extraction-promots.md` in `references/` for the detailed JSON structure of each outcome type (binary, continuous, survival).**

## Requirements

- Extract from full text, not just abstract
- Consider ALL intervention groups in the paper
- Include ALL outcome measures of interest
- Report all data regardless of statistical significance
- Use specific group names (intervention names in English), not generic terms like "treatment group"
- Output in JSON format
- Output language: English for all field values
- If data is not found: output an empty string `""`
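
Some of these requirements can be checked mechanically after extraction. The sketch below is a rough heuristic under stated assumptions: the `check_requirements` name and the regular expression for "generic" group names are invented for illustration, and a review step like this supplements rather than replaces reading the paper.

```python
import re

# Heuristic for generic group names such as "treatment group" (an assumption,
# not an exhaustive rule).
GENERIC_GROUP = re.compile(
    r"^(treatment|intervention|experimental)\s+(group|arm)$", re.IGNORECASE
)


def check_requirements(records: list) -> list:
    """Return warnings for generic group names and for values left as ""."""
    warnings = []
    for rec in records:
        outcome = rec.get("outcome_name", "")
        for g in rec.get("groups", []):
            name = str(g.get("group_name", "")).strip()
            if GENERIC_GROUP.match(name):
                warnings.append(f"{outcome}: generic group name {name!r}")
            for d in g.get("data", []):
                if d.get("value") == "":
                    warnings.append(
                        f"{outcome} / {name}: no value found for {d.get('value_type')}"
                    )
    return warnings
```

Empty values are allowed by the rules above, so they are reported as warnings for a reviewer to confirm against the full text, not as errors.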
