literature-experiment-extract

$ npx mdskill add aipoch/medical-research-skills/literature-experiment-extract

Extract experimental models, methods, and biomarkers from paper Markdown.

  • Converts unstructured text into structured evidence-backed summaries.
  • Expects input that has already been converted to Markdown, typically by a PDF-to-Markdown tool.
  • Organizes findings by page markers for traceable citations.
  • Delivers one Markdown summary plus three CSV tables.

SKILL.md

.github/skills/literature-experiment-extract
---
name: literature-experiment-extract
description: Extract experimental models, experimental methods, and biomarker information from paper Markdown (typically produced by PDF-to-Markdown tools) when a user provides paper Markdown and needs a structured, evidence-backed summary (1 Markdown + 3 CSVs).
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a paper converted to Markdown (e.g., via PDF-to-Markdown) and need to extract **cell/animal models** used in experiments.
- You need a structured list of **experimental methods/protocols** described in the paper, with traceable evidence.
- You want to compile **biomarkers / detection indicators** (e.g., genes, proteins, assays, readouts) reported in the study.
- You need standardized outputs for downstream analysis: **one Markdown summary plus three CSV tables**.
- The paper Markdown includes page markers (e.g., `## Page XX`) and you want evidence organized **by page**.

## Key Features

- Extracts three entity groups from paper Markdown:
  - **Experimental models** (cell lines, animal models, strains, genotypes, etc.)
  - **Experimental methods** (assays, protocols, instruments, conditions)
  - **Biomarkers / indicators** (targets, readouts, measured variables)
- Produces **evidence-backed** results (citations/excerpts preserved and traceable to the source).
- Supports **page-aware evidence organization** when the input includes pagination headers like `## Page XX`.
- Outputs are fixed and standardized:
  - **1 Markdown summary**
  - **3 CSV files**: models / methods / biomarkers
- Uses a predefined template and extraction rules:
  - Requirements and consistency rules: `references/guide.md`
  - Output template: `assets/template.md`

## Dependencies

- No software dependencies: the workflow is driven by `references/guide.md` and `assets/template.md`.
- Input assumption: paper content is available as **Markdown**, typically generated by a **PDF-to-Markdown** tool.

## Example Usage

### Input

A paper converted to Markdown, ideally with page headers:

```md
## Page 1
... text describing "C57BL/6 mice" and "Western blot" ...

## Page 2
... text describing "ELISA" and "IL-6 levels" ...
```
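The page-aware organization described above depends on locating the `## Page XX` headers. The following is a minimal sketch of one way to split such input into pages, assuming `XX` is always an integer and each header sits on its own line. The function name `split_pages` is hypothetical and not part of the skill.

```python
import re

# Matches page headers such as "## Page 1" or "## Page 12" on their own line.
# Assumption: the "XX" placeholder in "## Page XX" is always an integer.
PAGE_HEADER = re.compile(r"^##[ \t]+Page[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> dict[int, str]:
    """Return {page_number: page_text} for each '## Page N' section."""
    matches = list(PAGE_HEADER.finditer(markdown))
    pages: dict[int, str] = {}
    for i, match in enumerate(matches):
        start = match.end()
        # A page runs until the next header, or to the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        pages[int(match.group(1))] = markdown[start:end].strip()
    return pages


sample = """## Page 1
... text describing "C57BL/6 mice" and "Western blot" ...

## Page 2
... text describing "ELISA" and "IL-6 levels" ...
"""
pages = split_pages(sample)
```

Input without any page headers yields an empty dictionary, in which case evidence would have to be cited without page numbers.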

### Steps

1. Open the paper Markdown (typically produced by PDF-to-Markdown tools).
2. Extract **models**, **methods**, and **biomarkers** page by page.
3. Follow:
   - Extraction rules and evidence requirements: `references/guide.md`
   - Output template: `assets/template.md`
4. Output **exactly**:
   - `outputs/{Paper Abbreviation}-experiment-summary.md`
   - `outputs/{Paper Abbreviation}-models.csv`
   - `outputs/{Paper Abbreviation}-methods.csv`
   - `outputs/{Paper Abbreviation}-biomarkers.csv`
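The four output paths in step 4 follow one naming pattern, which can be sketched as below. The helper `output_paths` and the abbreviation `Smith2023` are hypothetical, used only for illustration. How the paper abbreviation is chosen is not shown here.

```python
from pathlib import Path


def output_paths(paper_abbreviation: str, out_dir: str = "outputs") -> dict[str, Path]:
    """Build the fixed set of four output paths for one paper."""
    base = Path(out_dir)
    return {
        "summary": base / f"{paper_abbreviation}-experiment-summary.md",
        "models": base / f"{paper_abbreviation}-models.csv",
        "methods": base / f"{paper_abbreviation}-methods.csv",
        "biomarkers": base / f"{paper_abbreviation}-biomarkers.csv",
    }


# "Smith2023" is a made-up abbreviation for illustration only.
paths = output_paths("Smith2023")
```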

### Output (required)

- All final outputs must be **UTF-8** encoded.
- Produce the outputs **directly**, without confirmation steps or optional branches.
- Evidence excerpts must remain in the **original language** of the source literature.

## Implementation Details

- **Input parsing**
  - Read the paper Markdown as the sole input source.
  - If pagination headers like `## Page XX` exist, prioritize attaching evidence to the corresponding page.

- **Extraction rules**
  - Apply entity definitions, allowed/expected fields, normalization rules, and evidence formatting as specified in `references/guide.md`.

- **Output formatting**
  - Generate outputs using `assets/template.md` as the canonical structure.
  - Add rows as needed while preserving evidence citations/excerpts.
  - The output set is fixed: **1 Markdown summary + 3 CSVs** (models/methods/biomarkers).

- **Paths and naming**
  - Default output directory: `outputs/`
  - Naming:
    - Markdown: `outputs/{Paper Abbreviation}-experiment-summary.md`
    - CSVs:
      - `outputs/{Paper Abbreviation}-models.csv`
      - `outputs/{Paper Abbreviation}-methods.csv`
      - `outputs/{Paper Abbreviation}-biomarkers.csv`

- **Language**
  - Output language should be **Chinese by default** (or the user-requested language if specified).
  - Evidence excerpts must remain in the **original language** of the source text.
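The formatting and encoding rules above can be illustrated with a minimal sketch that writes one CSV as UTF-8 and keeps the evidence excerpt verbatim. The column names `name`, `page`, and `evidence` are assumptions for illustration, since the actual fields are defined in `references/guide.md`. The helper `write_rows` and the `Demo` abbreviation are likewise hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical columns; the real field list is defined in references/guide.md.
FIELDS = ["name", "page", "evidence"]


def write_rows(path: Path, rows: list[dict]) -> None:
    """Write rows to a UTF-8 CSV, creating the output directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# The evidence excerpt is copied verbatim from the source page, not translated.
rows = [
    {
        "name": "IL-6",
        "page": 2,
        "evidence": '... text describing "ELISA" and "IL-6 levels" ...',
    }
]

# A temporary directory stands in for the project root in this demo.
out = Path(tempfile.mkdtemp()) / "outputs" / "Demo-biomarkers.csv"
write_rows(out, rows)
```

Using `csv.DictWriter` rather than manual string joining means quotes and commas inside evidence excerpts are escaped correctly.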
