domain-research-health-science

Name: domain-research-health-science
Author: lyndonkl/claude
$ npx mdskill add lyndonkl/claude/domain-research-health-science
Engineers rigorous clinical research through evidence grading.
Structures questions using PICOT and applies GRADE certainty ratings.
Integrates Cochrane RoB 2 and ROBINS-I bias evaluation tools.
Prioritizes patient-important outcomes against study quality metrics.
Delivers decision-ready summaries for guidelines and regulatory review.
SKILL.md
.github/skills/domain-research-health-science
---
name: domain-research-health-science
description: Guides clinical and health science research through PICOT question formulation, evidence hierarchy assessment, bias evaluation (Cochrane RoB 2, ROBINS-I), outcome prioritization, and GRADE certainty rating. Use when formulating clinical research questions, evaluating health evidence quality, prioritizing patient-important outcomes, conducting systematic reviews or meta-analyses, creating evidence summaries for guidelines, or assessing regulatory evidence.
---
# Domain Research: Health Science

## Table of Contents
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Workflow

Copy this checklist and track your progress:

```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```

**Step 1: Formulate research question (PICOT)**

Use the PICOT framework to structure an answerable clinical question. Define Population (demographics, condition, setting), Intervention (treatment, exposure, diagnostic test), Comparator (alternative treatment, placebo, standard care), Outcome (patient-important endpoints), and Timeframe (follow-up duration). See [resources/template.md](resources/template.md#picot-framework) for structured templates.

**Step 2: Assess evidence hierarchy and study design**

Choose the study design that fits the question type (therapy: RCT; diagnosis: cross-sectional; prognosis: cohort; harm: case-control or cohort). Place each evidence source in the hierarchy of evidence (systematic reviews > RCTs > cohort > case-control > case series). See [resources/methodology.md](resources/methodology.md#evidence-hierarchy) for design selection guidance.
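The design mapping and hierarchy in Step 2 can be written down as a small lookup. This is a minimal sketch that only restates the step text; the names `PREFERRED_DESIGN`, `HIERARCHY`, `preferred_designs`, and `sort_by_hierarchy` are illustrative and do not come from the skill's resources.

```python
# Preferred primary-study designs per question type, as listed in Step 2.
PREFERRED_DESIGN = {
    "therapy": ["RCT"],
    "diagnosis": ["cross-sectional"],
    "prognosis": ["cohort"],
    "harm": ["case-control", "cohort"],
}

# Evidence hierarchy from Step 2, strongest first.
HIERARCHY = ["systematic review", "RCT", "cohort", "case-control", "case series"]


def preferred_designs(question_type: str) -> list[str]:
    """Return the preferred designs for a question type (case-insensitive)."""
    key = question_type.strip().lower()
    if key not in PREFERRED_DESIGN:
        raise ValueError(f"unknown question type: {question_type!r}")
    return PREFERRED_DESIGN[key]


def sort_by_hierarchy(designs: list[str]) -> list[str]:
    """Order designs from strongest to weakest according to HIERARCHY."""
    return sorted(designs, key=HIERARCHY.index)


print(preferred_designs("harm"))
print(sort_by_hierarchy(["case series", "RCT", "cohort"]))
```

The hierarchy ranks designs only; certainty of a specific body of evidence is rated separately in Step 5.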

**Step 3: Evaluate study quality and bias**

Apply risk of bias assessment tools (Cochrane RoB 2 for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic accuracy). Evaluate randomization, blinding, allocation concealment, incomplete outcome data, selective reporting. See [resources/methodology.md](resources/methodology.md#bias-assessment) for detailed criteria.
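For RCTs, domain-level judgements roll up into one overall judgement. The sketch below assumes the five RoB 2 domains and the simplest roll-up rule: any "high" domain gives overall "high", otherwise any "some concerns" gives "some concerns". RoB 2 also lets reviewers rate a trial "high" when several domains raise some concerns; that call needs human judgement and is not encoded here. The function name `overall_rob2` is illustrative.

```python
ROB2_DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)
JUDGEMENTS = ("low", "some concerns", "high")


def overall_rob2(judgements: dict[str, str]) -> str:
    """Roll five domain judgements up into one overall risk-of-bias judgement."""
    missing = [d for d in ROB2_DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    values = [judgements[d] for d in ROB2_DOMAINS]
    for value in values:
        if value not in JUDGEMENTS:
            raise ValueError(f"invalid judgement: {value!r}")
    if "high" in values:
        return "high"
    if "some concerns" in values:
        return "some concerns"
    return "low"


# Hypothetical appraisal of a single open-label trial.
trial = dict.fromkeys(ROB2_DOMAINS, "low")
trial["measurement of the outcome"] = "some concerns"
print(overall_rob2(trial))
```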

**Step 4: Prioritize and define outcomes**

Distinguish patient-important outcomes (mortality, symptoms, quality of life, function) from surrogate endpoints (biomarkers, lab values). Create outcome hierarchy: critical (decision-driving), important (informs decision), not important. Define measurement instruments and minimal clinically important differences (MCID). See [resources/template.md](resources/template.md#outcome-hierarchy) for prioritization framework.
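One common way to build the outcome hierarchy is the GRADE 1 to 9 importance scale, where 7 to 9 is critical, 4 to 6 is important, and 1 to 3 is of limited importance. The sketch below assumes each panel member rates every outcome on that scale and that the median decides the category; the median rule, the function name, and the example ratings are assumptions for illustration.

```python
from statistics import median


def classify_outcome(ratings: list[int]) -> str:
    """Map panel ratings on the 1-9 importance scale to an outcome category."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be between 1 and 9")
    score = median(ratings)
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important"
    return "not important"


# Made-up ratings from a three-person panel.
panel = {
    "all-cause mortality": [9, 9, 8],
    "quality of life": [6, 7, 5],
    "HbA1c (surrogate)": [3, 4, 2],
}
for outcome, ratings in panel.items():
    print(f"{outcome}: {classify_outcome(ratings)}")
```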

**Step 5: Synthesize evidence and grade certainty**

Apply GRADE (Grading of Recommendations Assessment, Development and Evaluation) to rate certainty of evidence (high, moderate, low, very low). Downgrade for study limitations (risk of bias), inconsistency, indirectness, imprecision, or publication bias. Upgrade for large effects, a dose-response gradient, or plausible residual confounding that would have reduced the observed effect. See [resources/methodology.md](resources/methodology.md#grade-framework) for rating guidance.
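The rating logic can be sketched as simple bookkeeping: start at high for RCTs or low for observational studies, subtract one or two levels per downgrade domain, add one or two per upgrade domain, and clamp to the four-level scale. This is a simplification under stated assumptions: GRADE ratings are structured judgements rather than arithmetic, and upgrading is normally reserved for evidence without serious limitations. The domain labels and the function name `grade_certainty` are illustrative.

```python
LEVELS = ["very low", "low", "moderate", "high"]
SYMBOLS = {"high": "⊕⊕⊕⊕", "moderate": "⊕⊕⊕○", "low": "⊕⊕○○", "very low": "⊕○○○"}
DOWNGRADE_DOMAINS = {
    "risk of bias", "inconsistency", "indirectness", "imprecision", "publication bias",
}
UPGRADE_DOMAINS = {"large effect", "dose-response", "opposing confounding"}


def grade_certainty(design, downgrades=None, upgrades=None):
    """Return the certainty level after applying downgrades and upgrades.

    design: "rct" (starts high) or "observational" (starts low).
    downgrades / upgrades: dicts mapping a domain name to 1 or 2 levels.
    """
    start = {"rct": 3, "observational": 1}
    if design not in start:
        raise ValueError(f"unknown design: {design!r}")
    level = start[design]
    for domain, steps in (downgrades or {}).items():
        if domain not in DOWNGRADE_DOMAINS or steps not in (1, 2):
            raise ValueError(f"invalid downgrade: {domain!r}={steps!r}")
        level -= steps
    for domain, steps in (upgrades or {}).items():
        if domain not in UPGRADE_DOMAINS or steps not in (1, 2):
            raise ValueError(f"invalid upgrade: {domain!r}={steps!r}")
        level += steps
    return LEVELS[max(0, min(3, level))]


rating = grade_certainty("rct", downgrades={"imprecision": 1})
print(rating, SYMBOLS[rating])
```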

**Step 6: Create decision-ready summary**

Produce evidence profile or summary of findings table linking outcomes to certainty ratings and effect estimates. Include clinical interpretation, applicability assessment, and evidence gaps. Validate using [resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json). **Minimum standard**: Average score ≥ 3.5.
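Summary of findings tables usually report absolute effects alongside relative ones. A minimal sketch of that conversion follows, assuming a risk ratio with its confidence interval and an assumed baseline (control-group) risk. The 10% baseline risk is a made-up input; the RR 0.75 (0.70 to 0.80) figures reuse the illustrative statin example from Pattern 5. The function name is hypothetical.

```python
def absolute_effect_per_1000(baseline_risk, rr, rr_low, rr_high):
    """Convert a risk ratio and its CI into events per 1000 patients.

    baseline_risk: assumed control-group risk as a proportion (0-1).
    Returns the baseline count and the difference (with CI) per 1000;
    negative differences mean fewer events with the intervention.
    """
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")

    def difference(ratio):
        return round(1000 * baseline_risk * (ratio - 1))

    return {
        "baseline_per_1000": round(1000 * baseline_risk),
        "difference_per_1000": difference(rr),
        "ci_per_1000": (difference(rr_low), difference(rr_high)),
    }


row = absolute_effect_per_1000(baseline_risk=0.10, rr=0.75, rr_low=0.70, rr_high=0.80)
print(row)
```

Because the absolute effect scales with baseline risk, the same relative effect can matter a great deal in high-risk patients and very little in low-risk ones.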

## Common Patterns

**Pattern 1: Therapy/Intervention Question**
- **PICOT**: Adults with condition → new treatment vs standard care → patient-important outcomes → follow-up period
- **Study design**: RCT preferred (highest quality for causation); systematic review of RCTs for synthesis
- **Key outcomes**: Mortality, morbidity, quality of life, adverse events
- **Bias assessment**: Cochrane RoB 2 (randomization, blinding, attrition, selective reporting)
- **Example**: SGLT2 inhibitors for heart failure → reduced mortality (GRADE: high certainty)

**Pattern 2: Diagnostic Test Accuracy**
- **PICOT**: Patients with suspected condition → new test vs reference standard → sensitivity/specificity → cross-sectional
- **Study design**: Cross-sectional study with consecutive enrollment; avoid case-control (inflates accuracy)
- **Key outcomes**: Sensitivity, specificity, positive/negative predictive values, likelihood ratios
- **Bias assessment**: QUADAS-2 (patient selection, index test, reference standard, flow and timing)
- **Example**: High-sensitivity troponin for MI → sensitivity 95%, specificity 92% (GRADE: moderate certainty)
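The accuracy measures listed under Pattern 2 all derive from a 2x2 table of index test against reference standard. The sketch below computes them and then shows how predictive values shift with prevalence. The counts and prevalence values are invented for illustration, chosen to match the 95% sensitivity and 92% specificity in the example; the function names are hypothetical.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Compute diagnostic accuracy measures from 2x2 table counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table needs at least one patient")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Invented cohort of 1000 patients with 10% prevalence.
measures = accuracy_from_2x2(tp=95, fp=72, fn=5, tn=828)
print({name: round(value, 3) for name, value in measures.items()})

# Same test applied where the condition is rarer: PPV drops sharply.
for prevalence in (0.10, 0.01):
    ppv, npv = predictive_values(0.95, 0.92, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

Sensitivity and specificity are properties of the test in the studied spectrum of patients; predictive values also depend on prevalence in the setting where the test is used.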

**Pattern 3: Prognosis/Risk Prediction**
- **PICOT**: Population with condition/exposure → risk factors → outcomes (death, disease progression) → long-term follow-up
- **Study design**: Prospective cohort (follow from exposure to outcome); retrospective designs are weaker because they depend on existing records or recall (incomplete data, recall bias)
- **Key outcomes**: Incidence, hazard ratios, absolute risk, risk prediction model performance (C-statistic, calibration)
- **Bias assessment**: QUIPS for prognostic factor studies, PROBAST for prediction models; ROBINS-I applies when the exposure is an intervention
- **Example**: Framingham Risk Score for CVD → C-statistic 0.76 (moderate discrimination)
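The C-statistic quoted above measures discrimination: the probability that a randomly chosen patient who had the event received a higher predicted risk than a randomly chosen patient who did not. A minimal sketch for a binary outcome follows, with ties counted as half. It ignores censoring, so time-to-event data would need a survival-specific concordance measure instead. The predicted risks and outcomes are invented.

```python
def c_statistic(predicted_risks, events):
    """Concordance for a binary outcome: share of event/non-event pairs ranked correctly."""
    with_event = [r for r, e in zip(predicted_risks, events) if e]
    without_event = [r for r, e in zip(predicted_risks, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one patient with and one without the event")
    concordant = 0.0
    for case_risk in with_event:
        for control_risk in without_event:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half
    return concordant / (len(with_event) * len(without_event))


# Invented predicted risks and observed outcomes (1 = event occurred).
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
outcomes = [0, 0, 1, 0, 1, 1]
print(round(c_statistic(risks, outcomes), 3))
```

A value of 0.5 means the model ranks patients no better than chance and 1.0 means perfect ranking. Discrimination says nothing about calibration, which needs a separate check of predicted against observed risk.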

**Pattern 4: Harm/Safety Assessment**
- **PICOT**: Population exposed to intervention → unexposed or alternative-treatment comparator → adverse events → timeframe long enough for rare/delayed harms
- **Study design**: RCT for common harms; observational (cohort, case-control) for rare harms (larger sample, longer follow-up)
- **Key outcomes**: Serious adverse events, discontinuations, organ-specific toxicity, long-term safety
- **Bias assessment**: Different for rare vs common harms; consider confounding by indication in observational studies
- **Example**: NSAID cardiovascular risk → observational studies show increased MI risk (GRADE: low certainty due to confounding)
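The sample-size argument for rare harms can be made concrete. Assuming independent patients with the same true event risk, the chance of seeing at least one event among n patients is 1 - (1 - risk)^n. The sketch below uses that formula to show why a trial of a few hundred patients will often miss a 1-in-1000 harm; the risk and sample sizes are illustrative.

```python
import math


def prob_at_least_one_event(risk, n):
    """Probability of observing one or more events among n independent patients."""
    return 1 - (1 - risk) ** n


def n_for_detection(risk, probability=0.95):
    """Smallest n giving at least `probability` of observing one or more events."""
    if not 0 < risk < 1 or not 0 < probability < 1:
        raise ValueError("risk and probability must be strictly between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - risk))


rare_risk = 0.001  # illustrative 1-in-1000 adverse event
print(f"500 patients: {prob_at_least_one_event(rare_risk, 500):.0%} chance of seeing any event")
print(f"patients needed for 95% chance: {n_for_detection(rare_risk)}")
```

The result is close to the "rule of three" shortcut: roughly 3 / risk patients are needed for a 95% chance of observing at least one event.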

**Pattern 5: Systematic Review/Meta-Analysis**
- **PICOT**: Defined in protocol; guides search strategy, inclusion criteria, outcome extraction
- **Study design**: Comprehensive search, explicit eligibility criteria, duplicate screening/extraction, bias assessment, quantitative synthesis (if appropriate)
- **Key outcomes**: Pooled effect estimates (RR, OR, MD, SMD), heterogeneity (I²), certainty rating (GRADE)
- **Bias assessment**: Individual study RoB + review-level assessment (AMSTAR 2 for review quality)
- **Example**: Statins for primary prevention → RR 0.75 for MI (95% CI 0.70-0.80, I²=12%, GRADE: high certainty)
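Pooled estimates and I² come from a short calculation. The sketch below uses a fixed-effect inverse-variance model on log risk ratios, recovering each standard error from the reported 95% CI, and derives I² from Cochran's Q. Fixed-effect pooling is a simplifying choice for illustration; a random-effects model is often preferred when heterogeneity is present. The three studies are invented and the function name is hypothetical.

```python
import math


def pool_risk_ratios(studies):
    """Fixed-effect inverse-variance pooling of risk ratios.

    studies: list of (rr, ci_low, ci_high) tuples with 95% confidence intervals.
    Returns the pooled RR, its 95% CI, Cochran's Q, and I² in percent.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies to pool")
    log_rrs, weights = [], []
    for rr, ci_low, ci_high in studies:
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
        log_rrs.append(math.log(rr))
        weights.append(1 / se ** 2)
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se)),
        "q": q,
        "i_squared": i_squared,
    }


# Invented trial results: (RR, lower 95% CI, upper 95% CI).
consistent = [(0.72, 0.60, 0.86), (0.78, 0.66, 0.92), (0.75, 0.58, 0.97)]
result = pool_risk_ratios(consistent)
print(f"pooled RR {result['rr']:.2f}, I² {result['i_squared']:.0f}%")
```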

## Guardrails

**Key requirements:**

1. **Use PICOT for all clinical questions**: Vague questions lead to unfocused research. Specify Population, Intervention, Comparator, Outcome, Timeframe explicitly rather than asking "does X work?" without defining for whom, compared to what, and measuring which outcomes.

2. **Match study design to question type**: RCTs answer therapy questions (causal inference). Cohort studies answer prognosis. Cross-sectional studies answer diagnosis. Case-control studies answer rare harm or etiology. Avoid claiming causation from observational data or using case series for treatment effects.

3. **Prioritize patient-important outcomes over surrogates**: Surrogate endpoints (biomarkers, lab values) do not always correlate with patient outcomes. Focus on mortality, morbidity, symptoms, function, quality of life. Only use surrogates when a validated relationship to patient outcomes exists.

4. **Assess bias systematically**: Use validated tools (Cochrane RoB 2, ROBINS-I, QUADAS-2) rather than subjective judgment, because bias assessment directly affects certainty of evidence and clinical recommendations. Common biases: selection bias, performance bias (lack of blinding), detection bias, attrition bias, reporting bias.

5. **Apply GRADE to rate certainty of evidence**: Avoid conflating study design with certainty. RCTs start as high certainty but can be downgraded (serious limitations, inconsistency, indirectness, imprecision, publication bias). Observational studies start as low but can be upgraded (large effect, dose-response gradient, plausible residual confounding that would have reduced the observed effect).

6. **Distinguish statistical significance from clinical importance**: p < 0.05 does not mean clinically meaningful. Consider minimal clinically important difference (MCID), absolute risk reduction, number needed to treat (NNT). A small p-value paired with a tiny effect size is statistically significant but may be clinically irrelevant.

7. **Assess external validity and applicability**: Evidence from selected trial populations may not apply to the target patient. Consider PICO match, setting differences (tertiary center vs community), intervention feasibility, patient values and preferences.

8. **State limitations and certainty explicitly**: All evidence has limitations. Specify what is uncertain, where evidence gaps exist, and how this affects confidence in recommendations.
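Guardrail 6 can be checked mechanically once an MCID is agreed. The sketch below compares a confidence interval with the MCID and computes NNT from two risks. It assumes the effect is expressed so that positive values mean benefit and that the MCID is a positive number on the same scale; the function names and example numbers are illustrative.

```python
import math


def interpret_effect(ci_low, ci_high, mcid):
    """Classify an effect estimate's 95% CI against the MCID (positive = benefit)."""
    if mcid <= 0:
        raise ValueError("mcid must be positive")
    if ci_low <= 0 <= ci_high:
        return "not statistically significant"
    if ci_high < 0:
        return "statistically significant in the harmful direction"
    if ci_low >= mcid:
        return "statistically significant and clinically important"
    if ci_high < mcid:
        return "statistically significant but below the MCID"
    return "statistically significant; clinical importance uncertain"


def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    # Round first so floating-point noise (e.g. 10.000000000000002) does not add a patient.
    return math.ceil(round(1 / arr, 6))


# Illustrative: a 2-point gain (95% CI 1 to 3) on a scale whose MCID is 5 points.
print(interpret_effect(ci_low=1.0, ci_high=3.0, mcid=5.0))
print(number_needed_to_treat(control_risk=0.10, treated_risk=0.075))
```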

**Common pitfalls:**

- ❌ **Treating all RCTs as high quality**: RCTs can have serious bias (inadequate randomization, unblinded, high attrition). Always assess bias.
- ❌ **Ignoring heterogeneity in meta-analysis**: High I² (>50%) may indicate substantial differences across studies. Explore sources (population, intervention, outcome definition) before relying on a pooled estimate.
- ❌ **Confusing association with causation**: Observational studies show association, not causation. Residual confounding is always possible.
- ❌ **Using composite outcomes uncritically**: Composite endpoints (e.g., "death or MI or hospitalization") obscure which component drives effect. Report components separately.
- ❌ **Accepting industry-funded evidence uncritically**: Pharmaceutical/device company-sponsored trials may have bias (outcome selection, selective reporting). Assess for conflicts of interest.
- ❌ **Over-interpreting subgroup analyses**: Most subgroup effects are chance findings. Only credible if pre-specified, statistically tested for interaction, and biologically plausible.
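The "statistically tested for interaction" criterion means comparing the two subgroup estimates directly, not noting that one is significant and the other is not. A minimal sketch follows, assuming two independent subgroups reported as risk ratios with 95% CIs: it runs a z-test on the difference of log risk ratios. The subgroup figures are invented and the function name is hypothetical.

```python
import math


def interaction_p_value(rr_a, ci_a, rr_b, ci_b):
    """Two-sided p-value for a difference between two independent subgroup risk ratios.

    ci_a / ci_b: (lower, upper) 95% confidence limits for each subgroup.
    """
    def standard_error(ci):
        low, high = ci
        return (math.log(high) - math.log(low)) / (2 * 1.96)

    difference = math.log(rr_a) - math.log(rr_b)
    se_difference = math.sqrt(standard_error(ci_a) ** 2 + standard_error(ci_b) ** 2)
    z = difference / se_difference
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


# Invented subgroups: one CI excludes 1 and the other does not,
# yet the two estimates may not differ from each other.
p = interaction_p_value(0.70, (0.55, 0.89), 0.85, (0.65, 1.11))
print(f"interaction p = {p:.2f}")
```

A non-significant interaction test does not prove the subgroups respond identically, since such tests have low power, but it is a reason not to claim a differential effect.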

## Quick Reference

**Key resources:**

- **[resources/template.md](resources/template.md)**: PICOT framework, outcome hierarchy template, evidence table, GRADE summary template
- **[resources/methodology.md](resources/methodology.md)**: Evidence hierarchy, bias assessment tools, GRADE detailed guidance, study design selection, systematic review methods
- **[resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json)**: Quality criteria for research questions, evidence synthesis, and clinical interpretation

**PICOT Template:**
- **P** (Population): [Who? Age, sex, condition, severity, setting]
- **I** (Intervention): [What? Drug, procedure, test, exposure - dose, duration, route]
- **C** (Comparator): [Compared to what? Placebo, standard care, alternative treatment]
- **O** (Outcome): [What matters? Mortality, symptoms, QoL, harms - measurement instrument, timepoint]
- **T** (Timeframe): [How long? Follow-up duration, time to outcome]
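The template can also be held as a small data structure so that missing components are caught early. This is a minimal sketch; the `Picot` class and its methods are illustrative and not part of the skill's resources, and the example question reuses the SGLT2 inhibitor scenario from Pattern 1 with invented details.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Picot:
    """One structured clinical question; every component is free text."""
    population: str
    intervention: str
    comparator: str
    outcome: str
    timeframe: str

    def missing(self) -> list[str]:
        """Return the names of components left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_question(self) -> str:
        """Render the five components as a single answerable question."""
        return (
            f"In {self.population}, does {self.intervention} "
            f"compared with {self.comparator} affect {self.outcome} "
            f"over {self.timeframe}?"
        )


question = Picot(
    population="adults with heart failure and reduced ejection fraction",
    intervention="an SGLT2 inhibitor added to standard care",
    comparator="standard care alone",
    outcome="all-cause mortality",
    timeframe="at least 12 months of follow-up",
)
print(question.missing())
print(question.as_question())
```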

**Evidence Hierarchy (Therapy Questions):**
1. Systematic reviews/meta-analyses of RCTs
2. Individual RCTs (large, well-designed)
3. Cohort studies (prospective)
4. Case-control studies
5. Case series, case reports
6. Expert opinion, pathophysiologic rationale

**GRADE Certainty Ratings:**
- **High** (⊕⊕⊕⊕): Very confident true effect is close to estimated effect
- **Moderate** (⊕⊕⊕○): Moderately confident, true effect likely close but could be substantially different
- **Low** (⊕⊕○○): Limited confidence, true effect may be substantially different
- **Very Low** (⊕○○○): Very little confidence, true effect likely substantially different

**Typical workflow time:**

- PICOT formulation: 10-15 minutes
- Single study critical appraisal: 20-30 minutes
- Systematic review protocol: 2-4 hours
- Evidence synthesis with GRADE: 1-2 hours
- Full systematic review: 40-100 hours (depending on scope)

**When to escalate:**

- Complex statistical meta-analysis (network meta-analysis, IPD meta-analysis)
- Advanced causal inference methods (instrumental variables, propensity scores)
- Health technology assessment (cost-effectiveness, budget impact)
- Guideline development panels (requires multi-stakeholder consensus)
→ Consult biostatistician, health economist, or guideline methodologist

**Inputs required:**

- **Research question** (clinical scenario or decision problem)
- **Evidence sources** (studies to appraise, databases for systematic review)
- **Outcome preferences** (which outcomes matter most to patients/clinicians)
- **Context** (setting, patient population, decision urgency)

**Outputs produced:**

- `domain-research-health-science.md`: Structured research question, evidence appraisal, outcome hierarchy, certainty assessment, clinical interpretation