domain-research-health-science
$ npx mdskill add lyndonkl/claude/domain-research-health-science
Engineers rigorous clinical research through evidence grading.
- Structures questions using PICOT and applies GRADE certainty ratings.
- Integrates Cochrane RoB 2 and ROBINS-I bias evaluation tools.
- Prioritizes patient-important outcomes against study quality metrics.
- Delivers decision-ready summaries for guidelines and regulatory review.
SKILL.md
.github/skills/domain-research-health-science
---
name: domain-research-health-science
description: Guides clinical and health science research through PICOT question formulation, evidence hierarchy assessment, bias evaluation (Cochrane RoB 2, ROBINS-I), outcome prioritization, and GRADE certainty rating. Use when formulating clinical research questions, evaluating health evidence quality, prioritizing patient-important outcomes, conducting systematic reviews or meta-analyses, creating evidence summaries for guidelines, or assessing regulatory evidence.
---

# Domain Research: Health Science

## Table of Contents

- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Workflow

Copy this checklist and track your progress:

```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```

**Step 1: Formulate research question (PICOT)**

Use PICOT framework to structure answerable clinical question. Define Population (demographics, condition, setting), Intervention (treatment, exposure, diagnostic test), Comparator (alternative treatment, placebo, standard care), Outcome (patient-important endpoints), and Timeframe (follow-up duration). See [resources/template.md](resources/template.md#picot-framework) for structured templates.

**Step 2: Assess evidence hierarchy and study design**

Determine appropriate study design based on research question type (therapy: RCT; diagnosis: cross-sectional; prognosis: cohort; harm: case-control or cohort). Understand hierarchy of evidence (systematic reviews > RCTs > cohort > case-control > case series). See [resources/methodology.md](resources/methodology.md#evidence-hierarchy) for design selection guidance.
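Steps 1 and 2 can be sketched in Python as a small completeness check plus a design lookup. This is a minimal illustration, not part of the skill's resources: the `PicotQuestion` class, its field names, and the example values (including the 12-month timeframe) are assumptions made for the sketch; only the question-type-to-design mapping comes from Step 2.

```python
from dataclasses import dataclass

# Design preferences copied from Step 2; the dictionary keys are illustrative labels.
PREFERRED_DESIGN = {
    "therapy": "RCT",
    "diagnosis": "cross-sectional",
    "prognosis": "cohort",
    "harm": "case-control or cohort",
}

PICOT_FIELDS = ("population", "intervention", "comparator", "outcome", "timeframe")


@dataclass
class PicotQuestion:
    """Hypothetical container for a PICOT question (an assumption of this sketch)."""
    population: str
    intervention: str
    comparator: str
    outcome: str
    timeframe: str
    question_type: str  # one of the PREFERRED_DESIGN keys

    def missing_elements(self):
        """Return the PICOT elements that were left blank."""
        return [name for name in PICOT_FIELDS if not getattr(self, name).strip()]

    def preferred_design(self):
        """Look up the study design Step 2 suggests for this question type."""
        return PREFERRED_DESIGN[self.question_type]


# Illustrative values only; the timeframe is an assumption, not a recommendation.
complete = PicotQuestion(
    population="adults with heart failure",
    intervention="SGLT2 inhibitor",
    comparator="standard care",
    outcome="all-cause mortality",
    timeframe="12 months",
    question_type="therapy",
)
# A "does X work?" style question with no comparator, outcome, or timeframe.
vague = PicotQuestion(
    population="adults with heart failure",
    intervention="SGLT2 inhibitor",
    comparator="",
    outcome="",
    timeframe="",
    question_type="therapy",
)

print(complete.missing_elements())  # []
print(complete.preferred_design())  # RCT
print(vague.missing_elements())     # ['comparator', 'outcome', 'timeframe']
```

The check only flags blank elements; whether a stated population or outcome is specific enough remains a judgment call.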
**Step 3: Evaluate study quality and bias**

Apply risk of bias assessment tools (Cochrane RoB 2 for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic accuracy). Evaluate randomization, blinding, allocation concealment, incomplete outcome data, selective reporting. See [resources/methodology.md](resources/methodology.md#bias-assessment) for detailed criteria.

**Step 4: Prioritize and define outcomes**

Distinguish patient-important outcomes (mortality, symptoms, quality of life, function) from surrogate endpoints (biomarkers, lab values). Create outcome hierarchy: critical (decision-driving), important (informs decision), not important. Define measurement instruments and minimal clinically important differences (MCID). See [resources/template.md](resources/template.md#outcome-hierarchy) for prioritization framework.

**Step 5: Synthesize evidence and grade certainty**

Apply GRADE (Grading of Recommendations Assessment, Development and Evaluation) to rate certainty of evidence (high, moderate, low, very low). Consider study limitations, inconsistency, indirectness, imprecision, publication bias. Upgrade for large effects, dose-response, or confounders reducing effect. See [resources/methodology.md](resources/methodology.md#grade-framework) for rating guidance.

**Step 6: Create decision-ready summary**

Produce evidence profile or summary of findings table linking outcomes to certainty ratings and effect estimates. Include clinical interpretation, applicability assessment, and evidence gaps. Validate using [resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json). **Minimum standard**: Average score ≥ 3.5.
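The GRADE starting points and adjustments described in Step 5 can be sketched as simple bookkeeping. This is a deliberately simplified illustration: real GRADE ratings are structured judgments, not arithmetic. The function name `grade_certainty`, the string labels, and the convention of passing the number of levels per domain are assumptions of this sketch.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Domain names taken from Step 5; the exact strings are illustrative labels.
DOWNGRADE_DOMAINS = {
    "study limitations", "inconsistency", "indirectness",
    "imprecision", "publication bias",
}
UPGRADE_FACTORS = {"large effect", "dose-response", "confounding reducing effect"}


def grade_certainty(design, downgrades=None, upgrades=None):
    """Return a certainty label from a starting design and level adjustments.

    design: "rct" (starts high) or "observational" (starts low).
    downgrades / upgrades: dicts mapping a domain or factor to the number of
    levels to move (for example 1 for serious, 2 for very serious concerns).
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}

    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown domains or factors: {sorted(unknown)}")

    if design == "rct":
        start = LEVELS.index("high")
    elif design == "observational":
        start = LEVELS.index("low")
    else:
        raise ValueError(f"unknown design: {design!r}")

    score = start - sum(downgrades.values()) + sum(upgrades.values())
    # Clamp to the four available levels.
    score = max(0, min(len(LEVELS) - 1, score))
    return LEVELS[score]


print(grade_certainty("rct"))                                        # high
print(grade_certainty("rct", downgrades={"imprecision": 1}))         # moderate
print(grade_certainty("observational", upgrades={"large effect": 1}))  # moderate
print(grade_certainty("observational", downgrades={"study limitations": 2}))  # very low
```

The sketch makes one point from the workflow concrete: study design sets only the starting level, and the final rating depends on the downgrade and upgrade judgments recorded for each outcome.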
## Common Patterns

**Pattern 1: Therapy/Intervention Question**

- **PICOT**: Adults with condition → new treatment vs standard care → patient-important outcomes → follow-up period
- **Study design**: RCT preferred (highest quality for causation); systematic review of RCTs for synthesis
- **Key outcomes**: Mortality, morbidity, quality of life, adverse events
- **Bias assessment**: Cochrane RoB 2 (randomization, blinding, attrition, selective reporting)
- **Example**: SGLT2 inhibitors for heart failure → reduced mortality (GRADE: high certainty)

**Pattern 2: Diagnostic Test Accuracy**

- **PICOT**: Patients with suspected condition → new test vs reference standard → sensitivity/specificity → cross-sectional
- **Study design**: Cross-sectional study with consecutive enrollment; avoid case-control (inflates accuracy)
- **Key outcomes**: Sensitivity, specificity, positive/negative predictive values, likelihood ratios
- **Bias assessment**: QUADAS-2 (patient selection, index test, reference standard, flow and timing)
- **Example**: High-sensitivity troponin for MI → sensitivity 95%, specificity 92% (GRADE: moderate certainty)

**Pattern 3: Prognosis/Risk Prediction**

- **PICOT**: Population with condition/exposure → risk factors → outcomes (death, disease progression) → long-term follow-up
- **Study design**: Prospective cohort (follow from exposure to outcome); avoid retrospective (recall bias)
- **Key outcomes**: Incidence, hazard ratios, absolute risk, risk prediction model performance (C-statistic, calibration)
- **Bias assessment**: ROBINS-I or PROBAST (for prediction models)
- **Example**: Framingham Risk Score for CVD → C-statistic 0.76 (moderate discrimination)

**Pattern 4: Harm/Safety Assessment**

- **PICOT**: Population exposed to intervention → adverse events → timeframe for rare/delayed harms
- **Study design**: RCT for common harms; observational (cohort, case-control) for rare harms (larger sample, longer follow-up)
- **Key outcomes**: Serious adverse events, discontinuations, organ-specific toxicity, long-term safety
- **Bias assessment**: Different for rare vs common harms; consider confounding by indication in observational studies
- **Example**: NSAID cardiovascular risk → observational studies show increased MI risk (GRADE: low certainty due to confounding)

**Pattern 5: Systematic Review/Meta-Analysis**

- **PICOT**: Defined in protocol; guides search strategy, inclusion criteria, outcome extraction
- **Study design**: Comprehensive search, explicit eligibility criteria, duplicate screening/extraction, bias assessment, quantitative synthesis (if appropriate)
- **Key outcomes**: Pooled effect estimates (RR, OR, MD, SMD), heterogeneity (I²), certainty rating (GRADE)
- **Bias assessment**: Individual study RoB + review-level assessment (AMSTAR 2 for review quality)
- **Example**: Statins for primary prevention → RR 0.75 for MI (95% CI 0.70-0.80, I²=12%, GRADE: high certainty)

## Guardrails

**Key requirements:**

1. **Use PICOT for all clinical questions**: Vague questions lead to unfocused research. Specify Population, Intervention, Comparator, Outcome, Timeframe explicitly rather than asking "does X work?" without defining for whom, compared to what, and measuring which outcomes.
2. **Match study design to question type**: RCTs answer therapy questions (causal inference). Cohort studies answer prognosis. Cross-sectional studies answer diagnosis. Case-control studies answer rare harm or etiology. Avoid claiming causation from observational data or using case series for treatment effects.
3. **Prioritize patient-important outcomes over surrogates**: Surrogate endpoints (biomarkers, lab values) do not always correlate with patient outcomes. Focus on mortality, morbidity, symptoms, function, quality of life. Only use surrogates when a validated relationship to patient outcomes exists.
4. **Assess bias systematically**: Use validated tools (Cochrane RoB 2, ROBINS-I, QUADAS-2) rather than subjective judgment, because bias assessment directly affects certainty of evidence and clinical recommendations. Common biases: selection bias, performance bias (lack of blinding), detection bias, attrition bias, reporting bias.
5. **Apply GRADE to rate certainty of evidence**: Avoid conflating study design with certainty. RCTs start as high certainty but can be downgraded (serious limitations, inconsistency, indirectness, imprecision, publication bias). Observational studies start as low but can be upgraded (large effect, dose-response, residual confounding reducing effect).
6. **Distinguish statistical significance from clinical importance**: p < 0.05 does not mean clinically meaningful. Consider minimal clinically important difference (MCID), absolute risk reduction, number needed to treat (NNT). A small p-value with tiny effect size is statistically significant but clinically irrelevant.
7. **Assess external validity and applicability**: Evidence from selected trial populations may not apply to the target patient. Consider PICO match, setting differences (tertiary center vs community), intervention feasibility, patient values and preferences.
8. **State limitations and certainty explicitly**: All evidence has limitations. Specify what is uncertain, where evidence gaps exist, and how this affects confidence in recommendations.

**Common pitfalls:**

- ❌ **Treating all RCTs as high quality**: RCTs can have serious bias (inadequate randomization, unblinded, high attrition). Always assess bias.
- ❌ **Ignoring heterogeneity in meta-analysis**: High I² (>50%) suggests important differences across studies. Explore sources (population, intervention, outcome definition) before pooling.
- ❌ **Confusing association with causation**: Observational studies show association, not causation. Residual confounding is always possible.
- ❌ **Using composite outcomes uncritically**: Composite endpoints (e.g., "death or MI or hospitalization") obscure which component drives effect. Report components separately.
- ❌ **Accepting industry-funded evidence uncritically**: Pharmaceutical/device company-sponsored trials may have bias (outcome selection, selective reporting). Assess for conflicts of interest.
- ❌ **Over-interpreting subgroup analyses**: Most subgroup effects are chance findings. Only credible if pre-specified, statistically tested for interaction, and biologically plausible.

## Quick Reference

**Key resources:**

- **[resources/template.md](resources/template.md)**: PICOT framework, outcome hierarchy template, evidence table, GRADE summary template
- **[resources/methodology.md](resources/methodology.md)**: Evidence hierarchy, bias assessment tools, GRADE detailed guidance, study design selection, systematic review methods
- **[resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json)**: Quality criteria for research questions, evidence synthesis, and clinical interpretation

**PICOT Template:**

- **P** (Population): [Who? Age, sex, condition, severity, setting]
- **I** (Intervention): [What? Drug, procedure, test, exposure - dose, duration, route]
- **C** (Comparator): [Compared to what? Placebo, standard care, alternative treatment]
- **O** (Outcome): [What matters? Mortality, symptoms, QoL, harms - measurement instrument, timepoint]
- **T** (Timeframe): [How long? Follow-up duration, time to outcome]

**Evidence Hierarchy (Therapy Questions):**

1. Systematic reviews/meta-analyses of RCTs
2. Individual RCTs (large, well-designed)
3. Cohort studies (prospective)
4. Case-control studies
5. Case series, case reports
6. Expert opinion, pathophysiologic rationale

**GRADE Certainty Ratings:**

- **High** (⊕⊕⊕⊕): Very confident true effect is close to estimated effect
- **Moderate** (⊕⊕⊕○): Moderately confident, true effect likely close but could be substantially different
- **Low** (⊕⊕○○): Limited confidence, true effect may be substantially different
- **Very Low** (⊕○○○): Very little confidence, true effect likely substantially different

**Typical workflow time:**

- PICOT formulation: 10-15 minutes
- Single study critical appraisal: 20-30 minutes
- Systematic review protocol: 2-4 hours
- Evidence synthesis with GRADE: 1-2 hours
- Full systematic review: 40-100 hours (depending on scope)

**When to escalate:**

- Complex statistical meta-analysis (network meta-analysis, IPD meta-analysis)
- Advanced causal inference methods (instrumental variables, propensity scores)
- Health technology assessment (cost-effectiveness, budget impact)
- Guideline development panels (requires multi-stakeholder consensus)

→ Consult biostatistician, health economist, or guideline methodologist

**Inputs required:**

- **Research question** (clinical scenario or decision problem)
- **Evidence sources** (studies to appraise, databases for systematic review)
- **Outcome preferences** (which outcomes matter most to patients/clinicians)
- **Context** (setting, patient population, decision urgency)

**Outputs produced:**

- `domain-research-health-science.md`: Structured research question, evidence appraisal, outcome hierarchy, certainty assessment, clinical interpretation
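As a worked illustration of the guardrail on statistical significance versus clinical importance, the sketch below computes absolute risk reduction (ARR), relative risk reduction (RRR), and number needed to treat (NNT) from event counts. The function name `effect_measures` and both sets of trial counts are hypothetical, chosen only to show that the same relative effect can imply very different absolute benefit. Exact fractions are used so the NNT rounding is not affected by floating-point error.

```python
import math
from fractions import Fraction


def effect_measures(control_events, control_total, treated_events, treated_total):
    """Compute ARR, RRR, and NNT from two-arm event counts (illustrative helper)."""
    cer = Fraction(control_events, control_total)   # control event rate
    eer = Fraction(treated_events, treated_total)   # experimental event rate
    arr = cer - eer                                  # absolute risk reduction
    rrr = arr / cer if cer else None                 # relative risk reduction
    # NNT is only defined when the treatment lowers risk; round up by convention.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {"cer": cer, "eer": eer, "arr": arr, "rrr": rrr, "nnt": nnt}


# Hypothetical counts: identical 25% relative reduction, different baseline risk.
low_risk = effect_measures(200, 10_000, 150, 10_000)      # baseline risk 2%
high_risk = effect_measures(2_000, 10_000, 1_500, 10_000)  # baseline risk 20%

print(float(low_risk["rrr"]), float(low_risk["arr"]), low_risk["nnt"])     # 0.25 0.005 200
print(float(high_risk["rrr"]), float(high_risk["arr"]), high_risk["nnt"])  # 0.25 0.05 20
```

In both hypothetical scenarios the relative effect is the same, but treating 200 people versus 20 people to prevent one event can lead to different conclusions about clinical importance, which is why the guardrail asks for absolute measures alongside p-values.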