design-of-experiments
$
npx mdskill add lyndonkl/claude/design-of-experimentsDesigns efficient experiments to optimize multi-factor systems.
- Helps screen variables and tune parameters within budget limits.
- Depends on user input for factors, levels, and response metrics.
- Selects designs like factorial or Taguchi based on goals.
- Outputs structured experiment plans with execution steps.
SKILL.md
.github/skills/design-of-experimentsView on GitHub ↗
--- name: design-of-experiments description: Generates structured experimental designs (factorial, response surface, Taguchi) to systematically discover how multiple factors affect outcomes while minimizing experimental runs. Use when optimizing multi-factor systems with limited experimental budget, screening many variables to find the vital few, discovering interactions between parameters, mapping response surfaces for peak performance, validating robustness to noise factors, or when users mention factorial designs, A/B/n testing, parameter tuning, or process optimization. --- # Design of Experiments ## Table of Contents - [Workflow](#workflow) - [Common Patterns](#common-patterns) - [Guardrails](#guardrails) - [Quick Reference](#quick-reference) ## Workflow Copy this checklist and track your progress: ``` Design of Experiments Progress: - [ ] Step 1: Define objectives and constraints - [ ] Step 2: Identify factors, levels, and responses - [ ] Step 3: Choose experimental design - [ ] Step 4: Plan execution details - [ ] Step 5: Create experiment plan document - [ ] Step 6: Validate quality ``` **Step 1: Define objectives and constraints** Clarify the experiment goal (screening vs optimization), response metric(s), experimental budget (max runs), time/cost constraints, and success criteria. See [Common Patterns](#common-patterns) for typical objectives. **Step 2: Identify factors, levels, and responses** List all candidate factors (controllable inputs), specify levels for each factor (low/high or discrete values), categorize factors (control vs noise), and define response variables (measurable outputs). For screening many factors (8+), see [resources/methodology.md](resources/methodology.md#screening-designs) for Plackett-Burman and fractional factorial approaches. **Step 3: Choose experimental design** Based on objective and constraints: - **For screening 5+ factors with limited runs** → Use [resources/methodology.md](resources/methodology.md#screening-designs) for fractional factorial or Plackett-Burman - **For optimizing 2-5 factors** → Use [resources/template.md](resources/template.md#factorial-designs) for full or fractional factorial - **For response surface mapping** → Use [resources/methodology.md](resources/methodology.md#response-surface-methodology) for central composite or Box-Behnken - **For robust design against noise** → Use [resources/methodology.md](resources/methodology.md#taguchi-methods) for parameter vs noise factor arrays **Step 4: Plan execution details** Specify randomization order (eliminate time trends), blocking strategy (control nuisance variables), replication plan (estimate error), sample size justification (power analysis), and measurement protocols. See [Guardrails](#guardrails) for critical requirements. **Step 5: Create experiment plan document** Create `design-of-experiments.md` with sections: objective, factors table, design matrix (run order with factor settings), response variables, execution protocol, and analysis plan. Use [resources/template.md](resources/template.md) for structure. **Step 6: Validate quality** Self-assess using [resources/evaluators/rubric_design_of_experiments.json](resources/evaluators/rubric_design_of_experiments.json). Check: objective clarity, factor completeness, design appropriateness, randomization plan, measurement protocol, statistical power, analysis plan, and deliverable quality. **Minimum standard**: Average score ≥ 3.5 before delivering. ## Common Patterns **Pattern 1: Screening (many factors → vital few)** - **Context**: 10-30 candidate factors, limited budget, want to identify 3-5 critical factors - **Approach**: Plackett-Burman or fractional factorial (Resolution III/IV) - **Output**: Pareto chart of effect sizes, shortlist for follow-up optimization - **Example**: Software performance tuning with 15 configuration parameters **Pattern 2: Optimization (find best settings)** - **Context**: 2-5 factors already identified as important, want to find optimal levels - **Approach**: Full factorial (2^k) or fractional factorial + steepest ascent - **Output**: Main effects plot, interaction plots, recommended settings - **Example**: Manufacturing process with temperature, pressure, time factors **Pattern 3: Response Surface (map the landscape)** - **Context**: Need to understand curvature, find maximum/minimum, quantify tradeoffs - **Approach**: Central Composite Design (CCD) or Box-Behnken - **Output**: Response surface equation, contour plots, optimal region - **Example**: Chemical formulation with ingredient ratios **Pattern 4: Robust Design (work despite noise)** - **Context**: Product/process must perform well despite uncontrollable variation - **Approach**: Taguchi inner-outer array (control × noise factors) - **Output**: Settings that minimize sensitivity to noise factors - **Example**: Consumer product that must work across temperature/humidity ranges **Pattern 5: Sequential Experimentation (learn then refine)** - **Context**: High uncertainty, want to learn iteratively with minimal waste - **Approach**: Screening → Steepest ascent → Response surface → Confirmation - **Output**: Progressively refined understanding and settings - **Example**: New product development with unknown factor relationships ## Guardrails **Design requirements:** 1. **Randomize run order**: Eliminates time-order bias and confounding with lurking variables. Use random number generator, not "convenient" sequences. 2. **Replicate center points**: For designs with continuous factors, replicate center point runs (3-5 times) to estimate pure error and detect curvature. 3. **Preserve critical interactions**: In fractional factorials, avoid confounding important 2-way interactions with main effects. Choose Resolution IV or higher if interactions matter. 4. **Check design balance**: Ensure orthogonality (factors are uncorrelated in design matrix). Correlation > 0.3 reduces precision and interpretability. 5. **Define response precisely**: Use objective, quantitative, repeatable measurements. Avoid subjective scoring unless calibrated with multiple raters. 6. **Justify sample size**: Run power analysis to ensure design can detect meaningful effect sizes with acceptable Type II error risk (beta at most 0.20). 7. **Document assumptions**: State expected effect magnitudes, interaction assumptions, noise variance estimates. Design validity depends on these. 8. **Plan for analysis before running**: Specify statistical tests, significance level (alpha), effect size metrics before data collection to prevent p-hacking. **Common pitfalls:** - ❌ **One-factor-at-a-time (OFAT)**: Misses interactions, requires more runs than factorial designs - ❌ **Ignoring blocking**: If runs span days/batches/operators, block accordingly or confound results with time trends - ❌ **Too many levels**: Use 2-3 levels initially. More levels increase runs exponentially. - ❌ **Unmeasured factors**: If an important factor isn't controlled/measured, it becomes noise - ❌ **Changing protocols mid-experiment**: Breaks design structure. If necessary, restart or analyze separately. ## Quick Reference **Key resources:** - **[resources/template.md](resources/template.md)**: Quick-start templates for common designs (factorial, screening, response surface) - **[resources/methodology.md](resources/methodology.md)**: Advanced techniques (optimal designs, Taguchi, mixture experiments, sequential strategies) - **[resources/evaluators/rubric_design_of_experiments.json](resources/evaluators/rubric_design_of_experiments.json)**: Quality criteria for experiment plans **Typical workflow time:** - Simple factorial (2-4 factors): 15-30 minutes - Screening design (8+ factors): 30-45 minutes - Response surface design: 45-60 minutes - Robust design (Taguchi): 60-90 minutes **When to escalate:** - User needs mixture experiments (factors must sum to 100%) - Split-plot designs required (hard-to-change factors) - Optimal designs for irregular constraints - Bayesian adaptive designs → Use [resources/methodology.md](resources/methodology.md) for these advanced cases **Inputs required:** - **Process/System**: What you're experimenting on - **Factors**: List of controllable inputs with candidate levels - **Responses**: Measurable outputs (KPIs, metrics) - **Constraints**: Budget (max runs), time, resources - **Objective**: Screening, optimization, response surface, or robust design **Outputs produced:** - `design-of-experiments.md`: Complete experiment plan with design matrix, randomization, protocols, analysis approach
More from lyndonkl/claude
- abstraction-concrete-examplesBuilds structured abstraction ladders that translate high-level principles into concrete, actionable examples across 3-5 levels. Bridges communication gaps, reveals hidden assumptions, and tests whether abstract ideas work in practice. Use when explaining concepts at different expertise levels, moving between abstract principles and concrete implementation, identifying edge cases by testing ideas against scenarios, designing layered documentation, decomposing complex problems into actionable steps, or bridging strategy-execution gaps.
- academic-letter-architectGuides the creation of evidence-based academic recommendation letters, reference letters, and award nominations that combine concrete examples, meaningful comparisons, and genuine enthusiasm. Use when writing recommendation letters for students, postdocs, or colleagues, or when user mentions recommendation letter, reference, nomination, letter of support, endorsement, or needs help with strong advocacy and comparative statements.
- adr-architectureDocuments significant architectural and technical decisions with full context, alternatives considered, trade-offs analyzed, and consequences understood. Creates a decision trail that helps teams understand why decisions were made. Use when choosing between technology options, making infrastructure decisions, establishing standards, migrating systems, or when user mentions ADR, architecture decision, technical decision record, or decision documentation.
- adverse-selection-priorProduces a Bayesian prior probability that an offered transaction is +EV for the recipient, given that the counterparty chose to propose it. Applies Akerlof market-for-lemons logic -- if they offered it, they believe it is +EV for them, so the prior that it is +EV for us is materially below 50%. Reusable across trade evaluation, waiver drops (another team dropping a player is also adverse selection), job-offer analysis, M&A, and any "someone offered me this" situation. Use when you receive an unsolicited trade/offer/proposal, analyzing incoming trade prior, evaluating why a counterparty proposed a deal, or when user mentions adverse selection, market for lemons, why did they offer this, incoming trade prior, they proposed it, Bayesian adjustment on received offer.
- alignment-values-north-starCreates actionable alignment frameworks that give teams a shared North Star (direction), values (guardrails), and decision tenets (behavioral standards). Enables autonomous decision-making while maintaining organizational coherence. Use when starting new teams, scaling organizations, defining culture, establishing product vision, resolving misalignment, creating strategic clarity, or when user mentions North Star, team values, mission, principles, guardrails, decision framework, or cultural alignment.
- analogy-weight-checkFor every analogy in a substacker draft, verifies it carries mechanical weight — the analogy does real work explaining the mechanism, not merely decorates it. Cross-references analogy-catalog.md for novelty (is this analogy reused from a prior post?) and domain fit (biology > organizational > sports preferred; physics/military disfavored). Use whenever an analogy appears in the draft. Trigger keywords: analogy weight, decorative, mechanical weight, reused analogy, catalog check, metaphor check.
- answer-uncomfortable-questionTakes one strategic question about substacker ("should we launch paid?", "is this section dead?", "are we writing for the wrong audience?") and produces the mandatory evidence + reasoning + downside triad plus a recommendation. Used 3 times per Growth Strategist review. Trigger keywords: uncomfortable question, strategic question, evidence reasoning downside, triad.
- attribute-performanceFor each substacker post that materially over- or under-performs the rolling baseline (|z| ≥ 1.0), produces a plain-English attribution paragraph with calibrated confidence (high / medium / low / unexplained). Considers subject-line effect, topic zeitgeist, external share, day-of-week, length effect, and audience-notes signals. Labels unexplained outliers explicitly rather than fabricating a story. Use after compute-baseline when outlier posts exist. Trigger keywords: attribution, why did this post work, outlier explanation, performance analysis.
- auction-first-price-shadingComputes the optimal shaded bid for a first-price sealed-bid auction given a true private value, an estimate of the number of competing bidders N, and a value-distribution assumption. Implements the `(N-1)/N` equilibrium shading rule for uniform private values, adjusts for log-normal or empirical value distributions, layers a risk-aversion adjustment, and caps output against the bidder's remaining budget. Domain-neutral auction theory reusable across fantasy sports (baseball FAAB, NBA/NHL waiver auctions), prediction-market limit sizing, sealed procurement bids, and any blind-bid context. Use when user mentions "first-price auction bid", "sealed bid shading", "(N-1)/N", "FAAB bid amount", "auction shading", "optimal bid first-price", "bid for sealed-bid", "blind bid sizing", or when downstream logic needs a principled shade factor rather than an ad-hoc heuristic.
- auction-winners-curse-haircutApplies a Bayesian haircut to a bid valuation for common-value auctions where winning is itself evidence the bidder over-estimated. Takes a raw valuation, a value-type classification (common_value / private_value / mixed), the number of informed bidders N, and a signal-dispersion estimate, and returns an adjusted valuation. Domain-neutral and reusable across fantasy FAAB, prediction markets, M&A bids, ad-auction budgets, and any generic bidding context. Use when user mentions "winner's curse", "common value auction", "valuation haircut", "adverse valuation", "Bayesian bid adjustment", or "over-paying in auction".