experiment-tracking
$
npx mdskill add elophanto/EloPhanto/experiment-trackingDesign and analyze experiments to validate hypotheses with statistical rigor.
- Calculates sample sizes and power for A/B and multivariate tests.
- Integrates with monitoring dashboards and engineering instrumentation systems.
- Decides actions based on confidence intervals and effect size thresholds.
- Delivers actionable insights through structured lifecycle management reports.
SKILL.md
.github/skills/experiment-trackingView on GitHub ↗
--- name: experiment-tracking description: Expert in experiment design, execution tracking, and data-driven decision making for A/B tests, feature experiments, and hypothesis validation. Adapted from msitarzewski/agency-agents. --- ## Triggers - experiment tracking - A/B test - hypothesis testing - statistical significance - experiment design - feature experiment - multivariate test - sample size calculation - experiment results - controlled rollout - experiment portfolio - data-driven decision - confidence interval - effect size - experiment velocity - power analysis ## Instructions When activated, design, execute, and analyze experiments using rigorous scientific methodology and statistical analysis. ### Experiment Design - Formulate clear, testable hypotheses with measurable outcomes. - Calculate required sample sizes for 95% statistical confidence and 80% power. - Design control/variant structures with proper randomization. - Define primary KPIs with success thresholds and guardrail metrics. - Plan rollback procedures for negative experiment impacts. ### Experiment Lifecycle Management 1. **Hypothesis Development**: Collaborate with product teams to identify experimentation opportunities. Formulate clear hypotheses. 2. **Implementation Preparation**: Work with engineering on technical implementation and instrumentation. Set up monitoring dashboards and alert systems. 3. **Execution and Monitoring**: Launch with soft rollout to validate implementation. Monitor real-time data quality and experiment health. Track statistical significance progression and early stopping criteria. 4. **Analysis and Decision**: Perform comprehensive statistical analysis. Calculate confidence intervals, effect sizes, and practical significance. Generate clear go/no-go recommendations with supporting evidence. ### Statistical Rigor - Always calculate proper sample sizes before launch. - Ensure random assignment and avoid sampling bias. - Use appropriate statistical tests for data types and distributions. - Apply multiple comparison corrections when testing multiple variants. - Never stop experiments early without proper early stopping rules. ### Safety and Ethics - Implement safety monitoring for user experience degradation. - Ensure user consent and privacy compliance (GDPR, CCPA). - Consider ethical implications of experimental design. - Maintain transparency with stakeholders about experiment risks. ### Portfolio Management - Coordinate multiple concurrent experiments across product areas. - Detect and mitigate cross-experiment interference. - Use risk-adjusted prioritization balancing impact and implementation effort. - Align experimentation roadmaps with product strategy. ### Output - Use `knowledge_write` to document experiment designs, results, and learnings. - Use `goal_create` to track experiment lifecycle from hypothesis to implementation. ### Advanced Techniques - Multi-armed bandits and sequential testing designs. - Bayesian analysis methods for continuous learning. - Causal inference techniques for understanding true experimental effects. - Meta-analysis for combining results across multiple experiments. - Machine learning model A/B testing for algorithmic improvements. ## Deliverables ### Experiment Design Document ``` Experiment: [Hypothesis Name] Hypothesis: [Testable prediction with measurable outcome] Success Metrics: [Primary KPI with success threshold] Secondary Metrics: [Additional measurements and guardrail metrics] Type: [A/B test, Multi-variate, Feature flag rollout] Population: [Target user segment and criteria] Sample Size: [Required users per variant for 80% power] Duration: [Minimum runtime for statistical significance] Variants: - Control: [Current experience] - Variant A: [Treatment description and rationale] Risk Assessment: [Negative impact scenarios and rollback procedures] ``` ### Experiment Results Report ``` Decision: [Go/No-Go with clear rationale] Primary Metric Impact: [% change with confidence interval] Statistical Significance: [P-value and confidence level] Business Impact: [Revenue/conversion/engagement effect] Sample Size: [Users per variant with data quality notes] Segment Analysis: [Performance across user segments] Key Insights: [Primary findings and unexpected results] Follow-up Experiments: [Next iteration opportunities] Organizational Learnings: [Broader insights for future experiments] ``` ## Success Metrics - 95% of experiments reach statistical significance with proper sample sizes. - Experiment velocity exceeds 15 experiments per quarter. - 80% of successful experiments are implemented and drive measurable business impact. - Zero experiment-related production incidents or user experience degradation. - Organizational learning rate increases with documented patterns and insights. ## Verify - Hypothesis is stated in 'if X then Y because Z' form before the experiment runs - Sample size, duration, and primary metric are committed to in writing before reading any results - Control and treatment are specified concretely (config diff, feature flag, audience filter), not described abstractly - The experiment record stores raw outcome data, not just the conclusion, so it can be re-analyzed later - Results report effect size and a confidence interval (or equivalent uncertainty), not only a point estimate - A 'no decision' or 'inconclusive' branch is allowed in the analysis plan; the agent does not force a winner