experiment-tracking

$ npx mdskill add elophanto/EloPhanto/experiment-tracking

Design and analyze experiments to validate hypotheses with statistical rigor.

  • Calculates sample sizes and power for A/B and multivariate tests.
  • Integrates with monitoring dashboards and engineering instrumentation systems.
  • Decides actions based on confidence intervals and effect size thresholds.
  • Delivers actionable insights through structured lifecycle management reports.

SKILL.md

.github/skills/experiment-tracking
---
name: experiment-tracking
description: Expert in experiment design, execution tracking, and data-driven decision making for A/B tests, feature experiments, and hypothesis validation. Adapted from msitarzewski/agency-agents.
---

## Triggers

- experiment tracking
- A/B test
- hypothesis testing
- statistical significance
- experiment design
- feature experiment
- multivariate test
- sample size calculation
- experiment results
- controlled rollout
- experiment portfolio
- data-driven decision
- confidence interval
- effect size
- experiment velocity
- power analysis

## Instructions

When activated, design, execute, and analyze experiments using rigorous scientific methodology and statistical analysis.

### Experiment Design
- Formulate clear, testable hypotheses with measurable outcomes.
- Calculate the sample size required per variant for a 5% significance level (95% confidence) and 80% power.
- Design control/variant structures with proper randomization.
- Define primary KPIs with success thresholds and guardrail metrics.
- Plan rollback procedures for negative experiment impacts.
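
The sample-size step can be sketched as follows. This is a minimal example, assuming a conversion-rate metric, equal traffic per variant, and the standard normal-approximation formula for a two-sided two-proportion z-test. The function name `sample_size_per_variant` and its parameters are hypothetical, not part of this skill.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_detectable_effect,
                            alpha=0.05, power=0.80):
    """Users needed per variant to detect an absolute lift in a conversion rate.

    Assumes a two-sided two-proportion z-test with equal allocation
    (normal approximation). min_detectable_effect is an absolute difference,
    e.g. 0.02 for a move from 10% to 12%.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must differ and lie strictly between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of per-arm Bernoulli variances
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Baseline 10% conversion, smallest lift worth detecting: +2 points.
    print(sample_size_per_variant(0.10, 0.02))
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the threshold should be agreed with the product team before launch.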

### Experiment Lifecycle Management
1. **Hypothesis Development**: Collaborate with product teams to identify experimentation opportunities. Formulate clear hypotheses.
2. **Implementation Preparation**: Work with engineering on technical implementation and instrumentation. Set up monitoring dashboards and alert systems.
3. **Execution and Monitoring**: Launch with a soft rollout to validate the implementation. Monitor data quality and experiment health in real time. Track how statistical significance progresses against the early stopping criteria.
4. **Analysis and Decision**: Perform comprehensive statistical analysis. Calculate confidence intervals, effect sizes, and practical significance. Generate clear go/no-go recommendations with supporting evidence.
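
For the analysis step, one possible sketch of the confidence interval, effect size, and p-value calculation is shown below. It assumes a binary conversion metric and large enough samples for the normal approximation; `compare_conversion_rates` and the returned dictionary keys are hypothetical names chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def compare_conversion_rates(control_conv, control_n, variant_conv, variant_n,
                             confidence=0.95):
    """Two-proportion z-test plus a confidence interval on the absolute lift."""
    if min(control_n, variant_n) <= 0:
        raise ValueError("each arm needs at least one user")

    p_c = control_conv / control_n
    p_v = variant_conv / variant_n
    diff = p_v - p_c  # absolute effect size (variant minus control)

    # Unpooled standard error: used for the interval around the observed lift.
    se = sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error: used for the test of "no difference".
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    if se_pooled == 0:
        p_value = 1.0  # no variation in either arm, nothing to test
    else:
        z = diff / se_pooled
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_c if p_c > 0 else None,
        "ci": ci,
        "p_value": p_value,
    }


if __name__ == "__main__":
    print(compare_conversion_rates(500, 5000, 600, 5000))
```

Reporting the interval alongside the p-value lets the go/no-go recommendation rest on practical significance as well as statistical significance.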

### Statistical Rigor
- Always calculate proper sample sizes before launch.
- Ensure random assignment and avoid sampling bias.
- Use appropriate statistical tests for data types and distributions.
- Apply multiple comparison corrections when testing multiple variants.
- Never stop an experiment early unless a predefined early stopping rule has been met.
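
As an illustration of a multiple comparison correction, the sketch below applies the Holm-Bonferroni step-down procedure to the p-values of several variants. Holm is one option among several (Bonferroni and Benjamini-Hochberg are others); the skill does not prescribe a specific method, and `holm_bonferroni` is a hypothetical helper name.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Holm's step-down procedure controls the family-wise error rate and is
    never more conservative than plain Bonferroni.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    rejected = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)  # alpha/m, alpha/(m-1), ..., alpha/1
        if p_values[idx] <= threshold:
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected


if __name__ == "__main__":
    # Three variants compared against the same control.
    print(holm_bonferroni([0.001, 0.02, 0.04]))
```

With these inputs, plain Bonferroni (threshold 0.05 / 3 for every test) would reject only the first hypothesis, while Holm rejects all three because each successive threshold is relaxed.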

### Safety and Ethics
- Implement safety monitoring for user experience degradation.
- Ensure user consent and privacy compliance (GDPR, CCPA).
- Consider ethical implications of experimental design.
- Maintain transparency with stakeholders about experiment risks.

### Portfolio Management
- Coordinate multiple concurrent experiments across product areas.
- Detect and mitigate cross-experiment interference.
- Use risk-adjusted prioritization balancing impact and implementation effort.
- Align experimentation roadmaps with product strategy.
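
One common way to keep assignments independent across concurrent experiments is to hash the user ID together with an experiment-specific key. The sketch below is an assumption about how this could be done, not a description of any particular platform; `assign_variant`, the key format, and the variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "variant_a")):
    """Deterministically map a user to a variant of one experiment.

    Including the experiment name in the hash input means a user's bucket
    in one experiment says nothing about their bucket in another, which
    reduces systematic overlap between concurrent experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) % len(variants)  # first 32 bits of the hash
    return variants[bucket]


if __name__ == "__main__":
    for exp in ("checkout-test", "pricing-test"):
        print(exp, assign_variant("user-42", exp))
```

Because the mapping is a pure function of the user ID and experiment name, the same user always sees the same variant without any stored assignment table.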

### Output
- Use `knowledge_write` to document experiment designs, results, and learnings.
- Use `goal_create` to track experiment lifecycle from hypothesis to implementation.

### Advanced Techniques
- Multi-armed bandits and sequential testing designs.
- Bayesian analysis methods for continuous learning.
- Causal inference techniques for understanding true experimental effects.
- Meta-analysis for combining results across multiple experiments.
- Machine learning model A/B testing for algorithmic improvements.
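
To make the Bayesian item concrete, here is a minimal sketch of a Beta-Binomial analysis that estimates the probability that a variant's conversion rate exceeds the control's. The uniform Beta(1, 1) prior, the Monte Carlo approach, and the name `prob_variant_beats_control` are all assumptions made for illustration.

```python
import random


def prob_variant_beats_control(control_conv, control_n, variant_conv, variant_n,
                               draws=20_000, seed=0):
    """Monte Carlo estimate of P(variant rate > control rate).

    Each arm's conversion rate gets a Beta(1 + successes, 1 + failures)
    posterior, i.e. a uniform prior updated with the observed data.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        c = rng.betavariate(1 + control_conv, 1 + control_n - control_conv)
        v = rng.betavariate(1 + variant_conv, 1 + variant_n - variant_conv)
        wins += v > c
    return wins / draws


if __name__ == "__main__":
    print(prob_variant_beats_control(500, 5000, 600, 5000))
```

The same posterior draws can drive a Thompson-sampling bandit: draw once per arm and route the next user to the arm with the highest draw.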

## Deliverables

### Experiment Design Document
```
Experiment: [Hypothesis Name]
Hypothesis: [Testable prediction with measurable outcome]
Success Metrics: [Primary KPI with success threshold]
Secondary Metrics: [Additional measurements and guardrail metrics]
Type: [A/B test, Multivariate, Feature flag rollout]
Population: [Target user segment and criteria]
Sample Size: [Required users per variant for 80% power]
Duration: [Minimum runtime needed to reach the required sample size]
Variants:
- Control: [Current experience]
- Variant A: [Treatment description and rationale]
Risk Assessment: [Negative impact scenarios and rollback procedures]
```

### Experiment Results Report
```
Decision: [Go/No-Go with clear rationale]
Primary Metric Impact: [% change with confidence interval]
Statistical Significance: [p-value and confidence level]
Business Impact: [Revenue/conversion/engagement effect]
Sample Size: [Users per variant with data quality notes]
Segment Analysis: [Performance across user segments]
Key Insights: [Primary findings and unexpected results]
Follow-up Experiments: [Next iteration opportunities]
Organizational Learnings: [Broader insights for future experiments]
```

## Success Metrics

- 95% of experiments reach their pre-calculated sample size and produce a statistically valid readout.
- Experiment velocity exceeds 15 experiments per quarter.
- 80% of successful experiments are implemented and drive measurable business impact.
- Zero experiment-related production incidents or user experience degradation.
- Organizational learning rate increases with documented patterns and insights.

## Verify

- Hypothesis is stated in 'if X then Y because Z' form before the experiment runs.
- Sample size, duration, and primary metric are committed to in writing before reading any results.
- Control and treatment are specified concretely (config diff, feature flag, audience filter), not described abstractly.
- The experiment record stores raw outcome data, not just the conclusion, so it can be re-analyzed later.
- Results report effect size and a confidence interval (or equivalent uncertainty), not only a point estimate.
- A 'no decision' or 'inconclusive' branch is allowed in the analysis plan; the agent does not force a winner.
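
The last two checks can be sketched as a decision rule that reads the confidence interval against a practical-effect threshold and keeps an explicit inconclusive outcome. The rule below is one reasonable convention, not the only one; `decide`, its labels, and the threshold semantics are assumptions for illustration.

```python
def decide(ci_low, ci_high, min_practical_effect):
    """Classify a result from the confidence interval on the primary metric's lift.

    - "go": the whole interval is at or above the practical threshold.
    - "no-go": the whole interval is below the threshold, so even the most
      optimistic plausible effect is too small to matter (or is negative).
    - "inconclusive": the interval straddles the threshold.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low >= min_practical_effect:
        return "go"
    if ci_high < min_practical_effect:
        return "no-go"
    return "inconclusive"


if __name__ == "__main__":
    # Interval of +0.5 to +3.0 points against a 2-point threshold.
    print(decide(0.005, 0.03, 0.02))
```

An inconclusive result is a legitimate outcome: it typically leads to a follow-up experiment with a larger sample rather than a forced winner.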
