comparison-design
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/comparison-design

**Question**: How much better is our method than the baseline?
SKILL.md
.github/skills/comparison-design
---
name: comparison-design
description: "Design fair comparison experiments against baselines and competing methods"
version: 1.0.0
category: experiment-execution
type: strategy
used-by: experiment-design
sops:
  - baseline-selection
  - metric-specification
  - sample-size-estimation
  - seed-protocol-design
  - environment-specification
tactics:
  - statistical-method-selection
  - reproducibility-protocol
---

# Strategy: Comparison Design

**Question**: How much better is our method than the baseline?

## Methodology

- **Fair Comparison Protocol** (Bouthillier 2021): Control all confounds, same compute budget, same tuning effort.
- **Multi-Baseline Comparison**: Compare against multiple baselines (SOTA, simple, ablated).
- **Multi-Dataset Evaluation**: Test across diverse datasets to avoid dataset-specific overfitting.
- **Bayesian Comparison** (Benavoli 2017): Posterior probability of superiority, not just p-values.
- **Bootstrap/Permutation Tests**: Non-parametric significance without distributional assumptions.

## Execution Flow

1. **baseline-selection** → Select appropriate baselines (SOTA, simple, oracle)
2. **metric-specification** → Define primary metric and secondary metrics
3. **sample-size-estimation** → Power analysis for detecting meaningful differences
4. **seed-protocol-design** → Ensure fair random initialization across methods
5. **environment-specification** → Lock environment to prevent confounds
6. **reproducibility-protocol** (tactic) → Ensure all results are reproducible
7. **statistical-method-selection** (tactic) → Choose Bayesian or frequentist comparison

## Budget Gate

| Comparison Scope  | Baselines         | Datasets | Seeds | Min Runs |
|-------------------|-------------------|----------|-------|----------|
| Minimal           | 1 SOTA + 1 simple | 1        | 3     | 6        |
| Standard          | 2-3 baselines     | 2-3      | 5     | 30-45    |
| Comprehensive     | 4+ baselines      | 3-5      | 5-10  | 100+     |
| Publication-ready | All relevant      | 5+       | 10+   | 200+     |
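
Once the runs for a budget tier are collected, the per-seed scores can be compared with the bootstrap and permutation tests named under Methodology. The following is a minimal sketch, not the skill's own implementation. It assumes one score per seed for each method, paired by seed, with higher scores being better. The function names, the example scores, and the choice of a sign-flip permutation test with a percentile bootstrap interval are all illustrative assumptions.

```python
import random


def paired_differences(ours, baseline):
    """Per-seed score differences (ours minus baseline); inputs must be paired."""
    if len(ours) != len(baseline):
        raise ValueError("scores must be paired: same seeds, same order")
    return [a - b for a, b in zip(ours, baseline)]


def paired_permutation_pvalue(ours, baseline, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo sign-flip permutation test on the mean difference."""
    rng = random.Random(seed)
    diffs = paired_differences(ours, baseline)
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to be +d or -d.
        flipped_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped_sum) / n >= observed - 1e-12:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def bootstrap_mean_diff_ci(ours, baseline, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference."""
    rng = random.Random(seed)
    diffs = paired_differences(ours, baseline)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Hypothetical accuracy scores over 5 seeds (illustrative values, not real results).
OURS = [0.81, 0.83, 0.80, 0.84, 0.82]
BASELINE = [0.78, 0.80, 0.79, 0.80, 0.79]

print("p-value:", paired_permutation_pvalue(OURS, BASELINE))
print("95% CI for mean difference:", bootstrap_mean_diff_ci(OURS, BASELINE))
```

One consequence worth keeping in mind when choosing a tier: with n paired seeds there are only 2^n sign assignments, so an exact two-sided sign-flip test cannot return a p-value below 2/2^n. That floor is 0.25 for 3 seeds and 0.0625 for 5 seeds, so a small seed count limits what this particular test can show regardless of effect size.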
More from yogsoth-ai/de-anthropocentric-research-engine
- abductive-hypothesis-generation: Strategy: inference to the best explanation when facing anomalies.
- ablation-brainstorm: Remove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
- ablation-component-mapping: Map system architecture to ablatable units for ablation studies.
- ablation-design: Design ablation studies to isolate component contributions in ML systems.
- ablation-execution: Remove components one by one from a system and record the impact of each removal.
- abp-vulnerability-classification: Classify assumptions on 2 axes: load-bearing (how much the conclusion depends on it) × vulnerable (how likely it is to be false). Focuses attention on the High-Load × High-Vulnerable quadrant.
- abstraction-extraction: Extract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
- abstraction-ladder: Perform bisociation at multiple abstraction levels.
- abstraction-laddering: Move between concrete and abstract framings, 3 levels up (Why?) and 3 levels down (How?), to find the most productive research level.
- abstraction-to-design: Abstract a biological principle into a design principle. Bridge from biology to engineering.