result-validation-loop
$
npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/result-validation-loopValidates results using statistical testing, ROPE, reproducibility, and synthesis
- Ensures experimental results are reliable and statistically meaningful
- Uses statistical-testing, result-collection, and reproducibility-verification tools
- Evaluates posterior probabilities against ROPE bounds and reproducibility criteria
- Returns structured validation outcomes with final judgment and synthesis
SKILL.md
.github/skills/result-validation-loopView on GitHub ↗
---
name: result-validation-loop
description: "Validate results through statistical testing, ROPE judgment, reproducibility re-runs, and final synthesis"
version: 1.0.0
category: experiment-execution
type: tactic
used-by: implementation-planning
orchestrates:
- result-collection
- statistical-testing
- reproducibility-verification
- execution-synthesis
---
# Tactic: Result Validation Loop
## Orchestration Pattern
```pseudocode
FUNCTION result_validation_loop(raw_results, experiment_design):
// Phase 1: Collect and structure
structured = SPAWN result-collection(raw_results)
VALIDATE structured.complete
// Phase 2: Statistical testing
stats = SPAWN statistical-testing(structured, experiment_design.hypotheses)
// Phase 3: ROPE Judgment
rope = experiment_design.rope // pre-registered ROPE bounds
IF stats.posterior_in_rope > 0.95:
judgment = "ACCEPT_NULL" // practically equivalent
ELIF stats.posterior_above_rope > 0.95:
judgment = "REJECT_NULL" // meaningful effect detected
ELSE:
judgment = "UNDECIDED" // need more data
END
// Phase 4: Reproducibility verification
IF judgment != "UNDECIDED":
repro = SPAWN reproducibility-verification(
experiment_design,
n_reruns = 3,
seeds = [42, 123, 7]
)
IF repro.icc < 0.5:
judgment = "NOT_REPRODUCIBLE"
ELIF repro.icc < 0.75:
judgment = judgment + "_PARTIAL_REPRO"
ELSE:
judgment = judgment + "_REPRODUCIBLE"
END
ELSE:
// Undecided — still run reproducibility to check if issue is noise
repro = SPAWN reproducibility-verification(
experiment_design,
n_reruns = 5, // more runs for undecided cases
seeds = [42, 123, 7, 256, 999]
)
IF repro.variance_explained_by_seed > 0.5:
judgment = "HIGH_VARIANCE_ACROSS_SEEDS"
END
END
// Phase 5: Synthesis
report = SPAWN execution-synthesis({
structured_results: structured,
statistical_tests: stats,
judgment: judgment,
reproducibility: repro,
experiment_design: experiment_design
})
RETURN report
END
```
## Decision Criteria
| Condition | Action |
|-----------|--------|
| Results incomplete (missing tasks) | Report gaps, analyze available data |
| P(in ROPE) > 95% | Accept null (no practical difference) |
| P(above ROPE) > 95% | Reject null (meaningful effect) |
| Neither threshold met | Undecided — recommend more data |
| ICC > 0.75 | Results reproducible |
| ICC 0.5-0.75 | Partially reproducible — flag |
| ICC < 0.5 | Not reproducible — investigate sources of variance |
| High seed-dependent variance | Report instability, recommend investigation |
## Quality Gates
Before producing final synthesis:
- [ ] All statistical tests report effect sizes (not just p-values)
- [ ] Confidence intervals are provided for all estimates
- [ ] ROPE was defined before analysis (not post-hoc)
- [ ] At least 3 reproducibility re-runs completed
- [ ] Limitations are explicitly stated
- [ ] Next steps are actionable
More from yogsoth-ai/de-anthropocentric-research-engine
- abductive-hypothesis-generationStrategy: 面对异常的最佳解释推理
- ablation-brainstormRemove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
- ablation-component-mappingMap system architecture to ablatable units for ablation studies
- ablation-designDesign ablation studies to isolate component contributions in ML systems
- ablation-executionRemove components one by one from a system, record the response/impact of each removal.
- abp-vulnerability-classificationClassify assumptions on 2 axes — load-bearing (how much conclusion depends on it) × vulnerable (how likely to be false). Focuses attention on High-Load × High-Vulnerable quadrant.
- abstraction-extractionExtract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
- abstraction-ladderPerform bisociation at multiple abstraction levels
- abstraction-ladderingMove between concrete and abstract framings — 3 levels up (Why?) and 3 levels down (How?) to find the most productive research level.
- abstraction-to-designAbstract biological principle to design principle. Bridge from biology to engineering.