result-analysis
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/result-analysis
Analyzes results using statistical methods and synthesizes findings
- Validates experiment results for statistical significance and reproducibility
- Uses bootstrap, permutation tests, Bayesian methods, and effect size metrics
- Evaluates results through multiple analytical layers and re-runs for consistency
- Produces a report with confidence levels and actionable conclusions
SKILL.md
.github/skills/result-analysis
---
name: result-analysis
description: "Statistically analyze collected results, verify reproducibility, and synthesize findings"
version: 1.0.0
category: experiment-execution
type: strategy
used-by: implementation-planning
sops:
- statistical-testing
- reproducibility-verification
- execution-synthesis
tactics:
- result-validation-loop
---
# Strategy: Result Analysis
**Key Question**: What do the results tell us?
## Methodology
A four-step analysis whose statistical testing draws on frequentist, resampling, and Bayesian approaches:
1. **Statistical Testing** — Bootstrap CI, Permutation tests, Bayesian ROPE judgment
2. **Effect Size Calculation** — Cohen's d, Cliff's delta, or domain-appropriate measure
3. **Reproducibility Verification** — Re-run with different seeds, compare distributions
4. **Synthesis** — Integrate findings into actionable conclusions
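The resampling part of step 1 can be sketched in plain Python. This is a minimal illustration rather than the skill's actual implementation: the function names `bootstrap_ci` and `permutation_p_value`, the choice of the difference in means as the statistic, the default of 2000 resamples, and the sample scores are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    # Cut the same number of resamples from each tail.
    tail = int(n_resamples * alpha / 2)
    return diffs[tail], diffs[n_resamples - 1 - tail]


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data simulates "group labels do not matter".
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(perm_a) - statistics.fmean(perm_b)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up scores, for illustration only.
treatment = [0.82, 0.85, 0.81, 0.86, 0.84, 0.83, 0.87, 0.85]
baseline = [0.78, 0.80, 0.77, 0.79, 0.81, 0.78, 0.80, 0.79]

ci_low, ci_high = bootstrap_ci(treatment, baseline)
p_value = permutation_p_value(treatment, baseline)
print(f"95% bootstrap CI for the difference: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"permutation p-value: {p_value:.4f}")
```

Fixing the seed keeps a single analysis repeatable; the reproducibility step would then vary the seed on purpose and compare the resulting distributions.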
## Execution Flow
```
[Collected results from experiment-running]
→ statistical-testing (bootstrap/permutation/Bayesian)
→ effect size calculation
→ reproducibility-verification (re-run, compare)
→ execution-synthesis (comprehensive report)
→ OUTPUT: validated findings with confidence levels
```
## Budget Gate
| Step | Max Budget | Output |
|------|-----------|--------|
| Statistical testing | 8% | Test results with p-values/CIs |
| Reproducibility | 8% | Re-run comparison |
| Synthesis | 4% | Final report |
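One way to read the table is as a per-step cap on a shared overall budget. The sketch below assumes that reading; the source does not state the budget unit (tokens, compute time, or something else), and the names `BUDGET_PERCENT`, `step_caps`, and `within_gate` are hypothetical.

```python
# Maximum share of the overall budget per step, copied from the table above.
# Integer percentages avoid floating-point noise in the caps.
BUDGET_PERCENT = {
    "Statistical testing": 8,
    "Reproducibility": 8,
    "Synthesis": 4,
}


def step_caps(total_budget):
    """Turn the percentage shares into absolute caps (unit is an assumption)."""
    return {step: total_budget * pct / 100 for step, pct in BUDGET_PERCENT.items()}


def within_gate(step, spent, total_budget):
    """Return True while a step's spend stays at or below its cap."""
    return spent <= total_budget * BUDGET_PERCENT[step] / 100


caps = step_caps(1000)
print(caps)
print(within_gate("Synthesis", 35, 1000))
```

Under this reading the three steps together may use at most 20% of the overall budget.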
## Key Decisions
- **Test selection**:
  - Known, approximately normal distribution → parametric (t-test, ANOVA)
  - Unknown/non-normal → bootstrap CI or permutation test
  - Need practical significance → Bayesian ROPE
- **Reproducibility threshold**: Results must agree within 1 SE across re-runs
- **Effect size interpretation** (by magnitude, so the sign of d does not matter):
  - Small: |d| < 0.2 (may not be practically significant)
  - Medium: 0.2 ≤ |d| < 0.8 (likely meaningful)
  - Large: |d| ≥ 0.8 (strong effect)
- **ROPE (Region of Practical Equivalence)**: Define before testing, not after
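The decisions above can be sketched as small helpers. Several details here are assumptions rather than part of the skill: Cohen's d is computed with a pooled standard deviation, "agree within 1 SE" is read as every re-run mean lying within one standard error of the original run's mean, and the ROPE rule compares a precomputed 95% interval against a ROPE fixed in advance. All function names are hypothetical.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def interpret_d(d):
    """Label an effect size with the thresholds listed above."""
    size = abs(d)
    if size < 0.2:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def agrees_within_one_se(original, rerun_means):
    """Assumed reading: each re-run mean is within 1 SE of the original mean."""
    se = statistics.stdev(original) / math.sqrt(len(original))
    center = statistics.fmean(original)
    return all(abs(m - center) <= se for m in rerun_means)


def rope_decision(interval, rope):
    """Compare a 95% interval for the effect with a ROPE chosen before testing."""
    low, high = interval
    rope_low, rope_high = rope
    if high < rope_low or low > rope_high:
        return "meaningful"   # interval lies entirely outside the ROPE
    if rope_low <= low and high <= rope_high:
        return "equivalent"   # interval lies entirely inside the ROPE
    return "undecided"        # interval overlaps a ROPE boundary


ROPE = (-0.01, 0.01)  # illustrative bounds, fixed before looking at results
print(interpret_d(cohens_d([2, 3, 4], [1, 2, 3])))
print(agrees_within_one_se([10, 12, 11, 13, 9, 11], [11.2, 10.7]))
print(rope_decision((0.03, 0.07), ROPE))
```

Passing the ROPE in as an argument, rather than deriving it from the data, keeps the helper consistent with the rule that the region is defined before testing.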
## Integration with Knowledge System
Results feed back into:
- Wiki vault (claims with evidence)
- Future experiment design (what worked, what didn't)
- North star progress tracking