threshold-calibration
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/threshold-calibration

Calibrate consensus thresholds to identify robust and fragile consensus items
- Solves the problem of arbitrary threshold selection in consensus analysis
- Uses threshold-sweep, consensus-classification, and consensus-measurement SOPs
- Analyzes threshold curves to detect classification stability and knee points
- Delivers a threshold-consensus curve with item classification at chosen thresholds
SKILL.md
.github/skills/threshold-calibration
---
name: threshold-calibration
description: Systematically sweep consensus thresholds to observe which items achieve consensus at what level, producing a threshold-consensus curve.
execution: tactic
used-by: structured-consensus
---

# Threshold Calibration

Systematically vary the consensus threshold to understand how sensitive consensus classification is to that choice. Rather than picking a single arbitrary threshold, sweep across a range to see which items are robust consensus (they hold at every threshold in the sweep, including the strictest) and which are fragile (they reach consensus only at lenient thresholds).

## Stages

1. **Sweep**: Run `threshold-sweep` to compute consensus status at multiple threshold levels
2. **Classify**: Run `consensus-classification` to categorize items at the chosen operating threshold
3. **Measure**: Run `consensus-measurement` to validate final consensus scores

## Available SOPs

| SOP | Role in Tactic |
|-----|----------------|
| threshold-sweep | Compute consensus at multiple threshold levels, produce curve |
| consensus-classification | Classify items as consensus/dissensus at operating threshold |
| consensus-measurement | Validate final consensus scores with appropriate method |

## Execution Guidance

- Sweep range should cover 50%–90% agreement (or IQR 0.5–2.0)
- Identify the "knee" in the curve, where many items flip classification
- Robust consensus items (consensus even at strict thresholds) are highest confidence
- Fragile items (consensus only at lenient thresholds) need flagging
- Report both the curve and the classification at the chosen operating point

## Minimum Yield

- Threshold-consensus curve (threshold vs. number of consensus items)
- Classification results at the operating threshold (consensus items, dissensus items)
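
The sweep, knee detection, and classification described above can be illustrated with a minimal Python sketch. It is not the implementation of the `threshold-sweep` or `consensus-classification` SOPs, whose interfaces are not shown here. The function names, the sample agreement percentages, the "largest single-step drop" knee heuristic, and the use of the strictest sweep threshold as the cutoff for "robust" are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical data: percent of respondents agreeing on each item.
agreement: Dict[str, int] = {
    "A": 95, "B": 88, "C": 78, "D": 76, "E": 74, "F": 55, "G": 40,
}
thresholds: List[int] = [50, 60, 70, 80, 90]  # the 50%-90% sweep range


def sweep(agreement: Dict[str, int], thresholds: List[int]) -> List[Tuple[int, int]]:
    """Return (threshold, number of consensus items) pairs, lenient to strict."""
    return [
        (t, sum(1 for pct in agreement.values() if pct >= t))
        for t in sorted(thresholds)
    ]


def find_knee(curve: List[Tuple[int, int]]) -> int:
    """Return the threshold at which the consensus count drops the most.

    Assumed heuristic; needs at least two curve points. Ties resolve to
    the most lenient of the tied thresholds.
    """
    steps = [(prev[1] - cur[1], cur[0]) for prev, cur in zip(curve, curve[1:])]
    _, threshold = max(steps, key=lambda step: step[0])
    return threshold


def classify(agreement: Dict[str, int], operating: int, strict: int) -> Dict[str, str]:
    """Label each item as robust, fragile, or dissensus.

    robust:    consensus even at the strict threshold
    fragile:   consensus at the operating threshold but not the strict one
    dissensus: below the operating threshold
    """
    labels = {}
    for item, pct in agreement.items():
        if pct >= strict:
            labels[item] = "robust"
        elif pct >= operating:
            labels[item] = "fragile"
        else:
            labels[item] = "dissensus"
    return labels


curve = sweep(agreement, thresholds)
print(curve)             # [(50, 6), (60, 5), (70, 5), (80, 2), (90, 1)]
print(find_knee(curve))  # 80
print(classify(agreement, operating=70, strict=90))
```

With this sample data, three items flip between the 70% and 80% thresholds, so the knee sits at 80%. An operating threshold of 70% keeps five items as consensus, of which only item A would still hold at the strictest level.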