robustness-design
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/robustness-design
SKILL.md
.github/skills/robustness-design
---
name: robustness-design
description: "Design experiments to identify failure boundaries and robustness limits"
version: 1.0.0
category: experiment-execution
type: strategy
used-by: experiment-design
sops:
  - factor-identification
  - level-specification
  - baseline-selection
  - metric-specification
  - sample-size-estimation
  - design-matrix-construction
tactics:
  - statistical-method-selection
---

# Strategy: Robustness Design

**Question**: Under what conditions does the method fail?

## Methodology

- **Distribution Shift Testing**: Evaluate under covariate shift, label shift, domain shift.
- **Adversarial Robustness**: Perturbation-based attacks (PGD, AutoAttack) at varying epsilon.
- **Cross-Domain Transfer**: Test on domains not seen during training.
- **Noise Injection**: Gaussian noise, label noise, missing data at varying severity.
- **Stress Testing**: Push inputs to boundary conditions (extreme lengths, rare categories, edge cases).

## Execution Flow

1. **factor-identification** → Identify robustness dimensions (noise type, shift type, severity)
2. **level-specification** → Define severity levels for each perturbation
3. **baseline-selection** → Select robust baselines for comparison
4. **metric-specification** → Define degradation metrics (absolute and relative to clean)
5. **design-matrix-construction** → Build perturbation grid
6. **sample-size-estimation** → Determine samples needed per condition
7. **statistical-method-selection** (tactic) → Choose tests for degradation significance

## Budget Gate

| Robustness Type | Conditions | Severities | Min Runs | Notes |
|----------------|-----------|-----------|----------|-------|
| Single perturbation | 1 | 3-5 | 3-5 | Quick sanity check |
| Multi-perturbation | 3-5 | 3 each | 9-15 | Standard robustness eval |
| Adversarial sweep | 1 attack | 5-10 epsilon | 5-10 | Adversarial robustness curve |
| Comprehensive | 5+ types | 3-5 each | 50+ | Publication-ready robustness |
| Cross-domain | N domains | 1 | N | Transfer evaluation |
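
## Example: Perturbation Grid Sketch

The execution flow above (severity levels, a perturbation grid, and degradation metrics reported both in absolute terms and relative to clean performance) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not part of the skill: the names `PERTURBATIONS`, `SEEDS`, `build_design_matrix`, `perturb`, `toy_model`, `make_dataset`, and `degradation_table` are hypothetical, the severity values are arbitrary, the data is synthetic, and the model is a stand-in threshold classifier. Repeating each condition over several seeds is also an assumption that goes beyond the minimum run counts in the budget table.

```python
import itertools
import random
import statistics

# Hypothetical perturbation types and severity levels (illustrative values only).
PERTURBATIONS = {
    "gaussian_noise": [0.1, 0.5, 1.0],  # standard deviation of added noise
    "missing_data": [0.1, 0.3, 0.5],    # probability that a feature is zeroed out
}
SEEDS = [0, 1, 2]  # assumed number of repeated runs per condition


def build_design_matrix(perturbations, seeds):
    """Return one row per (perturbation, severity, seed) combination."""
    rows = []
    for name, levels in perturbations.items():
        for severity, seed in itertools.product(levels, seeds):
            rows.append({"perturbation": name, "severity": severity, "seed": seed})
    return rows


def perturb(x, name, severity, rng):
    """Apply one perturbation type to a feature vector at the given severity."""
    if name == "gaussian_noise":
        return [v + rng.gauss(0.0, severity) for v in x]
    if name == "missing_data":
        return [0.0 if rng.random() < severity else v for v in x]
    raise ValueError(f"unknown perturbation: {name}")


def toy_model(x):
    """Stand-in classifier: predicts 1 when the feature mean is positive."""
    return 1 if sum(x) / len(x) > 0 else 0


def make_dataset(n=200, dim=8, seed=123):
    """Synthetic two-class data: features centered at +0.5 or -0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.5 if label == 1 else -0.5
        data.append(([rng.gauss(center, 0.3) for _ in range(dim)], label))
    return data


def accuracy(dataset, name=None, severity=0.0, seed=0):
    """Accuracy of toy_model on clean data (name=None) or perturbed data."""
    rng = random.Random(seed)
    correct = 0
    for x, y in dataset:
        x_in = perturb(x, name, severity, rng) if name else x
        correct += int(toy_model(x_in) == y)
    return correct / len(dataset)


def degradation_table(dataset, perturbations, seeds):
    """Mean accuracy per condition, plus absolute and relative drop from clean."""
    clean = accuracy(dataset)
    runs = {}
    for row in build_design_matrix(perturbations, seeds):
        key = (row["perturbation"], row["severity"])
        acc = accuracy(dataset, row["perturbation"], row["severity"], row["seed"])
        runs.setdefault(key, []).append(acc)
    table = []
    for (name, severity), accs in runs.items():
        mean_acc = statistics.mean(accs)
        absolute = clean - mean_acc
        relative = absolute / clean if clean > 0 else float("nan")
        table.append({
            "perturbation": name,
            "severity": severity,
            "mean_accuracy": mean_acc,
            "absolute_drop": absolute,
            "relative_drop": relative,
        })
    return clean, table


if __name__ == "__main__":
    clean_acc, rows = degradation_table(make_dataset(), PERTURBATIONS, SEEDS)
    print(f"clean accuracy: {clean_acc:.3f}")
    for r in rows:
        print(f"{r['perturbation']:>15} sev={r['severity']:<4} "
              f"acc={r['mean_accuracy']:.3f} abs={r['absolute_drop']:.3f} "
              f"rel={r['relative_drop']:.3f}")
```

With two perturbation types, three severities each, and three seeds, this grid has 18 runs. Keeping the per-seed accuracies (rather than only the mean) is what later allows a significance test on the degradation, as the statistical-method-selection step calls for.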