robustness-design

$npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/robustness-design

**Question**: Under what conditions does the method fail?

SKILL.md
.github/skills/robustness-designView on GitHub ↗
---
name: robustness-design
description: "Design experiments to identify failure boundaries and robustness limits"
version: 1.0.0
category: experiment-execution
type: strategy
used-by: experiment-design
sops:
  - factor-identification
  - level-specification
  - baseline-selection
  - metric-specification
  - sample-size-estimation
  - design-matrix-construction
tactics:
  - statistical-method-selection
---

# Strategy: Robustness Design

**Question**: Under what conditions does the method fail?

## Methodology

- **Distribution Shift Testing**: Evaluate under covariate shift, label shift, domain shift.
- **Adversarial Robustness**: Perturbation-based attacks (PGD, AutoAttack) at varying epsilon.
- **Cross-Domain Transfer**: Test on domains not seen during training.
- **Noise Injection**: Gaussian noise, label noise, missing data at varying severity.
- **Stress Testing**: Push inputs to boundary conditions (extreme lengths, rare categories, edge cases).

## Execution Flow

1. **factor-identification** → Identify robustness dimensions (noise type, shift type, severity)
2. **level-specification** → Define severity levels for each perturbation
3. **baseline-selection** → Select robust baselines for comparison
4. **metric-specification** → Define degradation metrics (absolute and relative to clean)
5. **design-matrix-construction** → Build perturbation grid
6. **sample-size-estimation** → Determine samples needed per condition
7. **statistical-method-selection** (tactic) → Choose tests for degradation significance

## Budget Gate

| Robustness Type | Conditions | Severities | Min Runs | Notes |
|----------------|-----------|-----------|----------|-------|
| Single perturbation | 1 | 3-5 | 3-5 | Quick sanity check |
| Multi-perturbation | 3-5 | 3 each | 9-15 | Standard robustness eval |
| Adversarial sweep | 1 attack | 5-10 epsilon | 5-10 | Adversarial robustness curve |
| Comprehensive | 5+ types | 3-5 each | 50+ | Publication-ready robustness |
| Cross-domain | N domains | 1 | N | Transfer evaluation |
More from yogsoth-ai/de-anthropocentric-research-engine