red-teaming
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/red-teaming
Core question: **Can systematic adversarial attacks find fatal flaws in this artifact?**
SKILL.md
.github/skills/red-teaming
---
name: red-teaming
description: "Campaign: Systematic adversarial attack from military/intelligence/AI-safety traditions. Core question: Can systematic adversarial attacks find fatal flaws? Methods: UFMCS Red Team Handbook v9.0, CIA SAT, Anthropic Red Teaming, NIST AI RMF, Inie et al. 12-strategy taxonomy."
type: campaign
produces: RedTeamReport
artifact-types: [gap, hypothesis, research-question, idea, approach, experiment-design, claim]
---

# Red Teaming Campaign

Core question: **Can systematic adversarial attacks find fatal flaws in this artifact?**

## Methodology Sources

- UFMCS Red Team Handbook v9.0: Military structured analytic techniques
- CIA Structured Analytic Techniques (SAT): Key Assumptions Check, Devil's Advocacy
- Anthropic Red Teaming (2022): AI-safety systematic probing methodology
- NIST AI Risk Management Framework: Threat surface enumeration
- Inie et al. (2024): 12-strategy taxonomy of adversarial attacks

## Strategy Routing

| Artifact Type | Primary Strategy | Fallback Strategy |
|---|---|---|
| hypothesis, claim | assumption-challenge | adversarial-persona |
| research-question | alternative-analysis | groupthink-mitigation |
| idea, approach | systematic-probing | assumption-challenge |
| experiment-design | systematic-probing | alternative-analysis |
| gap | adversarial-persona | groupthink-mitigation |

## Budget Table

| Parameter | S (Quick) | M (Standard) | L (Deep) |
|---|---|---|---|
| Attack vectors | 5 | 12 | 20 |
| Probing rounds | 3 | 6 | 10 |
| Personas | 2 | 4 | 6 |
| Assumption checks | 5 | 10 | 20 |

## Tactics

- **structured-attack-campaign**: Threat surface enumeration, vector generation, systematic probing, aggregation
- **assumption-cascade**: Surface assumptions, dependency sort, root attack, cascade trace
- **adversarial-roleplay**: Construct hostile persona, attack from persona perspective, record paths

## Context Management

Each subagent operates in an isolated adversarial context. Persona contamination is prevented by spawning separate agents per attack role. Findings are aggregated only after all probing rounds complete. Attack vectors are deduplicated before scoring.

## Output

Produces `RedTeamReport` containing: threat surface map, attack results by vector, assumption cascade analysis, resilience score (0.0-1.0), critical vulnerabilities, and recommended hardening actions.
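
As a minimal sketch, the Strategy Routing and Budget Table lookups described above might be encoded as follows in Python. The names `STRATEGY_ROUTING`, `BUDGETS`, `CampaignPlan`, `plan_campaign`, and `dedupe_vectors` are hypothetical and not part of the skill. The table values are copied from this file. The comparison rule used for deduplication (folding case and whitespace) is an assumption, since the skill does not say how attack vectors are compared.

```python
from dataclasses import dataclass

# (primary, fallback) per artifact type, copied from the Strategy Routing table.
STRATEGY_ROUTING = {
    "hypothesis": ("assumption-challenge", "adversarial-persona"),
    "claim": ("assumption-challenge", "adversarial-persona"),
    "research-question": ("alternative-analysis", "groupthink-mitigation"),
    "idea": ("systematic-probing", "assumption-challenge"),
    "approach": ("systematic-probing", "assumption-challenge"),
    "experiment-design": ("systematic-probing", "alternative-analysis"),
    "gap": ("adversarial-persona", "groupthink-mitigation"),
}

# Budget tiers, copied from the Budget Table.
BUDGETS = {
    "S": {"attack_vectors": 5, "probing_rounds": 3, "personas": 2, "assumption_checks": 5},
    "M": {"attack_vectors": 12, "probing_rounds": 6, "personas": 4, "assumption_checks": 10},
    "L": {"attack_vectors": 20, "probing_rounds": 10, "personas": 6, "assumption_checks": 20},
}


@dataclass(frozen=True)
class CampaignPlan:
    """Hypothetical container for one routed, budgeted campaign."""
    artifact_type: str
    primary_strategy: str
    fallback_strategy: str
    attack_vectors: int
    probing_rounds: int
    personas: int
    assumption_checks: int


def plan_campaign(artifact_type: str, tier: str = "M") -> CampaignPlan:
    """Look up strategies and budget limits for an artifact type and tier."""
    if artifact_type not in STRATEGY_ROUTING:
        raise ValueError(f"unknown artifact type: {artifact_type!r}")
    if tier not in BUDGETS:
        raise ValueError(f"unknown budget tier: {tier!r}")
    primary, fallback = STRATEGY_ROUTING[artifact_type]
    return CampaignPlan(artifact_type, primary, fallback, **BUDGETS[tier])


def dedupe_vectors(vectors: list[str]) -> list[str]:
    """Drop repeated attack vectors, keeping first occurrences in order.

    Assumption: two vectors are duplicates if they match after
    lowercasing and collapsing whitespace.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for vector in vectors:
        key = " ".join(vector.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(vector.strip())
    return unique
```

For example, `plan_campaign("claim", "S")` would select `assumption-challenge` as the primary strategy with a budget of 5 attack vectors and 2 personas. A real campaign would presumably need a more semantic notion of duplicate vectors than string folding.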
More from yogsoth-ai/de-anthropocentric-research-engine
- **abductive-hypothesis-generation**: Strategy: inference to the best explanation when facing anomalies
- **ablation-brainstorm**: Remove components one by one and observe how the system changes, to reveal hidden dependencies and generate ideas from structural gaps.
- **ablation-component-mapping**: Map system architecture to ablatable units for ablation studies.
- **ablation-design**: Design ablation studies to isolate component contributions in ML systems.
- **ablation-execution**: Remove components one by one from a system and record the impact of each removal.
- **abp-vulnerability-classification**: Classify assumptions on two axes: load-bearing (how much the conclusion depends on the assumption) × vulnerable (how likely it is to be false). Focuses attention on the High-Load × High-Vulnerable quadrant.
- **abstraction-extraction**: Extract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
- **abstraction-ladder**: Perform bisociation at multiple abstraction levels.
- **abstraction-laddering**: Move between concrete and abstract framings, 3 levels up (Why?) and 3 levels down (How?), to find the most productive research level.
- **abstraction-to-design**: Abstract a biological principle into a design principle. Bridges from biology to engineering.