cross-eval
$
npx mdskill add alirezarezvani/claude-skills/cross-evalRuns multi-model consensus on high-stakes business memos
- Checks board memos for strategic clarity and risk before irreversible decisions
- Uses Claude, Codex, and Gemini models with graceful degradation
- Reconciles model outputs to identify consensus and flag divergences
- Returns a unified evaluation with model-specific insights and warnings
SKILL.md
.github/skills/cross-evalView on GitHub ↗
---
name: "cross-eval"
description: "/cs:cross-eval <memo> — Multi-model consensus on a board memo or strategy brief. Claude + Codex + Gemini cross-review with graceful degradation. Use when a high-stakes memo needs an independent sanity check before the boardroom — e.g. a bet-the-company pivot or fundraise terms."
---
# /cs:cross-eval — Multi-Model Consensus
**Command:** `/cs:cross-eval <memo-or-brief>`
Runs the same memo through multiple model providers and reconciles divergences. Use for **high-stakes, irreversible decisions** where single-model bias is too costly: M&A, major fundraises, layoffs, strategic pivots, regulatory commitments.
Adapted from gstack's `/codex` cross-review pattern, generalized to **business memos** instead of code PRs.
## When to Run
- Before signing a term sheet
- Before announcing a layoff
- Before committing to a regulated market
- Before any decision where reversing costs > 6 months of company time
- When the boardroom vote was split or had a CRITICAL dissent
## Models Used (graceful degradation)
The command tries to invoke each available model in order:
1. **Claude** (primary, always available) — the boardroom's native voice
2. **Codex / OpenAI** (if `OPENAI_API_KEY` or `codex` CLI available)
3. **Gemini** (if `GEMINI_API_KEY` or `gemini` CLI available)
If only Claude is available, the command runs **Claude-only with adversarial mode** — same model, different prompt seeds — and clearly labels the output as single-model.
## Workflow
1. Read the memo / brief
2. Probe environment for available model CLIs / API keys
3. For each available model:
- Send the memo with this prompt prefix:
> "You are an independent C-suite reviewer. The following is a board memo from another company's boardroom. Identify the top 3 concerns, the top 3 supports, and your vote (APPROVE / REJECT / DEFER). Do not deferentially agree — assume the memo's reasoning is flawed until proven otherwise."
4. Collect three independent reviews
5. Reconcile: where do they agree? Where do they diverge?
6. Surface the divergences as questions for the founder
## Output Format
Saved to `~/.claude/cross-eval/YYYY-MM-DD-<slug>.md`:
```markdown
# Cross-Eval: <memo title>
**Date:** YYYY-MM-DD
**Memo reviewed:** <link>
**Models invoked:** Claude / Codex / Gemini (or noted fallbacks)
## Vote Tally
| Model | Vote | Confidence |
|---|---|---|
| Claude | APPROVE | High |
| Codex | DEFER | Med |
| Gemini | APPROVE | Low |
## Consensus Concerns (≥2 models flagged)
1. <concern> — flagged by Claude + Codex
2. <concern> — flagged by all 3
## Divergent Concerns (1 model flagged)
- <Codex only:> <concern> — worth a second look
- <Gemini only:> <concern> — likely noise, but check
## Consensus Supports (≥2 models endorsed)
1. <support>
2. <support>
## Recommendation
- 🟢 GO if 2+ models APPROVE and no CRITICAL concerns from any model
- 🟡 PAUSE if any model is DEFER or any concern is CRITICAL
- 🔴 STOP if 2+ models REJECT
## Open Questions for Founder
1. <question raised by divergence>
2. <question raised by divergence>
```
## Why This Matters
Single-model recommendations have systematic biases. Claude trends helpful and may under-weight risk. Codex (OpenAI) trends more cautious on emerging-market and regulatory topics. Gemini trends more cautious on technical scale claims. Disagreement is signal, not noise.
This is the **safety net before irreversibility** — not a replacement for outside counsel or a real board.
## Graceful Degradation
If only Claude is available:
```markdown
**Models available:** Claude only
**Mode:** ADVERSARIAL — running 3 independent Claude passes with different system prompts:
1. Standard reviewer
2. Devil's advocate (must find 3 critical concerns)
3. Steelman (must find 3 strongest reasons to approve)
This is weaker than true multi-model. Treat the result as suggestive, not conclusive.
```
## Routing
- `/cs:decide` — if consensus is GO
- `/cs:freeze` — if consensus is PAUSE
- `/cs:boardroom` (re-run) — if consensus is STOP
## Related
- Skills: [`board-meeting`](../../../skills/board-meeting/SKILL.md), [`executive-mentor`](../../../executive-mentor/)
- Inspiration: gstack's `/codex` cross-review pattern (adapted to business memos)
---
**Version:** 1.0.0
More from alirezarezvani/claude-skills
- a11y-auditAccessibility audit skill for scanning, fixing, and verifying WCAG 2.2 Level A and AA compliance across React, Next.js, Vue, Angular, Svelte, and plain HTML codebases. Use when auditing accessibility, fixing a11y violations, checking color contrast, generating compliance reports, or integrating accessibility checks into CI/CD pipelines.
- ab-test-setupWhen the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.
- ad-creativeWhen the user needs to generate, iterate, or scale ad creative for paid advertising. Use when they say 'write ad copy,' 'generate headlines,' 'create ad variations,' 'bulk creative,' 'iterate on ads,' 'ad copy validation,' 'RSA headlines,' 'Meta ad copy,' 'LinkedIn ad,' or 'creative testing.' This is pure creative production — distinct from paid-ads (campaign strategy). Use ad-creative when you need the copy, not the campaign plan.
- adversarial-reviewerAdversarial code review that breaks the self-review monoculture. Use when you want a genuinely critical review of recent changes, before merging a PR, or when you suspect Claude is being too agreeable about code quality. Forces perspective shifts through hostile reviewer personas that catch blind spots the author's mental model shares with the reviewer.
- aeoAnswer Engine Optimization (AEO) skill — optimize content to be cited by AI language models (ChatGPT, Perplexity, Claude, Gemini, Mistral) as authoritative sources. Distinct from SEO — AEO optimizes for citation in LLM-generated responses, not search rankings. Use when planning content for AI-first search audiences, auditing existing content for E-E-A-T signals, tracking which pages get cited by which LLMs, or building a citation-friendly content strategy. Triggers — 'AEO audit', 'optimize for ChatGPT', 'get cited by Perplexity', 'LLM citation strategy', 'answer engine optimization', 'content for AI search', 'E-E-A-T audit'. Output is a markdown audit report (default) or JSON for pipeline integration. Stdlib-only Python tools.
- agent-designerUse when the user asks to design a multi-agent system, pick an orchestration pattern (supervisor/swarm/pipeline), generate tool schemas for agents, or evaluate agent execution logs for cost, latency, and failure bottlenecks. Examples: 'design an agent architecture for research automation', 'generate Anthropic tool schemas from these tool descriptions', 'analyze these agent run logs for bottlenecks'. NOT for Claude Code workflow files (use workflow-builder) or single-agent prompt design (use agent-workflow-designer).
- agent-protocolInter-agent communication protocol for C-suite agent teams. Defines invocation syntax, loop prevention, isolation rules, and response formats. Use when C-suite agents need to query each other, coordinate cross-functional analysis, or run board meetings with multiple agent roles.
- agent-workflow-designerDesign production-grade multi-agent workflows with clear pattern choice (sequential, parallel, hierarchical), handoff contracts, failure handling, and cost/context controls. Use when architecting a multi-step agent pipeline, choosing between single-agent vs multi-agent approaches, or refactoring an LLM workflow that suffers from context bloat or unreliable handoffs.
- agenthubMulti-agent collaboration plugin that spawns N parallel subagents competing on the same task via git worktree isolation. Agents work independently, results are evaluated by metric or LLM judge, and the best branch is merged. Use when: user wants multiple approaches tried in parallel — code optimization, content variation, research exploration, or any task that benefits from parallel competition. Requires: a git repo.
- agile-product-ownerAgile product ownership for backlog management and sprint execution. Covers user story writing, acceptance criteria, sprint planning, and velocity tracking. Use when writing user stories, creating acceptance criteria, planning sprints, estimating story points, breaking down epics, or prioritizing the backlog.