benchmark-inventory

$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/benchmark-inventory

Catalogs all relevant benchmarks in a specified research domain and capability focus

  • Solves the problem of finding and organizing benchmarks for research analysis
  • Uses Papers With Code, Semantic Scholar, and web search to gather benchmark data
  • Classifies benchmarks by capability, modality, difficulty, and maintenance status
  • Returns a structured catalog with metadata for downstream evaluation and selection
SKILL.md
.github/skills/benchmark-inventory
---
name: benchmark-inventory
description: Identify and catalog all relevant benchmarks in target domain
execution: subagent
prompt: ./prompt.md
input: research_domain, capability_focus
used-by: benchmark-archaeology
---

# Benchmark Inventory SOP

Identify, catalog, and characterize all relevant benchmarks for a given research domain and capability focus area.

## Input

- **research_domain**: The broad research area (e.g., "natural language understanding", "code generation", "multimodal reasoning")
- **capability_focus**: Specific capability of interest (e.g., "commonsense reasoning", "mathematical problem solving")
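As a minimal sketch of how the two inputs might feed step 2 of the procedure, the helper below combines them into a benchmark-oriented search URL. It assumes the public Semantic Scholar Graph API paper-search endpoint (`/graph/v1/paper/search`) and its `query`, `fields`, and `limit` parameters. The helper name `build_s2_search_url`, the chosen field list, and the practice of appending "benchmark" to the query are hypothetical choices, not part of this skill.

```python
from urllib.parse import urlencode

# Assumption: Semantic Scholar Graph API paper-search endpoint.
S2_SEARCH_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/search"

# Hypothetical field selection covering some of the step-4 metadata.
S2_FIELDS = ("title", "year", "venue", "citationCount", "externalIds", "url")


def build_s2_search_url(research_domain: str, capability_focus: str, limit: int = 50) -> str:
    """Combine the skill's two inputs into a paper-search URL (hypothetical helper).

    Appending "benchmark" is a heuristic to bias results toward
    benchmark and dataset papers.
    """
    if not research_domain.strip() or not capability_focus.strip():
        raise ValueError("research_domain and capability_focus must be non-empty")
    query = f"{capability_focus.strip()} {research_domain.strip()} benchmark"
    params = {"query": query, "fields": ",".join(S2_FIELDS), "limit": limit}
    # urlencode escapes spaces as "+" and commas as "%2C".
    return f"{S2_SEARCH_ENDPOINT}?{urlencode(params)}"
```

Fetching the URL with any HTTP client returns JSON whose `data` list holds one object per paper; rate limits and API keys are left out of this sketch.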

## Procedure

1. Search Papers With Code for benchmarks tagged with the domain
2. Search Semantic Scholar for benchmark papers in the domain
3. Search the web for leaderboards and evaluation suites
4. For each benchmark found, collect: name, year, paper, size, primary metric, current SOTA, status
5. Classify by: capability tested, modality, difficulty level, maintenance status
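Steps 4 and 5 can be sketched as a record type plus a merge pass, because the same benchmark often turns up in more than one of the three sources. This is a minimal sketch under stated assumptions: `BenchmarkRecord` and `merge_records` are hypothetical names, the metadata fields mirror step 4 and the classification fields mirror step 5, the status values and the name-normalization rule are illustrative choices, and merging keeps the first non-empty value seen for each field.

```python
import re
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BenchmarkRecord:
    # Step 4 metadata
    name: str
    year: Optional[int] = None
    paper: Optional[str] = None           # title, DOI, or URL of the introducing paper
    size: Optional[int] = None            # number of examples (unit is an assumption)
    primary_metric: Optional[str] = None
    current_sota: Optional[float] = None
    status: Optional[str] = None          # hypothetical: "active" | "unmaintained" | "deprecated"
    # Step 5 classification
    capability: Optional[str] = None
    modality: Optional[str] = None
    difficulty: Optional[str] = None
    # Provenance: which searches surfaced this benchmark
    sources: set[str] = field(default_factory=set)


def normalize_name(name: str) -> str:
    """Hypothetical dedup key: lowercase and drop non-alphanumerics ("GSM-8K" -> "gsm8k")."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_records(records: list[BenchmarkRecord]) -> list[BenchmarkRecord]:
    """Deduplicate by normalized name, filling empty fields from later records."""
    merged: dict[str, BenchmarkRecord] = {}
    for rec in records:
        target = merged.setdefault(normalize_name(rec.name), BenchmarkRecord(name=rec.name))
        for f in fields(BenchmarkRecord):
            if f.name in ("name", "sources"):
                continue
            # Keep the first non-empty value; later sources only fill gaps.
            if getattr(target, f.name) is None:
                setattr(target, f.name, getattr(rec, f.name))
        target.sources |= rec.sources
    return list(merged.values())
```

Keeping `sources` on each record makes it easy to spot benchmarks that only one search surfaced, which may warrant a manual check.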

## Output

Structured catalog of benchmarks, with enough metadata per entry to support downstream benchmark selection and analysis.
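One possible shape for that catalog is a JSON document holding a flat list of entries plus an index per classification axis, so a downstream skill such as `benchmark-archaeology` could pull, for example, every text-modality benchmark without rescanning the list. The skill does not specify an output schema; the layout, key names, and the `build_catalog` helper below are hypothetical.

```python
import json
from collections import defaultdict

# Classification axes from step 5 of the procedure.
AXES = ("capability", "modality", "difficulty", "status")


def build_catalog(research_domain: str, capability_focus: str, entries: list[dict]) -> str:
    """Serialize benchmark entries plus per-axis indexes as JSON (hypothetical schema)."""
    index = {axis: defaultdict(list) for axis in AXES}
    for entry in entries:
        for axis in AXES:
            # Bucket missing values as "unknown" so unclassified benchmarks stay visible.
            value = entry.get(axis) or "unknown"
            index[axis][value].append(entry["name"])
    catalog = {
        "research_domain": research_domain,
        "capability_focus": capability_focus,
        "benchmarks": sorted(entries, key=lambda e: e["name"].lower()),
        "index": {axis: dict(groups) for axis, groups in index.items()},
    }
    return json.dumps(catalog, indent=2, sort_keys=True)
```

Indexing by axis duplicates only benchmark names, not full records, so the catalog stays compact while still supporting quick filtering.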