benchmark-inventory
$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/benchmark-inventory
Catalogs all relevant benchmarks in a specified research domain and capability focus
- Solves the problem of finding and organizing benchmarks for research analysis
- Uses Papers With Code, Semantic Scholar, and web search to gather benchmark data
- Classifies benchmarks by capability, modality, difficulty, and maintenance status
- Returns a structured catalog with metadata for downstream evaluation and selection
SKILL.md
.github/skills/benchmark-inventory
---
name: benchmark-inventory
description: Identify and catalog all relevant benchmarks in target domain
execution: subagent
prompt: ./prompt.md
input: research_domain, capability_focus
used-by: benchmark-archaeology
---

# Benchmark Inventory SOP

Identify, catalog, and characterize all relevant benchmarks for a given research domain and capability focus area.

## Input

- **research_domain**: The broad research area (e.g., "natural language understanding", "code generation", "multimodal reasoning")
- **capability_focus**: Specific capability of interest (e.g., "commonsense reasoning", "mathematical problem solving")

## Procedure

1. Search Papers With Code for benchmarks tagged with the domain
2. Search Semantic Scholar for benchmark papers in the domain
3. Search web for leaderboards and evaluation suites
4. For each benchmark found, collect: name, year, paper, size, primary metric, current SOTA, status
5. Classify by: capability tested, modality, difficulty level, maintenance status

## Output

Structured catalog of benchmarks with metadata sufficient for downstream analysis selection.
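
The SOP does not prescribe a concrete schema for the catalog, so the following is only a minimal sketch of one possible shape, assuming a flat record per benchmark. The class name `BenchmarkEntry`, the helper functions, and every value in the sample data are hypothetical placeholders, not real benchmarks or results. The fields mirror steps 4 and 5 of the procedure; the de-duplication step is an added assumption to handle the same benchmark surfacing from more than one search source.

```python
from __future__ import annotations

import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class BenchmarkEntry:
    # Metadata collected per benchmark (procedure step 4).
    name: str
    year: int
    paper: str
    size: int
    primary_metric: str
    current_sota: float | None  # None when no SOTA figure was found
    status: str
    # Classification axes (procedure step 5).
    capability: str
    modality: str
    difficulty: str
    maintenance_status: str


def dedupe(entries: list[BenchmarkEntry]) -> list[BenchmarkEntry]:
    """Keep the first entry seen for each case-insensitive benchmark name."""
    seen: dict[str, BenchmarkEntry] = {}
    for entry in entries:
        seen.setdefault(entry.name.strip().lower(), entry)
    return list(seen.values())


def group_by_capability(entries: list[BenchmarkEntry]) -> dict[str, list[str]]:
    """Map each tested capability to the names of the benchmarks covering it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        groups[entry.capability].append(entry.name)
    return dict(groups)


def to_catalog_json(entries: list[BenchmarkEntry]) -> str:
    """Serialize the catalog so a downstream skill can consume it."""
    return json.dumps([asdict(e) for e in entries], indent=2)


# Placeholder hits, as if merged from several search sources (all values invented).
raw_hits = [
    BenchmarkEntry("placeholder-bench-a", 2020, "placeholder citation A", 1000,
                   "accuracy", None, "active",
                   "commonsense reasoning", "text", "medium", "maintained"),
    BenchmarkEntry("Placeholder-Bench-A", 2020, "placeholder citation A", 1000,
                   "accuracy", None, "active",
                   "commonsense reasoning", "text", "medium", "maintained"),
    BenchmarkEntry("placeholder-bench-b", 2022, "placeholder citation B", 500,
                   "exact match", None, "active",
                   "mathematical problem solving", "text", "hard", "maintained"),
]

catalog = dedupe(raw_hits)
by_capability = group_by_capability(catalog)
catalog_json = to_catalog_json(catalog)
```

Keeping the step 4 metadata and the step 5 classification axes in a single record means the downstream consumer can filter and sort without joining two structures; whether `status` and `maintenance_status` should be merged is left open here because the SOP lists them separately.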
More from yogsoth-ai/de-anthropocentric-research-engine
- abductive-hypothesis-generation: Strategy: inference to the best explanation in the face of anomalies
- ablation-brainstorm: Remove components one by one and observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
- ablation-component-mapping: Map system architecture to ablatable units for ablation studies
- ablation-design: Design ablation studies to isolate component contributions in ML systems
- ablation-execution: Remove components one by one from a system and record the response/impact of each removal.
- abp-vulnerability-classification: Classify assumptions on 2 axes: load-bearing (how much the conclusion depends on it) × vulnerable (how likely it is to be false). Focuses attention on the High-Load × High-Vulnerable quadrant.
- abstraction-extraction: Extract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
- abstraction-ladder: Perform bisociation at multiple abstraction levels
- abstraction-laddering: Move between concrete and abstract framings, 3 levels up (Why?) and 3 levels down (How?), to find the most productive research level.
- abstraction-to-design: Abstract a biological principle into a design principle. Bridges from biology to engineering.