baseline-establishment
$
npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/baseline-establishment| User Intent | Route To | |-------------|----------| | Find all methods for a task | method-inventory | | Extract scores from papers | performance-extraction | | Normalize conditions across papers | condition-standardization | | Check reproducibility / discrepancies | discrepancy-analysis | | Track progress over time / headroom | progress-quantification |
SKILL.md
.github/skills/baseline-establishmentView on GitHub ↗
--- name: baseline-establishment description: SOTA Performance Baseline Campaign — 5 strategies for systematically collecting, standardizing, and analyzing performance data across methods. Produces standardized comparison tables, progress curves, and headroom analysis. execution: campaign used-by: knowledge-acquisition --- # Baseline Establishment ## Strategy Routing | User Intent | Route To | |-------------|----------| | Find all methods for a task | method-inventory | | Extract scores from papers | performance-extraction | | Normalize conditions across papers | condition-standardization | | Check reproducibility / discrepancies | discrepancy-analysis | | Track progress over time / headroom | progress-quantification | ## Manifest ### Strategies (5) | Strategy | Purpose | |----------|---------| | method-inventory | Comprehensively identify all relevant methods for a task | | performance-extraction | Systematically extract performance data and conditions from papers | | condition-standardization | Standardize evaluation condition differences across papers | | discrepancy-analysis | Identify discrepancies between reported and reproducible scores | | progress-quantification | Track performance progress over time, quantify remaining headroom | ### Tactics (3) | Tactic | Purpose | |--------|---------| | leaderboard-harvesting | Systematically collect performance data from platforms and papers | | condition-normalization | Compare and standardize experimental conditions across papers | | progress-curve-construction | Build performance-over-time progress curves | ### Subagent SOPs (10) | SOP | Purpose | |-----|---------| | method-discovery | Identify methods via literature, leaderboards, citation chains | | score-extraction | Extract (Task, Dataset, Metric, Score, Conditions) tuples | | condition-cataloging | Record evaluation conditions per method | | reproducibility-checklist-audit | Assess paper against ML Reproducibility Checklist | | performance-table-assembly | Assemble unified comparison table | | compute-normalization | Normalize results by compute budget | | discrepancy-identification | Compare same-method scores across sources | | headroom-estimation | Estimate ceiling vs current SOTA gap | | progress-curve-fitting | Construct performance-over-time data | | baseline-synthesis | Produce final structured baseline report | ## Budget Table | Strategy | Methods | Data Points | Web Searches | |----------|---------|-------------|--------------| | method-inventory | 50 | 0 | 60 | | performance-extraction | 30 | 150 | 40 | | condition-standardization | 20 | 60 | 30 | | discrepancy-analysis | 15 | 45 | 30 | | progress-quantification | 30 | 100 | 40 | | **TOTAL** | **145** | **355** | **200** | ## MCP Tools | MCP Server | Tools | |------------|-------| | brave-search | brave_web_search, brave_llm_context | | apify | rag-web-browser, google-scholar-scraper | | alphaxiv | get_paper_content, answer_pdf_queries | | semantic-scholar | ss_paper, ss_relevance_search, ss_citations, ss_references | ## Context Management Campaign outputs are accumulated in the calling knowledge-acquisition context: - `methods_inventory.json` — All discovered methods with metadata - `performance_data.json` — Extracted scores with provenance - `conditions_matrix.json` — Standardized conditions per method - `discrepancy_report.json` — Flagged score inconsistencies - `progress_curves.json` — Time-series performance data - `baseline_report.md` — Final synthesized baseline document
More from yogsoth-ai/de-anthropocentric-research-engine
- abductive-hypothesis-generationStrategy: 面对异常的最佳解释推理
- ablation-brainstormRemove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
- ablation-component-mappingMap system architecture to ablatable units for ablation studies
- ablation-designDesign ablation studies to isolate component contributions in ML systems
- ablation-executionRemove components one by one from a system, record the response/impact of each removal.
- abp-vulnerability-classificationClassify assumptions on 2 axes — load-bearing (how much conclusion depends on it) × vulnerable (how likely to be false). Focuses attention on High-Load × High-Vulnerable quadrant.
- abstraction-extractionExtract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
- abstraction-ladderPerform bisociation at multiple abstraction levels
- abstraction-ladderingMove between concrete and abstract framings — 3 levels up (Why?) and 3 levels down (How?) to find the most productive research level.
- abstraction-to-designAbstract biological principle to design principle. Bridge from biology to engineering.