decompose-entity-benchmarking

Name: decompose-entity-benchmarking
Author: Memento-Teams/Memento-Widesearch

$ npx mdskill add Memento-Teams/Memento-Widesearch/decompose-entity-benchmarking

Decomposes entity-benchmarking tasks into structured steps for multi-attribute data collection across defined entities.

Supports structured comparisons of entities, such as products or organizations, against standardized metrics.
Relies on official primary sources for data extraction and synthesis.
Structures the work through scope definition, entity partitioning, and attribute standardization.
Merges results from disparate sources into a single unified table.

SKILL.md

.github/skills/decompose-entity-benchmarking

---
name: decompose-entity-benchmarking
description: Specialized decomposition strategy for entity-benchmarking tasks involving multi-attribute data collection across a defined set of entities.
---

## When to Use
This strategy applies when a query requires a structured comparison of multiple distinct entities (e.g., products, organizations, or programs) against a set of standardized metrics. Key indicators include:
- Requests for "all models/types" within a specific category or time range.
- Requirements for deep attribute extraction (e.g., technical specs, pricing, or entry requirements) across a list of entities.
- Comparison across multiple independent ranking systems or performance benchmarks.
- Data that must be synthesized into a unified table from disparate primary sources.

## Decomposition Template
1. **Scope Definition:** Identify the boundary conditions (e.g., "Region X only," "Time Period Y," or "Category Z"). Define the "Universe of Entities" to prevent over-collection or missing niche entries.
2. **Entity Partitioning:** Divide the total entity list into logical groups. The principle is to group entities that likely share a common primary source (e.g., grouping by parent organization, manufacturer, or alliance) to minimize redundant navigation for workers.
3. **Attribute Standardization:** Define a "Master Schema" for all workers. This ensures that if Worker A records "Metric X," Worker B records "Metric X" with the same units and the same definition.
4. **Temporal/Version Alignment:** Explicitly instruct workers to verify the specific version or year requested (e.g., "Current 2025 specs" vs. "Historical 2015 specs") to avoid mixing data from different cycles.
5. **Synthesis & Deduplication:** Merge the partitioned tables in a final step, ensuring that entities appearing in multiple categories are not duplicated and that formatting is consistent.
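
Steps 3 and 5 can be illustrated with a minimal Python sketch. It assumes each worker returns a list of row dictionaries; the `Column` class, the `MASTER_SCHEMA` contents, and the `merge_tables` helper are hypothetical names chosen for illustration and are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    unit: str  # "" for unitless columns such as identifiers


# Hypothetical master schema handed to every worker (names are illustrative).
MASTER_SCHEMA = (
    Column("entity_id", ""),
    Column("version_year", "year"),
    Column("price", "USD"),
)
EXPECTED_KEYS = {column.name for column in MASTER_SCHEMA}


def merge_tables(tables):
    """Merge per-worker tables into one list of rows.

    Rows whose keys differ from the master schema are set aside for review.
    Duplicates are detected on (entity_id, version_year); the first row wins.
    """
    merged = {}
    rejected = []
    for table in tables:
        for row in table:
            if set(row) != EXPECTED_KEYS:
                rejected.append(row)
                continue
            merged.setdefault((row["entity_id"], row["version_year"]), row)
    return list(merged.values()), rejected


worker_a = [{"entity_id": "A1", "version_year": 2025, "price": 100.0}]
worker_b = [
    {"entity_id": "A1", "version_year": 2025, "price": 100.0},  # duplicate
    {"entity_id": "B2", "version_year": 2025, "price": 80.0},
    {"entity_id": "C3", "price": 5.0},  # missing version_year
]
merged, rejected = merge_tables([worker_a, worker_b])
print([row["entity_id"] for row in merged])  # ['A1', 'B2']
print(len(rejected))  # 1
```

Keying deduplication on both the identifier and the version year is one possible choice; it keeps the same entity from different cycles as separate rows, in line with step 4.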

## Worker Assignment Rules
- **Partition Size:** Assign 3–5 major entities (or 10–15 minor line items) per worker to maintain high data precision and prevent search fatigue. **Always prefer more workers with smaller batches**: each worker has a limited tool-call budget, so a smaller scope yields higher completeness.
- **Domain Grouping:** Assign entities from the same "family" or "brand" to the same worker to leverage site-specific navigation patterns.
- **Verification Workers:** If the data involves high-stakes metrics (e.g., financial fees or legal requirements), assign a secondary worker to spot-check 20% of the rows found by the primary workers.
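
The rules above could be sketched in Python as follows. This is an illustrative outline only: `assign_workers` and `spot_check_sample` are hypothetical helpers, entities are assumed to arrive as `(name, family)` pairs, and the batch limit of 5 and the 20% fraction are taken from the rules in this section.

```python
import random
from collections import defaultdict


def assign_workers(entities, max_per_worker=5):
    """Group (name, family) pairs by family, then split into small batches."""
    groups = defaultdict(list)
    for name, family in entities:
        groups[family].append(name)

    assignments = []
    for family, names in groups.items():
        for start in range(0, len(names), max_per_worker):
            assignments.append((family, names[start:start + max_per_worker]))
    return assignments


def spot_check_sample(rows, fraction=0.2, seed=0):
    """Pick a reproducible random subset of rows for a verification worker."""
    if not rows:
        return []
    sample_size = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, sample_size)


entities = [(f"Acme-{i}", "Acme") for i in range(7)]
entities += [("Bolt-0", "Bolt"), ("Bolt-1", "Bolt")]
assignments = assign_workers(entities)
print([(family, len(names)) for family, names in assignments])
# [('Acme', 5), ('Acme', 2), ('Bolt', 2)]

rows = list(range(10))
print(len(spot_check_sample(rows)))  # 2
```

Batches never cross family boundaries here, so a worker stays on one site's navigation pattern even if that leaves some batches below the 3–5 range.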

## Required Columns Checklist
- **Primary Identifiers:** Unique names, official IDs, or model numbers.
- **Categorical Metadata:** Parent groups, alliances, or sub-categories.
- **Temporal Metadata:** Release dates, effective years, or application cycles.
- **Primary Metrics:** The core performance or specification data requested.
- **Access Metadata:** Official source URLs, direct links to documentation, or "last updated" timestamps.
- **Constraint Flags:** Columns indicating if an entity meets specific filters (e.g., "Standard Range Only" or "US Market Only").
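
One way to apply this checklist mechanically is to test a table header against it before workers are dispatched. The sketch below is an assumption-laden example: the column names inside `CHECKLIST` are hypothetical placeholders that a real task would replace with its own schema.

```python
# Hypothetical mapping from checklist category to acceptable column names.
CHECKLIST = {
    "primary_identifier": {"entity_name", "official_id", "model_number"},
    "categorical_metadata": {"parent_group", "alliance", "sub_category"},
    "temporal_metadata": {"release_date", "effective_year", "application_cycle"},
    "primary_metrics": {"price", "range_km", "score"},
    "access_metadata": {"source_url", "last_updated"},
    "constraint_flags": {"standard_range_only", "us_market_only"},
}


def missing_categories(header):
    """Return checklist categories with no matching column in the header."""
    present = set(header)
    return [
        category
        for category, columns in CHECKLIST.items()
        if not (columns & present)
    ]


header = ["model_number", "parent_group", "release_date", "price", "source_url"]
print(missing_categories(header))  # ['constraint_flags']
```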

## Anti-Patterns
- **The "Generalist" Failure:** Assigning one worker to find "all entities" in a broad category. This leads to missing niche entities or truncated lists.
- **Metric Drift:** Failing to define units (e.g., "Currency A" vs "Currency B"), leading to a table with incomparable data.
- **Source Homogenization:** Relying on a single aggregator site that may be outdated. Workers should be instructed to seek official primary sources for each entity group.
- **Ignoring "Program-Specific" Nuance:** Treating a large organization as a monolith when attributes (like deadlines or fees) actually vary by specific sub-entity or department.
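
Metric Drift in particular lends itself to an automated check after synthesis. The following sketch assumes each row stores the unit of a metric in a companion field such as `price_unit`; that layout and the `find_metric_drift` helper are illustrative assumptions, not something the skill prescribes.

```python
from collections import defaultdict


def find_metric_drift(rows, unit_fields):
    """Report unit fields that hold more than one distinct unit.

    Returns {unit_field: {unit: [entity_id, ...]}} for drifting fields only.
    """
    drift = {}
    for field in unit_fields:
        by_unit = defaultdict(list)
        for row in rows:
            by_unit[row.get(field)].append(row["entity_id"])
        if len(by_unit) > 1:
            drift[field] = dict(by_unit)
    return drift


rows = [
    {"entity_id": "A1", "price": 100.0, "price_unit": "USD",
     "range": 400, "range_unit": "km"},
    {"entity_id": "B2", "price": 90.0, "price_unit": "EUR",
     "range": 380, "range_unit": "km"},
]
print(find_metric_drift(rows, ["price_unit", "range_unit"]))
# {'price_unit': {'USD': ['A1'], 'EUR': ['B2']}}
```

A row with a missing unit field is grouped under `None`, so it also surfaces as drift rather than passing silently.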
