decompose-entity-benchmarking
$
npx mdskill add Memento-Teams/Memento-Widesearch/decompose-entity-benchmarkingDecomposes entity-benchmarking tasks into structured steps for multi-attribute data collection across defined entities.
- Helps with structured comparisons of entities like products or organizations against standardized metrics.
- Integrates with or depends on primary sources for data extraction and synthesis.
- Decides based on scope definition, entity partitioning, and attribute standardization strategies.
- Presents results as unified tables from disparate sources for clear analysis.
SKILL.md
.github/skills/decompose-entity-benchmarkingView on GitHub ↗
--- name: decompose-entity-benchmarking description: Specialized decomposition strategy for entity-benchmarking tasks involving multi-attribute data collection across a defined set of entities. --- ## When to Use This strategy applies when a query requires a structured comparison of multiple distinct entities (e.g., products, organizations, or programs) against a set of standardized metrics. Key indicators include: - Requests for "all models/types" within a specific category or time range. - Requirements for deep attribute extraction (e.g., technical specs, pricing, or entry requirements) across a list of entities. - Comparison across multiple independent ranking systems or performance benchmarks. - Data that must be synthesized into a unified table from disparate primary sources. ## Decomposition Template 1. **Scope Definition:** Identify the boundary conditions (e.g., "Region X only," "Time Period Y," or "Category Z"). Define the "Universe of Entities" to prevent over-collection or missing niche entries. 2. **Entity Partitioning:** Divide the total entity list into logical groups. The principle is to group entities that likely share a common primary source (e.g., grouping by parent organization, manufacturer, or alliance) to minimize redundant navigation for workers. 3. **Attribute Standardization:** Define a "Master Schema" for all workers. This ensures that if Worker A finds "Metric X," Worker B is also looking for "Metric X" using the same units or definitions. 4. **Temporal/Version Alignment:** Explicitly instruct workers to verify the specific version or year requested (e.g., "Current 2025 specs" vs. "Historical 2015 specs") to avoid mixing data from different cycles. 5. **Synthesis & Deduplication:** A final step to merge partitioned tables, ensuring that entities appearing in multiple categories are not duplicated and that formatting is consistent. ## Worker Assignment Rules - **Partition Size:** Assign 3–5 major entities (or 10–15 minor line items) per worker to maintain high data precision and prevent search fatigue. **Always prefer more workers with smaller batches** — each worker has a limited tool call budget, so smaller scope = higher completeness. - **Domain Grouping:** Assign entities from the same "family" or "brand" to the same worker to leverage site-specific navigation patterns. - **Verification Workers:** If the data involves high-stakes metrics (e.g., financial fees or legal requirements), assign a secondary worker to "Spot Check" 20% of the rows found by the primary workers. ## Required Columns Checklist - **Primary Identifiers:** Unique names, official IDs, or model numbers. - **Categorical Metadata:** Parent groups, alliances, or sub-categories. - **Temporal Metadata:** Release dates, effective years, or application cycles. - **Primary Metrics:** The core performance or specification data requested. - **Access Metadata:** Official source URLs, direct links to documentation, or "last updated" timestamps. - **Constraint Flags:** Columns indicating if an entity meets specific filters (e.g., "Standard Range Only" or "US Market Only"). ## Anti-Patterns - **The "Generalist" Failure:** Assigning one worker to find "all entities" in a broad category. This leads to missing niche entities or truncated lists. - **Metric Drift:** Failing to define units (e.g., "Currency A" vs "Currency B"), leading to a table with incomparable data. - **Source Homogenization:** Relying on a single aggregator site that may be outdated. Workers should be instructed to seek official primary sources for each entity group. - **Ignoring "Program-Specific" Nuance:** Treating a large organization as a monolith when attributes (like deadlines or fees) actually vary by specific sub-entity or department.
More from Memento-Teams/Memento-Widesearch
- decompose-annual-rank-statsSpecialized decomposition strategy for annual-rank-stats tasks.
- decompose-comparative-data-extractionOrchestrates parallel retrieval of distinct temporal or spatial data points followed by a centralized calculation and verification step.
- decompose-constrained-set-searchOrchestrates a multi-stage filter-and-verify process to find a specific entity satisfying three or more independent constraints.
- decompose-geographic-registriesSpecialized decomposition strategy for geographic-registries tasks.
- decompose-linear-multi-hop-dependencyDecomposes queries where each answer serves as the mandatory search key for the subsequent step in a sequential chain.
- decompose-multimedia-source-verificationA two-stage strategy to isolate specific visual or bibliographic details from social media and video content by separating source identification from contextual extraction.
- decompose-split-by-categorySpecialized decomposition strategy for split-by-category tasks.
- decompose-split-by-entitySpecialized decomposition strategy for split-by-entity tasks requiring deep attribute extraction for a discrete list of subjects.
- decompose-split-by-rank-segmentSpecialized decomposition strategy for split-by-rank-segment tasks.
- decompose-split-by-time-periodSpecialized decomposition strategy for tasks requiring exhaustive data collection over a continuous chronological range.