decompose-split-by-entity

$npx mdskill add Memento-Teams/Memento-Widesearch/decompose-split-by-entity

Decomposes queries into parallel deep-dive research tasks for independent entities, ideal for generating comprehensive comparison tables.

  • Helps with extracting uniform high-density attributes for a discrete list of subjects, such as products or organizations.
  • Integrates with research tools for attribute extraction, though specific APIs are not detailed in the provided content.
  • Decides by identifying entities, standardizing attributes, partitioning batches, and assigning workers for targeted research.
  • Presents results as a unified tabular output, aggregating independent rows into a single comprehensive table.
SKILL.md
.github/skills/decompose-split-by-entityView on GitHub ↗
---
name: decompose-split-by-entity
description: Specialized decomposition strategy for split-by-entity tasks requiring deep attribute extraction for a discrete list of subjects.
---

## When to Use
Use this strategy when a query identifies a specific set of independent subjects (entities) and requests a uniform set of high-density attributes for each. This pattern is ideal when:
- The entities belong to a clear category (e.g., products, organizations, creative works).
- Each entity requires "deep-dive" research across multiple technical, financial, or historical dimensions.
- The data for one entity does not depend on or overlap with the data of another.
- The output is expected to be a comprehensive, multi-column comparison table.

## Decomposition Template
1.  **Entity Enumeration:** Identify the full list of primary subjects. If the query provides a range (e.g., "all models in series X"), the first subtask must be to generate an exhaustive list of these entities.
2.  **Attribute Definition:** Standardize the required data points (metrics, dates, specifications) to ensure consistency across all workers.
3.  **Horizontal Partitioning:** Divide the list of entities into small batches. Assign each batch to a separate worker.
4.  **Deep-Dive Extraction:** Each worker performs targeted research for their assigned entities only, focusing on filling every required attribute column.
5.  **Vertical Synthesis:** A final pass aggregates the independent rows into a single unified table, ensuring formatting (units, date formats) is synchronized.

## Worker Assignment Rules
- **Batch Size:** Assign 3–5 complex entities per worker. For simpler entities (e.g., single-attribute lists), this can increase to 10. **Always prefer more workers with smaller batches** — each worker has a limited tool call budget, so smaller scope = higher completeness.
- **Specialization:** If the entities span different sub-categories or eras, group them by similarity to allow the worker to maintain context.
- **Verification:** For high-precision tasks (e.g., financial data or technical specs), assign a "Cross-Check" worker to verify 20% of the data points against primary sources.

## Required Columns Checklist
- **Primary Identifiers:** Official names, unique IDs, or parent organizations.
- **Temporal Metadata:** Launch/release dates, sunset/discontinuation dates, or specific "as of" timestamps.
- **Quantitative Metrics:** Technical specifications, financial figures, or performance scores (always include units).
- **Categorical Classifiers:** Type, status, or classification tags that allow for sorting/filtering.
- **Relational Data:** Associated people (e.g., leadership, creators) or secondary entities (e.g., locations, subsidiaries).

## Anti-Patterns
- **The "Breadth-First" Failure:** Attempting to find one specific attribute for *all* entities at once. This leads to high tool-call volume and frequent timeouts. Always research all attributes for a *subset* of entities.
- **Scope Creep:** Including "limited edition" or "variant" data when the query specifies "standard" or "core" range.
- **Attribute Omission:** Failing to capture secondary details (like "credits" or "requirements") because the worker focused only on the primary name and date.
- **Unit Inconsistency:** Mixing different measurement systems (e.g., metric vs. imperial) or date formats across different workers.
- **Missing "Zero" Values:** Leaving cells blank instead of explicitly stating "None" or "N/A" when an attribute is confirmed to be non-existent for a specific entity.
More from Memento-Teams/Memento-Widesearch