decompose-split-by-category
$
npx mdskill add Memento-Teams/Memento-Widesearch/decompose-split-by-categoryDecomposes queries into subtasks by fixed categories for comprehensive entity lists.
- Helps with generating multi-row tables grouped by high-level taxonomies like organizational tiers.
- Integrates with worker assignment rules for load balancing and processing limits.
- Decides based on identifying primary partition keys and standardizing attribute extraction.
- Presents results as a merged, consistent table with uniform entity attributes.
SKILL.md
.github/skills/decompose-split-by-categoryView on GitHub ↗
--- name: decompose-split-by-category description: Specialized decomposition strategy for split-by-category tasks. --- ## When to Use This strategy is ideal for queries requiring a comprehensive census or list of entities that are naturally organized by a fixed, high-level taxonomy. Use this when the request specifies a set of discrete "buckets" (e.g., organizational tiers, geographic zones, or professional classifications) and requires multiple attributes for every entity within those buckets. The data shape is typically a multi-row table where rows are grouped by a primary categorical header. ## Decomposition Template 1. **Identify the Primary Partition Key:** Determine the highest-level category mentioned in the prompt (e.g., "Classification A", "Region B"). 2. **Define the Scope per Subtask:** Assign one or more full categories to each worker. Do not split a single category across multiple workers unless the entity count per category exceeds the worker's processing limit. 3. **Standardize Attribute Extraction:** Define a uniform set of "Entity Attributes" (e.g., "Identifier X", "Metric Y", "Status Z") that every worker must retrieve to ensure the final merged table is consistent. 4. **Temporal/Constraint Alignment:** If the query specifies a time range (e.g., "Year X to Year Y") or a threshold (e.g., "Value > N"), include these constraints in every subtask definition to prevent workers from returning out-of-scope data. ## Worker Assignment Rules * **Load Balancing:** Aim for 10–20 entities per worker. If a single category (e.g., "Category A") contains 50+ entities, further decompose that subtask by a secondary attribute (e.g., "Category A - Sub-group 1"). **Always prefer more workers with narrower scope** — each worker has a limited tool call budget, so smaller scope = higher completeness. * **Isolation:** Each worker should be responsible for the complete lifecycle of their assigned category—from discovery of the entity list to the retrieval of specific metadata. * **Redundancy/Verification:** For high-precision tasks, assign a "Verification Worker" to cross-check 10% of the entries from each category against a second source. ## Required Columns Checklist * **Categorical Headers:** The primary and secondary labels used to group the data. * **Unique Identifiers:** The specific name or ID of the entity being described. * **Quantitative Metrics:** Numerical values, scores, or rankings associated with the entity. * **Temporal Metadata:** Years, seasons, or timestamps indicating when the data was recorded. * **Source/Reference Links:** Direct URLs or citations for the specific data points retrieved. * **Status Indicators:** Current state or classification (e.g., "Active", "Completed", "Pending"). ## Anti-Patterns * **The "Truncation Trap":** Assigning too many categories to a single worker, leading to incomplete tables or "cut-off" responses due to output token limits. * **Inconsistent Schema:** Failing to enforce a strict column list, resulting in workers returning different types of data for different categories (e.g., Worker A provides "Metric X" while Worker B provides "Metric Y"). * **Missing "Null" Values:** Workers omitting entities because certain attributes are missing. Workers should be instructed to use "N/A" or "Unknown" to maintain the integrity of the list. * **Overlapping Scopes:** Assigning the same category to two workers without clear boundaries, leading to duplicate rows in the final output.
More from Memento-Teams/Memento-Widesearch
- decompose-annual-rank-statsSpecialized decomposition strategy for annual-rank-stats tasks.
- decompose-comparative-data-extractionOrchestrates parallel retrieval of distinct temporal or spatial data points followed by a centralized calculation and verification step.
- decompose-constrained-set-searchOrchestrates a multi-stage filter-and-verify process to find a specific entity satisfying three or more independent constraints.
- decompose-entity-benchmarkingSpecialized decomposition strategy for entity-benchmarking tasks involving multi-attribute data collection across a defined set of entities.
- decompose-geographic-registriesSpecialized decomposition strategy for geographic-registries tasks.
- decompose-linear-multi-hop-dependencyDecomposes queries where each answer serves as the mandatory search key for the subsequent step in a sequential chain.
- decompose-multimedia-source-verificationA two-stage strategy to isolate specific visual or bibliographic details from social media and video content by separating source identification from contextual extraction.
- decompose-split-by-entitySpecialized decomposition strategy for split-by-entity tasks requiring deep attribute extraction for a discrete list of subjects.
- decompose-split-by-rank-segmentSpecialized decomposition strategy for split-by-rank-segment tasks.
- decompose-split-by-time-periodSpecialized decomposition strategy for tasks requiring exhaustive data collection over a continuous chronological range.