workload-balancing
$
npx mdskill add elizaOS/eliza/workload-balancingDistribute work efficiently across parallel workers to maximize throughput and minimize completion time.
SKILL.md
.github/skills/workload-balancingView on GitHub ↗
---
name: workload-balancing
description: Optimize workload distribution across workers, processes, or nodes for efficient parallel execution. Use when asked to balance work distribution, improve parallel efficiency, reduce stragglers, implement load balancing, or optimize task scheduling. Covers static/dynamic partitioning, work stealing, and adaptive load balancing strategies.
---
# Workload Balancing Skill
Distribute work efficiently across parallel workers to maximize throughput and minimize completion time.
## Workflow
1. **Characterize** the workload (uniform vs. variable task times)
2. **Identify** bottlenecks (stragglers, uneven distribution)
3. **Select** balancing strategy based on workload characteristics
4. **Implement** partitioning and scheduling logic
5. **Monitor** and adapt to runtime conditions
## Load Balancing Decision Tree
```
What's the workload characteristic?
Uniform task times:
├── Known count → Static partitioning (equal chunks)
├── Streaming input → Round-robin distribution
└── Large items → Size-aware partitioning
Variable task times:
├── Predictable variance → Weighted distribution
├── Unpredictable → Dynamic scheduling / work stealing
└── Long-tail distribution → Work stealing + time limits
Resource constraints:
├── Memory-bound workers → Memory-aware assignment
├── Heterogeneous workers → Capability-based routing
└── Network costs → Locality-aware placement
```
## Balancing Strategies
### Strategy 1: Static Chunking (Uniform Workloads)
Best for: predictable, similar-sized tasks
```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np
def static_balanced_process(items, num_workers=4):
"""Divide work into equal chunks upfront."""
chunks = np.array_split(items, num_workers)
with ProcessPoolExecutor(max_workers=num_workers) as executor:
results = list(executor.map(process_chunk, chunks))
return [item for chunk_result in results for item in chunk_result]
```
### Strategy 2: Dynamic Task Queue (Variable Workloads)
Best for: unpredictable task durations
```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from queue import Queue
def dynamic_balanced_process(items, num_workers=4):
"""Workers pull tasks dynamically as they complete."""
results = []
with ProcessPoolExecutor(max_workers=num_workers) as executor:
# Submit one task per worker initially
futures = {executor.submit(process_item, item): item
for item in items[:num_workers]}
pending = list(items[num_workers:])
while futures:
done, _ = wait(futures, return_when=FIRST_COMPLETED)
for future in done:
results.append(future.result())
del futures[future]
# Submit next task if available
if pending:
next_item = pending.pop(0)
futures[executor.submit(process_item, next_item)] = next_item
return results
```
### Strategy 3: Work Stealing (Long-Tail Tasks)
Best for: when some tasks take much longer than others
```python
import asyncio
from collections import deque
class WorkStealingPool:
def __init__(self, num_workers):
self.queues = [deque() for _ in range(num_workers)]
self.num_workers = num_workers
def distribute(self, items):
"""Initial round-robin distribution."""
for i, item in enumerate(items):
self.queues[i % self.num_workers].append(item)
async def worker(self, worker_id, process_fn):
"""Process own queue, steal from others when empty."""
while True:
# Try own queue first
if self.queues[worker_id]:
item = self.queues[worker_id].popleft()
else:
# Steal from busiest queue
item = self._steal_work(worker_id)
if item is None:
break
await process_fn(item)
def _steal_work(self, worker_id):
"""Steal from the queue with most items."""
busiest = max(range(self.num_workers),
key=lambda i: len(self.queues[i]) if i != worker_id else 0)
if self.queues[busiest]:
return self.queues[busiest].pop() # Steal from end
return None
```
### Strategy 4: Weighted Distribution
Best for: when task costs are known or estimable
```python
def weighted_partition(items, weights, num_workers):
"""Partition items to balance total weight per worker."""
# Sort by weight descending (largest first fit)
sorted_items = sorted(zip(items, weights), key=lambda x: -x[1])
worker_loads = [0] * num_workers
worker_items = [[] for _ in range(num_workers)]
for item, weight in sorted_items:
# Assign to least loaded worker
min_worker = min(range(num_workers), key=lambda i: worker_loads[i])
worker_items[min_worker].append(item)
worker_loads[min_worker] += weight
return worker_items
```
### Strategy 5: Async Semaphore Balancing (I/O Workloads)
Best for: limiting concurrent I/O operations
```python
import asyncio
async def semaphore_balanced_fetch(urls, max_concurrent=10):
"""Limit concurrent operations while processing queue."""
semaphore = asyncio.Semaphore(max_concurrent)
async def bounded_fetch(url):
async with semaphore:
return await fetch(url)
return await asyncio.gather(*[bounded_fetch(url) for url in urls])
```
## Partitioning Strategies
| Strategy | Best For | Implementation |
|----------|----------|----------------|
| Equal chunks | Uniform tasks | `np.array_split(items, n)` |
| Round-robin | Streaming | `items[i::n_workers]` |
| Size-weighted | Known sizes | Bin packing algorithm |
| Hash-based | Consistent routing | `hash(key) % n_workers` |
| Range-based | Sorted/ordered data | Contiguous ranges |
## Handling Stragglers
Techniques to mitigate slow workers:
```python
# 1. Timeout with fallback
from concurrent.futures import TimeoutError
try:
result = future.result(timeout=30)
except TimeoutError:
result = fallback_value
# 2. Speculative execution (backup tasks)
async def speculative_execute(task, timeout=10):
primary = asyncio.create_task(execute(task))
try:
return await asyncio.wait_for(primary, timeout)
except asyncio.TimeoutError:
backup = asyncio.create_task(execute(task)) # Retry
done, pending = await asyncio.wait(
[primary, backup], return_when=asyncio.FIRST_COMPLETED
)
for p in pending:
p.cancel()
return done.pop().result()
# 3. Dynamic rebalancing
def rebalance_on_straggler(futures, threshold_ratio=2.0):
"""Redistribute work if one worker falls behind."""
avg_completion = statistics.mean(completion_times)
for future, worker_id in futures.items():
if future.running() and elapsed(future) > threshold_ratio * avg_completion:
# Cancel and redistribute
remaining_work = cancel_and_get_remaining(future)
redistribute(remaining_work, fast_workers)
```
## Monitoring Metrics
Track these for balanced execution:
| Metric | Calculation | Target |
|--------|-------------|--------|
| Load imbalance | `max(load) / avg(load)` | < 1.2 |
| Straggler ratio | `max(time) / median(time)` | < 2.0 |
| Worker utilization | `busy_time / total_time` | > 90% |
| Queue depth variance | `std(queue_lengths)` | Low |
## Anti-Patterns
| Problem | Cause | Fix |
|---------|-------|-----|
| Starvation | Large tasks block queue | Break into subtasks |
| Thundering herd | All workers wake at once | Jittered scheduling |
| Hot spots | Uneven key distribution | Better hash function |
| Convoy effect | Workers wait on same resource | Fine-grained locking |
| Over-partitioning | Too many small tasks | Batch small items |
## Verification Checklist
Before finalizing balanced code:
- [ ] Work distribution is roughly even (measure completion times)
- [ ] No starvation (all workers stay busy)
- [ ] Stragglers are handled (timeout/retry logic)
- [ ] Overhead is acceptable (partitioning cost vs. task cost)
- [ ] Results are complete and correct
- [ ] Resource utilization is high across workers
More from elizaOS/eliza
- ac-branch-pi-modelAC branch pi-model power flow equations (P/Q and |S|) with transformer tap ratio and phase shift, matching `acopf-math-model.md` and MATPOWER branch fields. Use when computing branch flows in either direction, aggregating bus injections for nodal balance, checking MVA (rateA) limits, computing branch loading %, or debugging sign/units issues in AC power flow.
- academic-pdf-redactionRedact text from PDF documents for blind review anonymization
- ada-plan-view-accessibilityUse when checking simplified ADA-derived plan-view bathroom accessibility constraints such as turning space, door clear width, toilet centerline, grab bars, and lavatory knee/toe clearance.
- analyze-ciAnalyze failed GitHub Action jobs for a pull request.
- architectural-dxf-extractionUse when extracting plan-view architectural geometry from DXF files with semantic CAD layers, especially when outputs must normalize rooms, doors, fixtures, clearances, and grab bars into machine-checkable JSON.
- attitude-controller-plannerUse this skill when implementing the inner control loop for a quadrotor — attitude (roll/pitch/yaw) PID control and attitude planning (converting desired acceleration to desired Euler angles). Covers gain layout, integral reset pattern, and the attitude planner inverse kinematics.
- azure-bgpAnalyze and resolve BGP oscillation and BGP route leaks in Azure Virtual WAN–style hub-and-spoke topologies (and similar cloud-managed BGP environments). Detect preference cycles, identify valley-free violations, and propose allowed policy-level mitigations while rejecting prohibited fixes.
- box-least-squaresBox Least Squares (BLS) periodogram for detecting transiting exoplanets and eclipsing binaries. Use when searching for periodic box-shaped dips in light curves. Alternative to Transit Least Squares, available in astropy.timeseries. Based on Kovács et al. (2002).
- browser-testingVERIFY your changes work. Measure CLS, detect theme flicker, test visual stability, check performance. Use BEFORE and AFTER making changes to confirm fixes. Includes ready-to-run scripts: measure-cls.ts, detect-flicker.ts
- cache-policy-comparisonCompare and implement eviction policies (LRU, LFU, FIFO, S3FIFO, ARC) for bounded-capacity caches. Use when choosing or implementing an eviction policy for a buffer pool, page cache, CDN edge, or LLM KV cache, or when writing a replay simulator that supports multiple policies. Clarifies recency vs frequency semantics, queue topology, saturating counters, ghost buffers, and the second-chance rule that distinguishes modern FIFO-family policies from classic LRU.