distributed-systems
$ npx mdskill add yonatangross/orchestkit/distributed-systems
Provides distributed systems patterns for locking, resilience, idempotency, and rate limiting in backend architectures.
- Helps implement distributed locks, circuit breakers, retry policies, and fault tolerance patterns.
- Integrates with tools like Redis, PostgreSQL, and supports edge computing platforms.
- Recommends patterns based on categories like distributed locks, resilience, and idempotency.
- Delivers results through rule files loaded on-demand for specific system needs.
SKILL.md
.github/skills/distributed-systems
---
name: distributed-systems
license: MIT
compatibility: "Claude Code 2.1.76+."
description: Distributed systems patterns for locking, resilience, idempotency, and rate limiting. Use when implementing distributed locks, circuit breakers, retry policies, idempotency keys, token bucket rate limiters, or fault tolerance patterns.
tags: [distributed-systems, distributed-locks, resilience, circuit-breaker, idempotency, rate-limiting, retry, fault-tolerance, edge-computing, cloudflare-workers, vercel-edge, event-sourcing, cqrs, saga, outbox, message-queue, kafka]
context: fork
agent: backend-system-architect
version: 2.0.0
author: OrchestKit
user-invocable: false
disable-model-invocation: true
complexity: medium
persuasion-type: reference
metadata:
  category: document-asset-creation
allowed-tools:
- Read
- Glob
- Grep
- WebFetch
- WebSearch
---
# Distributed Systems Patterns
Comprehensive patterns for building reliable distributed systems. Each category has individual rule files in `rules/` that are loaded on demand.
## Quick Reference
| Category | Rules | Impact | When to Use |
|----------|-------|--------|-------------|
| [Distributed Locks](#distributed-locks) | 3 | CRITICAL | Redis/Redlock locks, PostgreSQL advisory locks, fencing tokens |
| [Resilience](#resilience) | 3 | CRITICAL | Circuit breakers, retry with backoff, bulkhead isolation |
| [Idempotency](#idempotency) | 3 | HIGH | Idempotency keys, request dedup, database-backed idempotency |
| [Rate Limiting](#rate-limiting) | 3 | HIGH | Token bucket, sliding window, distributed rate limits |
| [Edge Computing](#edge-computing) | 2 | HIGH | Edge workers, V8 isolates, CDN caching, geo-routing |
| [Event-Driven](#event-driven) | 2 | HIGH | Event sourcing, CQRS, transactional outbox, sagas |
**Total: 16 rules across 6 categories**
## Quick Start
```python
# Redis distributed lock with Lua scripts
async with RedisLock(redis_client, "payment:order-123"):
    await process_payment(order_id)

# Circuit breaker for external APIs
@circuit_breaker(failure_threshold=5, recovery_timeout=30)
@retry(max_attempts=3, base_delay=1.0)
async def call_external_api():
    ...

# Idempotent API endpoint
@router.post("/payments")
async def create_payment(
    data: PaymentCreate,
    idempotency_key: str = Header(..., alias="Idempotency-Key"),
):
    return await idempotent_execute(db, idempotency_key, "/payments", process)

# Token bucket rate limiting
limiter = TokenBucketLimiter(redis_client, capacity=100, refill_rate=10)
if await limiter.is_allowed(f"user:{user_id}"):
    await handle_request()
```
## Distributed Locks
Coordinate exclusive access to resources across multiple service instances.
| Rule | File | Key Pattern |
|------|------|-------------|
| Redis & Redlock | `${CLAUDE_SKILL_DIR}/rules/locks-redis-redlock.md` | Lua scripts, SET NX, multi-node quorum |
| PostgreSQL Advisory | `${CLAUDE_SKILL_DIR}/rules/locks-postgres-advisory.md` | Session/transaction locks, lock ID strategies |
| Fencing Tokens | `${CLAUDE_SKILL_DIR}/rules/locks-fencing-tokens.md` | Owner validation, TTL, heartbeat extension |
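
The fencing-token idea in the table above can be illustrated with a minimal in-memory sketch. `LockService`, `FencedStore`, and `StaleTokenError` are hypothetical names invented for this example, not part of the skill's rule files; the sketch assumes the lock service issues a strictly increasing token on every grant and that the protected resource remembers the highest token it has accepted.

```python
from dataclasses import dataclass, field


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


@dataclass
class LockService:
    """Hypothetical lock service: every grant returns a strictly larger token."""
    counter: int = 0

    def acquire(self) -> int:
        self.counter += 1
        return self.counter


@dataclass
class FencedStore:
    """Hypothetical resource that rejects writes from superseded lock holders."""
    highest_token: int = 0
    data: dict = field(default_factory=dict)

    def write(self, token: int, key: str, value: str) -> None:
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value


locks = LockService()
store = FencedStore()

old_token = locks.acquire()  # client A gets the lock, then stalls (e.g. a long pause)
new_token = locks.acquire()  # A's TTL expires and client B is granted the lock

store.write(new_token, "order-123", "paid by B")
try:
    store.write(old_token, "order-123", "paid by A")  # A wakes up late
    rejected = False
except StaleTokenError:
    rejected = True  # the stale write is fenced off
```

The check lives in the resource rather than in the client, because a stalled client cannot reliably know that its lock has already expired.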
## Resilience
Production-grade fault tolerance for distributed systems.
| Rule | File | Key Pattern |
|------|------|-------------|
| Circuit Breaker | `${CLAUDE_SKILL_DIR}/rules/resilience-circuit-breaker.md` | CLOSED/OPEN/HALF_OPEN states, sliding window |
| Retry & Backoff | `${CLAUDE_SKILL_DIR}/rules/resilience-retry-backoff.md` | Exponential backoff, jitter, error classification |
| Bulkhead Isolation | `${CLAUDE_SKILL_DIR}/rules/resilience-bulkhead.md` | Semaphore tiers, rejection policies, queue depth |
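
The CLOSED/OPEN/HALF_OPEN cycle named in the table can be sketched as a small synchronous state machine. This is a simplified illustration, not the skill's implementation: it counts consecutive failures instead of using a sliding window, and the `CircuitBreaker` class, its `clock` parameter, and `CircuitOpenError` are assumptions made for the example.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._failures = 0
        self._opened_at: Optional[float] = None
        self.state = "CLOSED"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "HALF_OPEN"  # timeout elapsed: let one probe through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        # A failed probe reopens immediately; otherwise open at the threshold.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()

    def _on_success(self) -> None:
        self._failures = 0
        self.state = "CLOSED"
```

While the circuit is open, calls fail fast with `CircuitOpenError` instead of waiting on a dependency that is known to be unhealthy.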
## Idempotency
Ensure operations can be safely retried without unintended side effects.
| Rule | File | Key Pattern |
|------|------|-------------|
| Idempotency Keys | `${CLAUDE_SKILL_DIR}/rules/idempotency-keys.md` | Deterministic hashing, Stripe-style headers |
| Request Dedup | `${CLAUDE_SKILL_DIR}/rules/idempotency-dedup.md` | Event consumer dedup, Redis + DB dual layer |
| Database-Backed | `${CLAUDE_SKILL_DIR}/rules/idempotency-database.md` | Unique constraints, upsert, TTL cleanup |
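
Deterministic key hashing and a unique constraint can be combined as in the following sketch. It uses SQLite from the standard library purely for illustration; the table name `idempotency_records`, the column names, and the `charge` function are hypothetical, and a production version would also need the TTL cleanup and in-flight handling that the rule files cover.

```python
import hashlib
import json
import sqlite3


def idempotency_key(user_id: str, operation: str, payload: dict) -> str:
    """Derive a deterministic key: the same inputs always hash to the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{user_id}:{operation}:{canonical}".encode()).hexdigest()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_records (idem_key TEXT PRIMARY KEY, response TEXT)")

calls = []  # records how many times the side effect actually ran


def charge(payload: dict) -> dict:
    """Hypothetical side effect standing in for a real payment call."""
    calls.append(payload)
    return {"status": "charged", "amount": payload["amount"]}


def idempotent_charge(user_id: str, payload: dict) -> dict:
    key = idempotency_key(user_id, "charge", payload)
    try:
        with db:  # one transaction: claim the key, run the operation, store the result
            db.execute("INSERT INTO idempotency_records (idem_key) VALUES (?)", (key,))
            response = charge(payload)
            db.execute(
                "UPDATE idempotency_records SET response = ? WHERE idem_key = ?",
                (json.dumps(response), key),
            )
        return response
    except sqlite3.IntegrityError:
        # The primary key already exists, so this is a replay: return the stored response.
        row = db.execute(
            "SELECT response FROM idempotency_records WHERE idem_key = ?", (key,)
        ).fetchone()
        return json.loads(row[0])
```

If `charge` raises, the transaction rolls back and the key is released, so a failed request can be retried rather than replaying an error.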
## Rate Limiting
Protect APIs with distributed rate limiting using Redis.
| Rule | File | Key Pattern |
|------|------|-------------|
| Token Bucket | `${CLAUDE_SKILL_DIR}/rules/ratelimit-token-bucket.md` | Redis Lua scripts, burst capacity, refill rate |
| Sliding Window | `${CLAUDE_SKILL_DIR}/rules/ratelimit-sliding-window.md` | Sorted sets, precise counting, no boundary spikes |
| Distributed Limits | `${CLAUDE_SKILL_DIR}/rules/ratelimit-distributed.md` | SlowAPI + Redis, tiered limits, response headers |
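
The token bucket arithmetic (burst capacity plus a steady refill rate) is easiest to see in a single-process sketch. The `TokenBucket` class below is an assumption made for illustration and keeps its state in memory; across multiple instances the same calculation would have to run atomically in shared storage, which is what the Redis Lua scripts in the table are for.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock             # injectable for deterministic tests
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def is_allowed(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, never exceeding capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `capacity=3` and `refill_rate=1`, a client can burst three requests at once and then sustain roughly one request per second.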
## Edge Computing
Edge runtime patterns for Cloudflare Workers, Vercel Edge, and Deno Deploy.
| Rule | File | Key Pattern |
|------|------|-------------|
| Edge Workers | `${CLAUDE_SKILL_DIR}/rules/edge-workers.md` | V8 isolate constraints, Web APIs, geo-routing, auth at edge |
| Edge Caching | `${CLAUDE_SKILL_DIR}/rules/edge-caching.md` | Cache-aside at edge, CDN headers, KV storage, stale-while-revalidate |
## Event-Driven
Event sourcing, CQRS, saga orchestration, and reliable messaging patterns.
| Rule | File | Key Pattern |
|------|------|-------------|
| Event Sourcing | `${CLAUDE_SKILL_DIR}/rules/event-sourcing.md` | Event-sourced aggregates, CQRS read models, optimistic concurrency |
| Event Messaging | `${CLAUDE_SKILL_DIR}/rules/event-messaging.md` | Transactional outbox, saga compensation, idempotent consumers |
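
A transactional outbox can be sketched with two tables and a relay loop. The example below uses SQLite for self-containment; the `orders` and `outbox` schemas, `place_order`, and `relay_once` are hypothetical names, and `publish` stands in for whatever message broker client is in use.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: str) -> None:
    """Write the business row and the event row in the same transaction."""
    with db:  # commits both inserts together, or rolls both back
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(publish) -> int:
    """Publish pending outbox rows in order, marking each one after it is sent."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A crash between `publish` and the `UPDATE` causes the event to be sent again on the next pass, so delivery is at-least-once and consumers need to be idempotent.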
## Key Decisions
| Decision | Recommendation |
|----------|----------------|
| Lock backend | Redis for speed, PostgreSQL if already using it, Redlock for HA |
| Lock TTL | 2-3x expected operation time |
| Circuit breaker recovery | Half-open probe with sliding window |
| Retry algorithm | Exponential backoff + full jitter |
| Bulkhead isolation | Semaphore-based tiers (Critical/Standard/Optional) |
| Idempotency storage | Redis (speed) + DB (durability), 24-72h TTL |
| Rate limit algorithm | Token bucket for most APIs, sliding window for strict quotas |
| Rate limit storage | Redis (distributed, atomic Lua scripts) |
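
The "exponential backoff + full jitter" recommendation can be written in a few lines. This is a sketch under stated assumptions: `full_jitter_delay` and `call_with_retry` are hypothetical helpers, the default retryable exception types are only an example of error classification, and the `sleep` and `rng` parameters exist so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type


def full_jitter_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 30.0,
                      rng: Optional[random.Random] = None) -> float:
    """Full jitter: pick a uniform delay in [0, min(max_delay, base_delay * 2**attempt)]."""
    rng = rng if rng is not None else random.Random()
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retry(func: Callable, max_attempts: int = 3,
                    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
                    sleep: Callable[[float], None] = time.sleep):
    """Retry only errors classified as retryable; anything else propagates at once."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(full_jitter_delay(attempt))
```

Randomizing over the whole interval, rather than adding a small offset to a fixed delay, spreads retries from many clients so they do not arrive in synchronized waves.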
## When NOT to Use
There are no separate event-sourcing, saga, or CQRS skills; those patterns are rules within distributed-systems. Most projects never need them.
| Pattern | Interview | Hackathon | MVP | Growth | Enterprise | Simpler Alternative |
|---------|-----------|-----------|-----|--------|------------|---------------------|
| Event sourcing | OVERKILL | OVERKILL | OVERKILL | OVERKILL | WHEN JUSTIFIED | Append-only table with status column |
| Saga orchestration | OVERKILL | OVERKILL | OVERKILL | SELECTIVE | APPROPRIATE | Sequential service calls with manual rollback |
| Circuit breaker | OVERKILL | OVERKILL | BORDERLINE | APPROPRIATE | REQUIRED | Try/except with timeout |
| Distributed locks | OVERKILL | OVERKILL | BORDERLINE | APPROPRIATE | REQUIRED | Database row-level lock (SELECT FOR UPDATE) |
| CQRS | OVERKILL | OVERKILL | OVERKILL | OVERKILL | WHEN JUSTIFIED | Single model for read/write |
| Transactional outbox | OVERKILL | OVERKILL | OVERKILL | SELECTIVE | APPROPRIATE | Direct publish after commit |
| Rate limiting | OVERKILL | OVERKILL | SIMPLE ONLY | APPROPRIATE | REQUIRED | Nginx rate limit or cloud WAF |
**Rule of thumb:** If you have a single server process, you do not need distributed systems patterns. Use in-process alternatives. Add distribution only when you actually have multiple instances.
## Anti-Patterns (FORBIDDEN)
```python
# LOCKS: Never forget TTL (causes deadlocks)
await redis.set(f"lock:{name}", "1") # WRONG - no expiry!
# LOCKS: Never release without owner check
await redis.delete(f"lock:{name}") # WRONG - might release others' lock
# RESILIENCE: Never retry non-retryable errors
@retry(max_attempts=5, retryable_exceptions={Exception}) # Retries 401!
# RESILIENCE: Never put retry outside circuit breaker
@retry # Would retry when circuit is open!
@circuit_breaker
async def call(): ...
# IDEMPOTENCY: Never use non-deterministic keys
key = str(uuid.uuid4()) # Different every time!
# IDEMPOTENCY: Never cache error responses
if response.status_code >= 400:
    await cache_response(key, response) # Errors should retry!
# RATE LIMITING: Never use in-memory counters in distributed systems
request_counts = {} # Lost on restart, not shared across instances
```
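
For contrast with the two lock anti-patterns above, the sketch below shows the intended semantics: every lock carries an expiry, and release succeeds only for the owner's token. `InMemoryLockTable` is a hypothetical in-memory stand-in so the example runs without a server. In Redis the acquire step corresponds to `SET key token NX PX ttl`, and the release step to a Lua script that compares the stored token before deleting the key.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryLockTable:
    """Stand-in for a lock store, mirroring acquire-with-TTL and compare-and-delete."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # name -> (owner token, expiry time)

    def acquire(self, name: str, ttl: float) -> Optional[str]:
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None  # an unexpired lock is held by someone else
        token = secrets.token_hex(16)  # unique owner token for this grant
        self._locks[name] = (token, now + ttl)
        return token

    def release(self, name: str, token: str) -> bool:
        held = self._locks.get(name)
        if held is None or held[0] != token:
            return False  # not the owner: leave the current holder's lock alone
        del self._locks[name]
        return True
```

The owner check matters most after a TTL expiry: the previous holder may still try to release, and without the token comparison it would delete the new holder's lock.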
## Detailed Documentation
| Resource | Description |
|----------|-------------|
| `${CLAUDE_SKILL_DIR}/scripts/` | Templates: lock implementations, circuit breaker, rate limiter |
| `${CLAUDE_SKILL_DIR}/checklists/` | Pre-flight checklists for each pattern category |
| `${CLAUDE_SKILL_DIR}/references/` | Deep dives: Redlock algorithm, bulkhead tiers, token bucket |
| `${CLAUDE_SKILL_DIR}/examples/` | Complete integration examples |
## Related Skills
- `caching` - Redis caching patterns, cache as fallback
- `background-jobs` - Job deduplication, async processing with retry
- `observability-monitoring` - Metrics and alerting for circuit breaker state changes
- `error-handling-rfc9457` - Structured error responses for resilience failures
- `auth-patterns` - API key management, authentication integration