system-design-interview
$ npx mdskill add mohitagw15856/pm-claude-skills/system-design-interview

Structures a complete, interview-grade system design response, covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.
SKILL.md
.github/skills/system-design-interview
---
name: system-design-interview
description: "Structure a complete system design answer for interview questions or real architecture sessions. Use when asked to design a system, answer a system design interview question, or architect a solution at scale. Produces a structured answer covering requirements, capacity estimates, high-level design, component deep-dives, trade-offs, and follow-up considerations."
---

# System Design Interview Skill

Structures a complete, interview-grade system design response, covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.

## Required Inputs

Ask for these if not provided:

- **The system to design** (e.g. "design a URL shortener", "design a notification service", "design Twitter's feed")
- **Scope** (interview prep / real architecture decision / practice run)
- **Scale target** (rough numbers: DAU, requests/sec, data volume, or "assume typical web scale")
- **Constraints or priorities** (e.g. prioritise availability over consistency, minimise cost, low-latency reads)
- **Time available** (interview context only: 30 / 45 / 60 minutes; skip for real architecture sessions)
- **Emphasis** (optional: any area to go deeper on, e.g. "focus on the DB design" or "spend more time on scaling")

## Output Format

### 1. Clarifying Questions

Before designing, list 4–6 questions that would change the design. Examples:

- Read-heavy or write-heavy? (affects caching and DB choice)
- Global or single-region? (affects latency requirements)
- Strong or eventual consistency? (affects storage and replication)
- Acceptable latency targets? (p50 / p99)
- Any existing infrastructure constraints?

Then proceed with stated assumptions if answering an interview question.

### 2. Functional Requirements

**Core features (must have):**

- [Feature 1]
- [Feature 2]
- [Feature 3]

**Out of scope (for this design):**

- [What's deliberately excluded and why]

### 3. Non-Functional Requirements

| Requirement | Target |
|---|---|
| Availability | [e.g. 99.9% / 99.99%] |
| Latency | [e.g. p95 < 100ms for reads] |
| Throughput | [e.g. 10k writes/sec peak] |
| Consistency | [Strong / Eventual] |
| Durability | [e.g. 99.999%, no data loss] |

### 4. Capacity Estimation

**Traffic:**

- DAU: [X]
- Reads/sec: [X] (peak: [X])
- Writes/sec: [X] (peak: [X])

**Storage:**

- Per record size: [X bytes]
- Records per day: [X]
- 5-year storage: [X GB/TB]

**Bandwidth:**

- Inbound: [X MB/s]
- Outbound: [X MB/s]

### 5. High-Level Architecture

Draw an ASCII diagram specific to this system. Do not default to the client→CDN→LB→API→Cache→DB template unless it genuinely applies. Label each component with the specific technology chosen (e.g. "Kafka" not "Message Queue", "PostgreSQL" not "DB").

Describe each component in 1–2 sentences explaining its role and why that technology was chosen.

### 6. Component Deep-Dive

Pick the 2–3 most critical/interesting components and go deep:

**[Component 1: e.g. Database Layer]**

- Choice: [Technology and why, e.g. PostgreSQL for ACID guarantees, Cassandra for write throughput]
- Schema design (high-level): [Key tables/collections and their structure]
- Indexing strategy: [What gets indexed and why]
- Replication: [Primary-replica / Multi-primary, and why]

**[Component 2: e.g. Caching Strategy]**

- Cache type: [Redis / Memcached, and why]
- What gets cached: [Hot data, e.g. user sessions, frequent reads]
- Cache invalidation: [TTL / Write-through / Write-behind, with trade-offs]
- Cache hit rate target: [e.g. 95%]

**[Component 3: e.g. API Design]**

- Key endpoints: [List the 3–5 most important API calls]
- Authentication: [JWT / OAuth / API keys]
- Rate limiting: [Where and at what rate]

### 7. Data Flow

Walk through the two most critical paths end-to-end:

**Write path:** [Step 1 → Step 2 → Step 3...]

**Read path:** [Step 1 → Step 2 → Step 3...]

### 8. Scaling Bottlenecks and Mitigations

| Bottleneck | Mitigation |
|---|---|
| [e.g. DB write throughput] | [e.g. sharding by user_id, write batching] |
| [e.g. Hot-key cache misses] | [e.g. local in-process cache, probabilistic early expiry] |
| [e.g. Single region latency] | [e.g. multi-region deployment, GeoDNS routing] |

### 9. Trade-offs and Alternatives

Be explicit about what was chosen and what was sacrificed:

| Decision | Why | Trade-off |
|---|---|---|
| [e.g. Eventual consistency] | [Higher availability, lower latency] | [Stale reads possible] |
| [e.g. SQL over NoSQL] | [Complex queries, ACID transactions] | [Harder to shard horizontally] |
| [e.g. Async processing via queue] | [Decoupled, more resilient] | [Eventual delivery, harder to debug] |

### 10. Follow-up Considerations

Things to tackle in production but out of scope for this design session:

- Monitoring and alerting (what metrics matter)
- Disaster recovery and backup strategy
- Security (auth, encryption at rest/transit, rate limiting)
- Cost optimisation at scale
- Gradual rollout and feature flagging

## Quality Checks

- [ ] Clarifying questions are design-changing (not generic filler)
- [ ] Capacity estimates show the arithmetic: DAU → requests/day → requests/sec → storage per record → total storage, so the numbers can be sanity-checked
- [ ] Every row in the Trade-offs table has a non-empty Trade-off column (no rows where the trade-off is blank or says "none")
- [ ] At least 2 component deep-dives with technology choices justified
- [ ] Trade-offs section is honest (not just benefits of chosen approach)
- [ ] Data flow is described end-to-end for the critical path

## Usage Examples

- "Help me answer a system design interview: [question]"
- "Design [system] for a system design interview"
- "How would I architect [system] at scale?"
- "I have a system design interview, and the question is [X]"
- "Design a [URL shortener / chat system / notification service / feed]"
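
The capacity arithmetic that the Quality Checks ask for (DAU → requests/day → requests/sec → storage per record → total storage) can be sketched as below. This is a minimal illustration rather than part of the skill itself: the `Workload` fields, the `estimate` helper, the peak factor of 2, and the use of decimal units (1 TB = 10^12 bytes) are all assumptions chosen for the example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Hypothetical inputs for a back-of-envelope capacity estimate."""
    dau: int                         # daily active users
    reads_per_user_per_day: float
    writes_per_user_per_day: float
    bytes_per_record: int
    peak_factor: float = 2.0         # assumed ratio of peak to average traffic
    retention_years: int = 5         # matches the "5-year storage" line


def estimate(w: Workload) -> dict:
    """Walk DAU -> requests/day -> requests/sec -> storage and bandwidth."""
    reads_per_day = w.dau * w.reads_per_user_per_day
    writes_per_day = w.dau * w.writes_per_user_per_day

    avg_reads = reads_per_day / SECONDS_PER_DAY
    avg_writes = writes_per_day / SECONDS_PER_DAY

    # Assumes every write creates one new record that is kept for the
    # whole retention period (no deletes, no compression, no replicas).
    storage_bytes = (
        writes_per_day * w.bytes_per_record * 365 * w.retention_years
    )

    return {
        "reads_per_sec": avg_reads,
        "peak_reads_per_sec": avg_reads * w.peak_factor,
        "writes_per_sec": avg_writes,
        "peak_writes_per_sec": avg_writes * w.peak_factor,
        "records_per_day": writes_per_day,
        "storage_tb": storage_bytes / 1e12,
        # Payload bandwidth only; protocol overhead is ignored.
        "inbound_mb_per_sec": avg_writes * w.bytes_per_record / 1e6,
        "outbound_mb_per_sec": avg_reads * w.bytes_per_record / 1e6,
    }


if __name__ == "__main__":
    example = Workload(
        dau=10_000_000,
        reads_per_user_per_day=10,
        writes_per_user_per_day=1,
        bytes_per_record=500,
    )
    for name, value in estimate(example).items():
        print(f"{name}: {value:,.3f}")
```

With the example inputs (10M DAU, 10 reads and 1 write per user per day, 500-byte records), the arithmetic gives roughly 1,157 reads/sec on average and about 9.1 TB of raw storage over five years. Showing each intermediate figure is what lets an interviewer sanity-check the result.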