runbook-writer
$ npx mdskill add mohitagw15856/pm-claude-skills/runbook-writer

Produces operational runbooks for services, incident types, and deployment procedures, structured so an on-call engineer who's never touched the system can follow them under pressure.
SKILL.md
.github/skills/runbook-writer
---
name: runbook-writer
description: "Write an operational runbook for a service, incident type, or deployment procedure. Use when asked to write a runbook, create an ops guide, document an operational procedure, or prepare an incident response playbook. Produces a runbook with overview, prerequisites, step-by-step procedures, rollback steps, troubleshooting table, and escalation paths."
---

# Runbook Writer Skill

Produces operational runbooks for services, incident types, and deployment procedures, structured so an on-call engineer who's never touched the system can follow them under pressure.

## Required Inputs

Ask for these if not provided:

- **What the runbook is for** (e.g. deploying the payment service, responding to a database failover, rotating API keys)
- **Runbook type** (Deployment / Incident Response / Maintenance / Disaster Recovery)
- **System/service name and what it does** (brief description)
- **Audience** (new on-call engineers / experienced SREs / DevOps team)
- **Tech stack** (where relevant, e.g. Kubernetes, AWS RDS, Node.js)
- **Monitoring tools** (e.g. Grafana, Datadog, CloudWatch, Splunk; used to name specific dashboards and alert links in the steps)
- **Key environment details** (e.g. Kubernetes cluster name, AWS account/region, relevant namespaces or resource names; paste what's relevant for exact commands)

## Output Format

---

**Runbook:** [Runbook Title]
**Service:** [Service Name]
**Type:** [Deployment / Incident Response / Maintenance / DR]
**Last Updated:** [Insert today's date in YYYY-MM-DD format]
**Owner:** [Team or person]
**Severity:** [P1 / P2 / P3, if incident-type]

---

### Overview

**What this runbook covers:** [1–2 sentences on the scenario this runbook handles]

**When to use this runbook:**

- [Specific trigger condition 1, e.g. PagerDuty alert: `high-error-rate-payment-service`]
- [Specific trigger condition 2, e.g. Deploy needed after PR merged to `main`]

**Estimated time to complete:** [X minutes / X–Y minutes depending on outcome]

**Impact if not completed correctly:** [e.g. Payment processing degraded / Data loss risk / Users locked out]

---

### Prerequisites

**Access required:**

- [ ] [System/tool access, e.g. AWS Console: `production-account`]
- [ ] [Credential, e.g. `vault read secret/payment-service`]
- [ ] [VPN / bastion access if needed]

**Tools required:**

- [ ] [Tool name and version, e.g. `kubectl` v1.28+]
- [ ] [CLI or dashboard name]

**Before you start:**

- [ ] [Prerequisite check, e.g. Verify current deployment is healthy in Grafana]
- [ ] [Prerequisite action, e.g. Announce in `#ops-live` that you're starting]

---

### Procedure

Number every step. Use exact commands. Do not paraphrase tool names or flags.

**Step 1: [Action name]**

[What you're doing and why (one sentence)]

```bash
# Exact command
[command here]
```

**Expected output:** `[what should appear if this worked]`

**If this fails:** [Exact error message to look for] → [What to do, or see Troubleshooting]

**Step 2: [Action name]**

[Same structure as Step 1]

**Step 3: Verify**

Always include a verification step after the main procedure:

```bash
[verification command]
```

**Expected state:** [What a healthy system looks like after this runbook completes]

---

### Rollback

How to undo this procedure if something went wrong:

**Step R1: [Rollback action]**

```bash
[rollback command]
```

**Verify rollback:** `[command to confirm rollback succeeded]`

---

### Troubleshooting

| Symptom | Likely Cause | Resolution |
|---|---|---|
| [Error message or observable symptom] | [Why this happens] | [Exact fix or next step] |
| [Another symptom] | [Cause] | [Resolution] |

---

### Escalation

If this runbook does not resolve the issue:

| Condition | Who to Contact | How |
|---|---|---|
| [e.g. DB unavailable after 10 min] | [DBA on-call] | [PagerDuty policy: `db-oncall`] |
| [e.g. Payment provider unresponsive] | [Vendor contact] | [Contact in 1Password: `vendor-escalation`] |

**Always update the incident timeline in [tool] before escalating.**

---

### Post-Procedure Checklist

After completing the runbook:

- [ ] Announce completion in `#ops-live` with outcome
- [ ] Update the incident ticket / deploy log
- [ ] Verify alerts have resolved in monitoring dashboard
- [ ] If this revealed a gap in this runbook, update it now (link to edit process)

---

## Quality Checks

- [ ] Every step has an exact command (no "run the deploy script")
- [ ] Expected output is specified for each step so engineer knows if it worked
- [ ] Failure path is explicit for each step (not "if it fails, investigate")
- [ ] Rollback procedure is complete and independently testable
- [ ] Escalation table has no cells containing only "[Team name]"; every row must either have a real contact or be explicitly flagged as [FILL IN: on-call rotation link]
- [ ] Rollback section contains at least one concrete command (not left as "[rollback command]" placeholder)
- [ ] Runbook can be followed by someone who has never touched this system

## Usage Examples

- "Write a runbook for [service] deployment"
- "Create an incident response runbook for [alert type]"
- "I need a runbook for [procedure]"
- "Document the operational procedure for [X]"
- "Write an ops playbook for [scenario]"
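
Two of the Quality Checks above are mechanical enough to script: no `[rollback command]` left in the Rollback section, and no escalation cell containing only `[Team name]`. The following is a minimal Python sketch of such a check, not part of the skill itself. The names `PLACEHOLDER` and `find_placeholders` are hypothetical, and the pattern is a heuristic that assumes placeholders are bracketed spans starting with a letter, so it can miss placeholders that start with a digit and can flag legitimate bracketed text.

```python
import re

# Heuristic pattern (an assumption, not part of the skill): a bracketed span that
# starts with a letter, e.g. "[rollback command]" or "[Team name]".
# Skipped on purpose: checked task boxes "[x]", explicit "[FILL IN: ...]" flags,
# and Markdown links such as "[text](url)". Empty task boxes "[ ]" never match
# because they do not start with a letter.
PLACEHOLDER = re.compile(r"\[(?![xX]\]|FILL IN:)([A-Za-z][^\[\]\n]*)\](?!\()")


def find_placeholders(runbook_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for each unfilled placeholder."""
    hits = []
    for lineno, line in enumerate(runbook_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical draft fragment used only to demonstrate the check.
    draft = "\n".join([
        "- [ ] Announce completion in `#ops-live` with outcome",
        "| DB unavailable after 10 min | [Team name] | [FILL IN: on-call rotation link] |",
        "[rollback command]",
    ])
    for lineno, placeholder in find_placeholders(draft):
        print(f"line {lineno}: unfilled placeholder {placeholder}")
```

Running a generated runbook through a check like this before publishing would catch leftover template text, while rows explicitly flagged with `[FILL IN: ...]` pass, as the checklist allows. The remaining checks (exact commands, explicit failure paths, testable rollback) still need a human review.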
More from mohitagw15856/pm-claude-skills
- 360-feedback-template: Design a 360-degree feedback survey or write a structured 360 feedback report. Use when asked to build a 360 feedback process, write 360 feedback for a colleague, design a feedback survey, or produce a feedback report. Produces either a complete survey instrument with rating scales and open-ended questions, or a structured narrative feedback report with themes, strengths, and development areas.
- ab-test-planner: Design statistically rigorous A/B tests for product features, UI changes, onboarding flows, and pricing experiments. Use when asked to set up an experiment, design an A/B test, calculate sample size, or interpret test results. Produces a complete test plan with hypothesis, variant definitions, sample size, duration estimate, guardrail metrics, and a results interpretation guide.
- accessibility-audit: Generate a WCAG 2.2 accessibility audit checklist and remediation suggestions for any UI or design. Use when asked to audit for accessibility, check WCAG compliance, review a design for a11y issues, or create an accessibility remediation plan. Produces a prioritised checklist with pass/fail assessments and specific fixes.
- account-plan: Build a structured account plan for any key customer or target account. Use when asked to create an account plan, key account strategy, strategic account review, or territory plan. Produces a complete account plan with relationship map, growth opportunities, risks, and 90-day action plan.
- aeo-optimizer: Optimize an article for Answer Engine Optimization (AEO), restructuring content so AI engines like ChatGPT, Perplexity, and Claude can extract, quote, and cite it. Rewrites headings as questions, drops 50-80 word answer capsules, audits paragraph length, and flags trust signals. Use when asked to AEO-optimize, make content AI-readable, improve AI citation chances, or adapt an article for answer engines.
- ai-ethics-review: Conduct an ethical review of an AI or ML feature, model, or product. Use when asked to run an AI ethics review, assess AI risks, audit a model for bias, or produce an AI impact assessment. Produces a structured ethics review covering fairness, transparency, privacy, safety, accountability, and societal impact with prioritised mitigations.
- ai-product-canvas: Structure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan.
- ambiguity-resolver: Structure vague opportunities and unclear briefs into actionable one-page problem statements. Use when asked to clarify a vague brief, frame an undefined problem, make sense of an unclear opportunity, or when the user says 'we need to figure out what to do about X' or 'I've been asked to look into Y'. Produces a structured problem brief with reframed questions, scoped boundaries, and a minimum viable research plan.
- api-docs-writer: Write clear, developer-facing API documentation. Use when asked to document an API endpoint, write API reference docs, create a developer guide, or turn a raw spec/Postman collection into documentation. Produces endpoint documentation with descriptions, parameters, request/response examples, and error codes.
- api-versioning-strategy: Write an API versioning strategy document for a service or API platform. Use when asked to define versioning policy, plan API deprecation, classify breaking changes, or document version lifecycle. Produces a complete versioning strategy with breaking-change classification table, deprecation timeline, migration guide template, and client communication template.