database-migration-plan
$
npx mdskill add mohitagw15856/pm-claude-skills/database-migration-planProduce a complete, safe database migration plan for a schema change. A migration plan is not just the SQL — it is a coordinated sequence of steps that ensures the application stays available, data stays consistent, and every step can be rolled back independently.
SKILL.md
.github/skills/database-migration-planView on GitHub ↗
---
name: database-migration-plan
description: "Write a safe, zero-downtime database migration plan for a schema change. Use when asked to plan a database migration, design a zero-downtime schema change, document an expand/contract migration, produce a rollback procedure for a database change, or coordinate a database schema update with a deployment. Produces a structured migration plan covering migration objectives, backward compatibility analysis, expand/contract phase breakdown, exact SQL, rollback steps per phase, data validation queries, and a deployment runbook."
---
# Database Migration Plan Skill
Produce a complete, safe database migration plan for a schema change. A migration plan is not just the SQL — it is a coordinated sequence of steps that ensures the application stays available, data stays consistent, and every step can be rolled back independently.
The expand/contract pattern is the default approach: expand the schema to support both old and new states, migrate the application, then contract to remove the old state. Never combine schema changes and data backfills in a single migration that runs during deployment.
## Required Inputs
Ask for these if not already provided:
- **Current schema state** — the DDL or description of the table(s) as they are now
- **Target schema state** — the DDL or description of what the table(s) should look like after migration
- **Migration reason** — why this change is being made (new feature, performance fix, normalization, compliance)
- **Database engine** — PostgreSQL, MySQL, SQLite, CockroachDB, etc.
- **Estimated data volume** — approximate number of rows in affected tables
- **Deployment constraints** — is any downtime allowed? What is the expected traffic level during migration? Are there multiple app instances running?
- **Rollback window** — how long after deploy can the team roll back before the migration becomes irreversible?
## Output Format
---
# Database Migration Plan: [Migration Name]
**Service:** [Name] | **Team:** [Team name]
**Author:** [Name] | **Reviewed by:** [Name / DBA]
**Date:** [Date] | **Target deploy date:** [Date]
**Database engine:** [PostgreSQL X.X / MySQL X.X]
**Ticket:** [JIRA-XXX]
---
## 1. Migration Overview
**What is changing:**
[1–2 sentences: the specific schema change — e.g. "Adding a non-nullable `organisation_id` column to the `users` table and backfilling it from the `accounts` table."]
**Why:**
[1–2 sentences: the business or technical reason driving the change.]
**Migration type:** [Additive only / Additive + backfill / Column rename / Column type change / Table restructure / Index change]
**Zero-downtime:** [Yes — using expand/contract / No — requires maintenance window — state duration]
**Estimated migration duration:**
- Expand phase: [~X minutes]
- Data backfill: [~X minutes/hours — based on X rows at Y rows/second]
- Contract phase: [~X minutes after app version deployed]
---
## 2. Backward Compatibility Analysis
Before writing a single line of SQL, assess whether each change is backward compatible with the currently deployed application code.
| Change | Backward compatible? | Risk | Notes |
|---|---|---|---|
| [e.g. Add nullable column `org_id`] | Yes | Low | Old app ignores new column |
| [e.g. Backfill `org_id`] | Yes | Medium | Old app unaffected; new app reads backfilled values |
| [e.g. Add NOT NULL constraint to `org_id`] | **No** | High | Old app that inserts without `org_id` will fail |
| [e.g. Drop old column `account_id`] | **No** | High | Old app that reads `account_id` will fail |
| [e.g. Add index on `org_id`] | Yes | Low | Additive; no breaking change |
| [e.g. Rename column] | **No** | High | Never rename in one step; use expand/contract |
**Summary:** [e.g. "This migration requires the expand/contract pattern across 3 deployment phases because steps 3 and 4 are not backward compatible."]
---
## 3. Expand/Contract Phases
### Phase Overview
```
Phase 1 — EXPAND
Deploy migration: add new column (nullable), create new indexes
Old app: continues to work (ignores new column)
New app: not yet deployed
Duration: [~X min] | Rollback: trivial — drop new column
│
▼
Phase 2 — BACKFILL + DUAL-WRITE
Deploy app update: writes to both old and new columns
Run backfill: populate new column for existing rows
Validate: confirm 100% of rows have non-null new column
Duration: [~X hours depending on data volume]
Rollback: deploy previous app version; new column is still nullable
│
▼
Phase 3 — ENFORCE + SWITCH
Deploy migration: add NOT NULL constraint, drop old column/index
Deploy app update: reads only from new column
Duration: [~X min] | Rollback: requires forward-fix (constraint must be dropped first)
│
▼
Phase 4 — CONTRACT (optional cleanup)
Deploy migration: drop deprecated columns, rename if needed
Final state matches target schema
Rollback: not recommended — contract changes are destructive
```
---
### Phase 1 — Expand Schema
**Goal:** Add the new column and structures without breaking the existing application.
**Deploy order:** Run migration first, then (optionally) deploy app.
**Application state:** Old app running; no app changes required yet.
```sql
-- Migration: 001_add_org_id_to_users.sql
BEGIN;
-- Add nullable column (safe — old app ignores it)
ALTER TABLE users
ADD COLUMN org_id UUID NULL
REFERENCES organisations(id) ON DELETE RESTRICT;
-- Add index NOW, not in Phase 3 — building index on large table during Phase 3 is risky
CREATE INDEX CONCURRENTLY users_org_id_idx ON users (org_id);
-- Note: CONCURRENTLY does not lock the table; safe on live traffic
-- Note: Cannot run CONCURRENTLY inside a transaction block; run separately if needed
COMMIT;
```
**Validation after Phase 1:**
```sql
-- Confirm column exists and is nullable
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'YES'
-- Confirm index exists
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users' AND indexname = 'users_org_id_idx';
```
**Rollback (Phase 1 only):**
```sql
BEGIN;
DROP INDEX CONCURRENTLY IF EXISTS users_org_id_idx;
ALTER TABLE users DROP COLUMN IF EXISTS org_id;
COMMIT;
```
---
### Phase 2 — Backfill Existing Data
**Goal:** Populate the new column for all existing rows before enforcing NOT NULL.
**When to run:** After Phase 1 is live and stable. Can be run as a background job or a one-time script.
**Application state:** Deploy app version that dual-writes to both old and new columns.
**App code change required:**
```
// All INSERT and UPDATE operations must now set BOTH old_column and new_column
// until Phase 3 is complete. This ensures new rows are populated during the backfill window.
```
**Backfill script — batch processing:**
```sql
-- Run in batches to avoid locking. Adjust batch size based on table size and DB load.
-- Target: no single batch takes more than 5 seconds.
DO $$
DECLARE
batch_size INT := 1000;
affected INT;
BEGIN
LOOP
UPDATE users
SET org_id = accounts.organisation_id
FROM accounts
WHERE users.account_id = accounts.id
AND users.org_id IS NULL
LIMIT batch_size;
GET DIAGNOSTICS affected = ROW_COUNT;
EXIT WHEN affected = 0;
-- Pause between batches to avoid saturating I/O
PERFORM pg_sleep(0.1);
END LOOP;
END $$;
```
**Monitoring during backfill:**
```sql
-- Check progress — run periodically during backfill
SELECT
COUNT(*) FILTER (WHERE org_id IS NOT NULL) AS backfilled,
COUNT(*) FILTER (WHERE org_id IS NULL) AS remaining,
COUNT(*) AS total,
ROUND(
100.0 * COUNT(*) FILTER (WHERE org_id IS NOT NULL) / COUNT(*), 2
) AS pct_complete
FROM users;
```
**Backfill completion validation:**
```sql
-- Must return 0 before proceeding to Phase 3
SELECT COUNT(*) AS unbackfilled_rows
FROM users
WHERE org_id IS NULL;
-- Confirm no new rows written without org_id (dual-write working)
SELECT COUNT(*) AS recent_missing
FROM users
WHERE org_id IS NULL
AND created_at > now() - INTERVAL '1 hour';
```
**Rollback (Phase 2 — app only):**
- Deploy previous app version (single-write to old column)
- `org_id` column remains nullable; no data is lost
- Backfilled values remain; harmless
---
### Phase 3 — Enforce Constraints
**Goal:** Add NOT NULL constraint and remove dependency on the old column.
**Prerequisites:** Phase 2 backfill must be 100% complete (zero rows with `org_id IS NULL`).
**Deploy order:** Run migration, then deploy app version that reads only from `org_id`.
**PostgreSQL — use NOT VALID + VALIDATE for large tables:**
```sql
-- Step 1: Add constraint as NOT VALID (no full table scan — instant)
ALTER TABLE users
ADD CONSTRAINT users_org_id_not_null
CHECK (org_id IS NOT NULL) NOT VALID;
-- Step 2: VALIDATE CONSTRAINT (takes a SHARE UPDATE EXCLUSIVE lock — allows reads and writes)
-- Run this separately, as it can take minutes on large tables
ALTER TABLE users
VALIDATE CONSTRAINT users_org_id_not_null;
-- Step 3: Once validated, convert to actual NOT NULL
-- (PostgreSQL trusts the validated check constraint — this is instant)
ALTER TABLE users
ALTER COLUMN org_id SET NOT NULL;
-- Step 4: Drop the now-redundant check constraint
ALTER TABLE users
DROP CONSTRAINT users_org_id_not_null;
```
**Validation after Phase 3:**
```sql
-- Confirm NOT NULL is enforced
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'NO'
-- Test that insert without org_id fails (run in a transaction and roll back)
BEGIN;
INSERT INTO users (email) VALUES ('test@example.com');
-- Expected: ERROR: null value in column "org_id" violates not-null constraint
ROLLBACK;
```
**Rollback (Phase 3):**
```sql
-- Drop the NOT NULL constraint (restores nullable state)
ALTER TABLE users ALTER COLUMN org_id DROP NOT NULL;
-- Then deploy previous app version (dual-write)
-- Note: Once app code reading the new column is live, rolling back the constraint
-- without rolling back the app will cause issues — plan this carefully.
```
---
### Phase 4 — Contract (Remove Old Column)
**Goal:** Remove the old column once the app no longer references it.
**Prerequisites:** Phase 3 fully deployed and stable for at least [X days/hours rollback window].
**Warning:** This phase is destructive — the old column's data is permanently deleted.
```sql
BEGIN;
-- Drop the old column
ALTER TABLE users DROP COLUMN account_id;
-- Drop any indexes that referenced the old column
DROP INDEX IF EXISTS users_account_id_idx;
COMMIT;
```
**Pre-drop validation:**
```sql
-- Confirm no application queries still reference the old column
-- (Check this in code review and via a search of the codebase before running)
-- grep -r "account_id" app/
-- Confirm the column is safe to drop
SELECT COUNT(*) FROM users WHERE account_id IS NOT NULL;
-- Should be 0 (or irrelevant once new column is canonical)
```
**Rollback:** Not straightforward — dropped column data cannot be recovered. Only proceed to Phase 4 after the rollback window has passed and the change is confirmed stable.
---
## 4. Data Validation Plan
Run these queries before and after the full migration to confirm data integrity.
**Pre-migration baseline:**
```sql
-- Record these values before any migration step
SELECT COUNT(*) AS total_users FROM users;
SELECT COUNT(*) AS total_orgs FROM organisations;
SELECT MIN(created_at), MAX(created_at) FROM users;
-- Check for any anomalies in the source data before backfill
SELECT COUNT(*) AS users_without_account
FROM users WHERE account_id IS NULL;
```
**Post-backfill integrity check:**
```sql
-- All users have an org that exists
SELECT COUNT(*) AS orphaned_org_refs
FROM users u
WHERE u.org_id IS NOT NULL
AND NOT EXISTS (
SELECT 1 FROM organisations o WHERE o.id = u.org_id
);
-- Expected: 0
-- org_id matches expected value from source column
SELECT COUNT(*) AS mismatched_backfill
FROM users u
JOIN accounts a ON u.account_id = a.id
WHERE u.org_id != a.organisation_id;
-- Expected: 0
-- Row count unchanged (no rows created or deleted by migration)
SELECT COUNT(*) AS total_users_after FROM users;
-- Must match pre-migration baseline
```
**Post-contract final check:**
```sql
-- Old column is gone
SELECT COUNT(*) FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'account_id';
-- Expected: 0
-- New column is NOT NULL
SELECT is_nullable FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: NO
```
---
## 5. Performance Impact Assessment
| Step | Lock type | Lock duration | Traffic impact |
|---|---|---|---|
| Add nullable column | ACCESS EXCLUSIVE | Milliseconds | Negligible |
| CREATE INDEX CONCURRENTLY | SHARE UPDATE EXCLUSIVE | Minutes (proportional to table size) | Reads and writes continue |
| Batch backfill | Row-level locks only | <5s per batch | Low if batches are small |
| ADD CONSTRAINT NOT VALID | ACCESS EXCLUSIVE | Milliseconds | Negligible |
| VALIDATE CONSTRAINT | SHARE UPDATE EXCLUSIVE | Minutes | Reads and writes continue |
| ALTER COLUMN SET NOT NULL | ACCESS EXCLUSIVE | Milliseconds (if check constraint validated) | Negligible |
| DROP COLUMN | ACCESS EXCLUSIVE | Milliseconds | Negligible |
**Expected load increase during backfill:**
- DB CPU: [estimated % increase during batch writes]
- DB I/O: [estimated increase]
- Monitoring threshold to pause backfill: [e.g. DB CPU > 80% for >2 minutes]
**Backfill rate estimate:**
- Table size: [X million rows]
- Batch size: [1000 rows]
- Pause between batches: [100ms]
- Estimated total duration: [X hours at Y rows/second]
---
## 6. Deployment Runbook
Follow this checklist on the day of migration. Mark each step as done before proceeding.
**Pre-migration (day before):**
- [ ] DBA / tech lead has reviewed the migration plan
- [ ] Performance impact assessed; monitoring dashboards ready
- [ ] Backfill script tested on a staging DB with production-scale data
- [ ] Rollback procedure tested on staging
- [ ] On-call engineer briefed; Slack channel [#db-migrations] set up for coordination
- [ ] Maintenance window scheduled (if required)
**Phase 1 — Expand (T+0):**
- [ ] Take a manual DB snapshot / verify automated backup is recent
- [ ] Run `001_expand_add_org_id.sql` on production
- [ ] Run Phase 1 validation queries — confirm pass
- [ ] Deploy app version with dual-write
- [ ] Monitor error rate for [10 minutes]
**Phase 2 — Backfill (T+[X hours]):**
- [ ] Confirm Phase 1 has been stable for [X hours]
- [ ] Start backfill script in a screen/tmux session
- [ ] Monitor progress via backfill progress query every [5 minutes]
- [ ] Monitor DB CPU and I/O — pause if thresholds exceeded
- [ ] Run completion validation — confirm 0 unbackfilled rows
- [ ] Run integrity checks — confirm 0 orphaned refs, 0 mismatches
**Phase 3 — Enforce (T+[X days]):**
- [ ] Confirm backfill 100% complete and stable for [X hours]
- [ ] Add NOT VALID constraint
- [ ] Run VALIDATE CONSTRAINT (monitor duration and lock waits)
- [ ] Alter column to NOT NULL
- [ ] Run Phase 3 validation queries
- [ ] Deploy app version reading only from new column
- [ ] Monitor error rate for [30 minutes]
**Phase 4 — Contract (T+[X days after rollback window]):**
- [ ] Confirm rollback window has passed — no incidents, no rollback needed
- [ ] Search codebase for references to old column — confirm zero
- [ ] Run DROP COLUMN migration
- [ ] Run final integrity checks
- [ ] Close migration ticket; update schema documentation
---
## Quality Checks
- [ ] Every migration phase has an independent rollback procedure — no phase assumes the next one has run
- [ ] Batch backfill script includes a pause between batches to avoid saturating I/O
- [ ] NOT NULL constraints use the NOT VALID + VALIDATE pattern on tables with >100k rows
- [ ] The app dual-write period is explicitly defined — old column writes are not dropped until Phase 3 is deployed
- [ ] Data validation queries include a row count check to confirm no data loss
- [ ] Lock types are identified for every DDL statement — no "should be fine" assumptions
- [ ] The deployment runbook names who runs each step, not just what to run
- [ ] Phase 4 (contract) is explicitly gated on the rollback window passing — not run on the same day as Phase 3
More from mohitagw15856/pm-claude-skills
- 360-feedback-templateDesign a 360-degree feedback survey or write a structured 360 feedback report. Use when asked to build a 360 feedback process, write 360 feedback for a colleague, design a feedback survey, or produce a feedback report. Produces either a complete survey instrument with rating scales and open-ended questions, or a structured narrative feedback report with themes, strengths, and development areas.
- ab-test-plannerDesign statistically rigorous A/B tests for product features, UI changes, onboarding flows, and pricing experiments. Use when asked to set up an experiment, design an A/B test, calculate sample size, or interpret test results. Produces a complete test plan with hypothesis, variant definitions, sample size, duration estimate, guardrail metrics, and a results interpretation guide.
- accessibility-auditGenerate a WCAG 2.2 accessibility audit checklist and remediation suggestions for any UI or design. Use when asked to audit for accessibility, check WCAG compliance, review a design for a11y issues, or create an accessibility remediation plan. Produces a prioritised checklist with pass/fail assessments and specific fixes.
- account-planBuild a structured account plan for any key customer or target account. Use when asked to create an account plan, key account strategy, strategic account review, or territory plan. Produces a complete account plan with relationship map, growth opportunities, risks, and 90-day action plan.
- aeo-optimizerOptimize an article for Answer Engine Optimization (AEO) — restructuring content so AI engines like ChatGPT, Perplexity, and Claude can extract, quote, and cite it. Rewrites headings as questions, drops 50-80 word answer capsules, audits paragraph length, and flags trust signals. Use when asked to AEO-optimize, make content AI-readable, improve AI citation chances, or adapt an article for answer engines.
- ai-ethics-reviewConduct an ethical review of an AI or ML feature, model, or product. Use when asked to run an AI ethics review, assess AI risks, audit a model for bias, or produce an AI impact assessment. Produces a structured ethics review covering fairness, transparency, privacy, safety, accountability, and societal impact with prioritised mitigations.
- ai-product-canvasStructure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan.
- ambiguity-resolverStructure vague opportunities and unclear briefs into actionable one-page problem statements. Use when asked to clarify a vague brief, frame an undefined problem, make sense of an unclear opportunity, or when the user says 'we need to figure out what to do about X' or 'I've been asked to look into Y'. Produces a structured problem brief with reframed questions, scoped boundaries, and a minimum viable research plan.
- api-docs-writerWrite clear, developer-facing API documentation. Use when asked to document an API endpoint, write API reference docs, create a developer guide, or turn a raw spec/Postman collection into documentation. Produces endpoint documentation with descriptions, parameters, request/response examples, and error codes.
- api-versioning-strategyWrite an API versioning strategy document for a service or API platform. Use when asked to define versioning policy, plan API deprecation, classify breaking changes, or document version lifecycle. Produces a complete versioning strategy with breaking-change classification table, deprecation timeline, migration guide template, and client communication template.