developer-onboarding-doc

$ npx mdskill add mohitagw15856/pm-claude-skills/developer-onboarding-doc

---
name: developer-onboarding-doc
description: "Write a developer onboarding document for a service, codebase, or team. Use when asked to write a developer guide, service README, onboarding doc for a new engineer, codebase orientation, or getting-started guide for a technical team. Produces a structured doc covering service overview, architecture, local setup, key patterns, testing, deployment, and who to ask for what."
---

# Developer Onboarding Document Skill

Produce a complete developer onboarding document for a service or team — covering everything a new engineer needs to be productive within their first week.

A good onboarding doc is not a wiki dump. It answers the questions a new engineer actually has on day one, in the order they'll have them.

## Required Inputs

Ask for these if not already provided:
- **Service name** and what it does
- **Team** responsible for it
- **Tech stack** — language(s), framework(s), database(s), message queues, etc.
- **Key external dependencies** — upstream services, third-party APIs
- **Deployment target** — Kubernetes, ECS, Lambda, bare metal, etc.
- **Local dev setup** — how to run locally (Docker Compose, local DB, etc.)
- **Testing approach** — unit, integration, E2E; test commands
- **Deployment process** — summary of how code gets to production
- **On-call setup** — who's on-call, how alerts work
- **Contacts** — tech lead, platform team, related service owners

## Output Format

---

# Developer Onboarding: [Service Name]

**Team:** [Team name] | **Tech lead:** [Name]
**Last updated:** [Date] | **Updated by:** [Name]

> If something in this doc is wrong or out of date, fix it now — it will affect every engineer who onboards after you.

---

## What This Service Does

[3–5 sentences. What problem does this service solve? Who calls it, and who does it call? What would break if this service went down?]

**Service type:** [API / Background worker / Event consumer / Data pipeline / etc.]
**Consumers:** [List internal services or external clients that depend on this service]
**Dependencies:** [List upstream services, databases, and third-party APIs this service calls]

**Architecture diagram:** [Link or embed — even a rough ASCII diagram helps]

```
[Caller A] ──→ [This Service] ──→ [Database]
                      │
                      └──→ [Downstream Service]
```

---

## Codebase Orientation

**Repository:** [Link]
**Main branch:** `[main / master]`
**Language:** [e.g. Go 1.22 / Node.js 20 / Python 3.12]
**Framework:** [e.g. Express / FastAPI / Gin / Rails]

### Key directories

```
[repo-root]/
├── [src/ or cmd/]          # Application code
│   ├── [handlers/]         # HTTP handlers / controllers
│   ├── [services/]         # Business logic
│   ├── [repository/]       # Database access layer
│   └── [models/]           # Data models / types
├── [tests/]                # Test files
├── [migrations/]           # Database migrations
├── [scripts/]              # Utility scripts
├── [.github/workflows/]    # CI/CD pipeline definitions
└── [docs/]                 # Additional documentation
```

**Where to start reading:** [Point to 2–3 key files that give the best orientation — e.g. `main.go`, `routes.js`, `app.py`]

### Things that might surprise you

- [Unusual pattern 1 — e.g. "We use event sourcing — state is derived from an event log, not stored directly"]
- [Unusual pattern 2 — e.g. "Auth is handled by the gateway — this service trusts the `X-User-Id` header"]
- [Unusual pattern 3 — any non-obvious decisions or legacy choices]

---

## Local Development Setup

**Estimated setup time:** [X minutes for a fresh machine]

### Prerequisites

- [ ] [Tool 1] — version [X] — [install link]
- [ ] [Tool 2] — version [X] — [install link]
- [ ] Access to [repo / internal package registry] — request from [who]
- [ ] [Any secrets or credentials needed] — request from [who]

### Step-by-step setup

```bash
# 1. Clone the repo
git clone [repo URL]
cd [repo-name]

# 2. Copy and configure environment variables
cp .env.example .env
# Edit .env — see "Environment Variables" section below

# 3. Start dependencies (database, cache, etc.)
[docker compose up -d / make deps / etc.]

# 4. Install dependencies
[npm install / go mod download / pip install -r requirements.txt]

# 5. Run database migrations
[migration command]

# 6. Start the service
[start command]

# 7. Verify it's working
curl http://localhost:[PORT]/health
# Expected: {"status":"ok"}
```

**If this doesn't work:** Check [Troubleshooting section below] or ask in `#[channel]`.

### Environment Variables

| Variable | Required | Description | Example |
|---|---|---|---|
| `DATABASE_URL` | Yes | Connection string for the primary DB | `postgres://localhost:5432/[db]` |
| `[VAR_2]` | Yes | [Description] | [Example] |
| `[VAR_3]` | No | [Description — default value] | [Example] |

**Secrets for local dev:** [Where to get them — e.g. "Run `[command]` to pull from Vault" or "Ask [person] in #[channel]"]
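
A variable that is missing from `.env` is easier to catch before the service starts than after. The following is a minimal sketch, assuming `.env.example` lists every variable the service reads; `parse_env_keys` and `missing_env_keys` are hypothetical helper names, not part of any existing tooling, and the parser only handles plain `KEY=value` lines.

```python
def parse_env_keys(text: str) -> set[str]:
    """Return variable names found in dotenv-style text (hypothetical helper)."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional "export " prefix.
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, _value = line.partition("=")
        if sep and name.strip():
            keys.add(name.strip())
    return keys


def missing_env_keys(example_text: str, env_text: str) -> list[str]:
    """Names declared in .env.example but absent from .env, sorted (hypothetical helper)."""
    return sorted(parse_env_keys(example_text) - parse_env_keys(env_text))
```

To check real files, pass in `Path(".env.example").read_text()` and `Path(".env").read_text()` from `pathlib`. The check only reports variables that are absent; it does not flag variables that are present but empty, since optional variables may legitimately be blank.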

### Useful local commands

```bash
[start command]           # Start the service
[test command]            # Run all tests
[lint command]            # Run linter
[format command]          # Format code
[migration command]       # Run pending migrations
[seed command]            # Seed local database
```

---

## Testing

**Testing philosophy:** [e.g. "We test at the integration layer — unit tests for pure functions, integration tests for anything touching the DB or external services"]

### Running tests

```bash
# All tests
[test command]

# Unit tests only
[unit test command]

# Integration tests (requires local deps running)
[integration test command]

# A specific test file or test case
[test command with filter]
```

**Test coverage:** [X]% (minimum required to pass CI: [Y]%)
**Coverage report:** [Where to find it]

### Writing tests

- **Unit tests:** [Where to put them — e.g. alongside source files as `*_test.go`]
- **Integration tests:** [Where to put them — e.g. `tests/integration/`]
- **Test database:** [How it works — e.g. "Each test gets a clean transaction that rolls back on teardown — see `tests/helpers/db.go`"]
- **Mocking:** [Policy — e.g. "We mock at the repository layer — don't mock the DB directly"]
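
The "clean transaction that rolls back on teardown" pattern mentioned above can be sketched as follows. This is an illustration only, using Python's built-in `sqlite3` with an in-memory database as a stand-in for the real test database; `rollback_after` is a hypothetical helper name, and the `users` table is made up for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rollback_after(conn: sqlite3.Connection):
    """Run the body inside a transaction that is always rolled back (hypothetical helper)."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        # Discards everything the test wrote; a no-op if no transaction is open.
        conn.rollback()


# isolation_level=None leaves transaction control to the explicit BEGIN above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

with rollback_after(conn) as db:
    db.execute("INSERT INTO users (name) VALUES ('test-user')")
    rows_inside = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]

rows_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Inside the block the inserted row is visible; after the block the table is empty again, so the next test starts from the same state.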

---

## Making Changes

### Branching

[Branch naming convention — e.g. `feature/[ticket-id]-short-description`, `fix/[ticket-id]-short-description`]
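
If the team wants to enforce the convention in a hook or CI step, a check along these lines may help. It is a sketch under assumptions: the ticket ID format (uppercase project key, dash, number) and the name `is_valid_branch` are placeholders, so adjust the pattern to whatever convention the team actually uses.

```python
import re

# Assumed convention: <type>/<TICKET-ID>-<short-description>,
# for example feature/PAY-123-add-refunds. The ticket format is a placeholder.
BRANCH_PATTERN = re.compile(
    r"(feature|fix)/[A-Z][A-Z0-9]*-\d+-[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch(name: str) -> bool:
    """True when the whole branch name matches the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```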

### Before opening a PR

- [ ] Tests pass locally
- [ ] Linter passes (`[lint command]`)
- [ ] New behaviour has test coverage
- [ ] Any new environment variables are added to `.env.example` and documented
- [ ] Database migrations are backward-compatible (old code can run against new schema)

### Code review

- **Reviewers:** [Who to request review from — e.g. "Any engineer on [team]; lead review required for auth changes"]
- **Expected review time:** [X hours / 1 business day]
- **PR template:** [Link, or note that GitHub pre-fills new PRs from the repo's pull request template]

### Database migrations

```bash
# Create a new migration
[migration create command]

# Apply pending migrations
[migration up command]

# Roll back last migration
[migration down command]
```

**Migration rules:**
- All migrations must be backward-compatible — old code must run against the new schema
- Never rename or drop a column in a single migration; stage it across releases (for a rename: add the new column, migrate the data, then drop the old column once no deployed code reads it)
- Test your rollback before merging
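
The staged rename described in the rules above can be sketched with an in-memory SQLite database. This is only an illustration: the `users` table and the `fullname` and `display_name` columns are made up, and a real migration would use the team's own migration tool rather than raw SQL from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Migration 1 (expand): add the new column and backfill it. The old column stays.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")
conn.commit()

# Old code still runs against the new schema.
old_code_result = conn.execute("SELECT fullname FROM users").fetchone()[0]
# New code reads the new column.
new_code_result = conn.execute("SELECT display_name FROM users").fetchone()[0]

# Migration 2 (contract): drop `fullname` in a later release,
# only after no deployed code reads or writes it.
```

The sketch omits one part of a real rollout: while old and new code run side by side, writes need to keep both columns in sync.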

---

## Deployment

**How code gets to production:** [1–2 sentence summary — link to full CI/CD playbook if it exists]

1. Merge to `main` → automatic deploy to staging
2. Smoke tests run on staging
3. Manual approval → deploy to production
4. Post-deploy monitoring for [X minutes]

**Deployment docs:** [Link to CI/CD playbook or pipeline docs]

**Who can deploy:** [Any engineer / Lead engineer / On-call engineer — specify]

**Deployment channel:** `#[deployments channel]`

---

## Monitoring and Observability

**Dashboard:** [Datadog / Grafana / CloudWatch — link]
**Logs:** [Log aggregation tool and link — e.g. "Logs are in Datadog under service:[name]"]
**Traces:** [Tracing tool and link if applicable]
**Alerts:** [Where alerts fire — e.g. PagerDuty / Slack #alerts-[service]]

**Key metrics to know:**
- **Error rate:** Should be <[X]% (alert at [Y]%)
- **P99 latency:** Should be <[X]ms
- **[Business metric]:** [e.g. "Queue depth should be <100 items"]
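
For engineers new to these metrics, the two definitions can be sketched as below. The assumptions are labeled: the sketch counts only 5xx responses as errors and uses the nearest-rank method for percentiles, while the monitoring tool in use may define both differently. The function names are hypothetical.

```python
import math


def error_rate(status_codes: list[int]) -> float:
    """Fraction of responses with a 5xx status (assumed definition of an error)."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if code >= 500) / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With 100 latency samples, `percentile(samples, 99)` returns the 99th smallest value, which means one request in a hundred was slower than the reported P99.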

---

## On-Call

**On-call schedule:** [PagerDuty / Opsgenie link]
**Who's on-call now:** [Link to current schedule or `#on-call` channel]
**Escalation:** [On-call → [team lead] → [EM] — after [X] minutes unacknowledged]

**If you get paged:**
1. Acknowledge the alert
2. Check [dashboard link] for the first clue
3. Find the matching alert runbook: [link to oncall-runbook or runbook-writer output]
4. If you can't resolve in [X minutes], escalate to [person/channel]

---

## Key Contacts

| Role | Name | Best way to reach |
|---|---|---|
| Tech lead | [Name] | Slack: @[handle] |
| On-call rotation | [Team] | PagerDuty / `#on-call` |
| Platform / infra | [Team] | `#platform` Slack channel |
| Database / DBA | [Name or team] | `#database` Slack channel |
| [Upstream service] owner | [Name] | Slack: @[handle] |

**Where to ask questions:**
- General engineering: `#engineering`
- This service specifically: `#[service-name]`
- Urgent / production issues: `#incidents`

---

## Troubleshooting

### "The service won't start locally"

1. Check that Docker / dependencies are running: `[command]`
2. Check `.env` is populated — missing values cause silent failures
3. Check logs: `[log command]`
4. Ask in `#[channel]`

### "Tests are failing locally but passing in CI"

- Check your local dependency versions match CI: `[version check command]`
- Try a clean install: `[clean install command]`
- Integration tests need local deps running — `[start deps command]`

### "I can't access [internal tool / system]"

- Request access through [process — e.g. Okta self-serve / ask your manager]

### "Something looks wrong in production"

1. Check [dashboard] for the error spike
2. Check recent deploys in `#deployments`
3. If it's an active incident, page on-call via [PagerDuty / Slack command]

---

## Further Reading

- [Architecture Decision Records (ADRs)](./docs/decisions/) — why the codebase is the way it is
- [API documentation](./docs/api/) or [link to external docs]
- [Incident runbooks](./docs/runbooks/)
- [CI/CD pipeline documentation](./docs/cicd/)
- [Team working agreements](./docs/team/)

---

## Quality Checks

- [ ] Local setup instructions work on a fresh machine — tested recently
- [ ] Environment variables table is complete and accurate
- [ ] "Things that might surprise you" captures the actual surprises (ask a recent joiner)
- [ ] On-call section has real links, not placeholders
- [ ] Contacts are current — team members with real Slack handles
- [ ] Troubleshooting covers the top 3 actual questions new joiners ask