legacy-code-safety

Name: legacy-code-safety
Author: TheBushidoCollective/han

$npx mdskill add TheBushidoCollective/han/legacy-code-safety

Safeguard changes to untested code by enforcing characterization testing before modification.

Mitigates risks associated with altering undocumented or untested source files.
Utilizes file system interaction tools like Read, Write, and Edit for code manipulation.
Guides the agent through the RGR workflow to establish behavioral baselines.
Outputs structured advice emphasizing the necessity of adding tests prior to any code change.

SKILL.md

.github/skills/legacy-code-safetyView on GitHub ↗

---
name: legacy-code-safety
user-invocable: false
description: Use when modifying, removing, or refactoring code that lacks test coverage. Emphasizes the danger of untested changes and the RGR workflow to add characterization tests before modifications.
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
  - Grep
  - Glob
---

# Legacy Code Safety

> "Changing untested code is like performing surgery blindfolded."

## DANGER: The Risk of Modifying Untested Code

**Modifying code without tests is one of the HIGHEST-RISK activities in software development.**

When you change untested code:

- You CANNOT know what behaviors you're breaking
- You have NO safety net to catch mistakes
- You're relying on hope instead of evidence
- You're creating technical debt on top of technical debt

**In dynamically-typed languages (Python, JavaScript, Ruby, PHP), this risk is EXPONENTIALLY higher:**

- No compiler to catch type errors
- No static analysis to verify correctness
- Tests are your ONLY safety mechanism
- Runtime is when you discover your mistakes

## The Golden Rule

**NEVER modify untested code without adding tests first.**

If code lacks tests:

1. STOP
2. Add characterization tests
3. Verify tests pass
4. THEN make your changes

## The RGR Workflow for Legacy Code

This is NOT standard TDD. This is **characterization testing** - capturing current behavior before changing it.

### Standard TDD (New Features)

1. RED: Write failing test for NEW behavior
2. GREEN: Implement to make test pass
3. REFACTOR: Clean up implementation

### Legacy Code RGR (Existing Code)

1. RED: Write test that SHOULD pass (captures current behavior)
2. GREEN: Verify it passes (confirms you captured reality correctly)
3. REFACTOR: NOW you can safely modify with a safety net

### The Critical Difference

**TDD:** You know what the behavior SHOULD be, write test for ideal

**Legacy RGR:** You DON'T know all behaviors, capture what EXISTS

### Why "RED" for Passing Tests?

The "RED" step means "you're in the red zone" - working with dangerous untested code. Even though you expect the test to pass, you're at risk until you have that passing test as proof.

## Characterization Testing Process

### Step 1: Identify All Behaviors

Before writing tests, understand what the code currently does:

```typescript
// Legacy function - no tests
function calculateShipping(order) {
  let cost = 9.99

  if (order.total > 100) {
    cost = 0
  }

  if (order.weight > 50) {
    cost += 25
  }

  if (order.destination === 'HI' || order.destination === 'AK') {
    cost += 15
  }

  return cost
}
```

**Questions to answer:**

- What inputs does it accept?
- What outputs does it produce?
- What are the edge cases?
- Are there side effects?
- What happens with invalid input?

### Step 2: Write Tests for Current Behavior

**Document what EXISTS, not what should exist:**

```typescript
describe('calculateShipping (characterization)', () => {
  // Capture base case
  test('standard shipping', () => {
    const order = { total: 50, weight: 10, destination: 'CA' }
    expect(calculateShipping(order)).toBe(9.99)
  })

  // Capture free shipping threshold
  test('free shipping over $100', () => {
    const order = { total: 150, weight: 10, destination: 'CA' }
    expect(calculateShipping(order)).toBe(0)
  })

  // Capture weight surcharge
  test('heavy package surcharge', () => {
    const order = { total: 50, weight: 75, destination: 'CA' }
    expect(calculateShipping(order)).toBe(34.99) // 9.99 + 25
  })

  // Capture regional surcharge
  test('Hawaii surcharge', () => {
    const order = { total: 50, weight: 10, destination: 'HI' }
    expect(calculateShipping(order)).toBe(24.99) // 9.99 + 15
  })

  // Discovered edge case: free shipping + weight
  test('free shipping with heavy package', () => {
    const order = { total: 150, weight: 75, destination: 'CA' }
    expect(calculateShipping(order)).toBe(25) // 0 + 25 (is this right?)
  })

  // Discovered edge case: all conditions
  test('all surcharges with free shipping threshold', () => {
    const order = { total: 150, weight: 75, destination: 'HI' }
    expect(calculateShipping(order)).toBe(40) // 0 + 25 + 15
  })
})
```

### Step 3: Verify Tests Pass

**These tests MUST pass immediately:**

```bash
npm test
```

If they don't pass:

- You misunderstood the current behavior
- Fix the test, not the code
- The test should document reality

### Step 4: NOW You Can Refactor

**With tests in place, you have a safety net:**

```typescript
// NOW we can safely refactor
function calculateShipping(order: Order): number {
  const BASE_COST = 9.99
  const FREE_SHIPPING_THRESHOLD = 100
  const HEAVY_PACKAGE_WEIGHT = 50
  const HEAVY_PACKAGE_SURCHARGE = 25
  const REMOTE_DESTINATIONS = ['HI', 'AK']
  const REMOTE_SURCHARGE = 15

  let cost = order.total > FREE_SHIPPING_THRESHOLD ? 0 : BASE_COST

  if (order.weight > HEAVY_PACKAGE_WEIGHT) {
    cost += HEAVY_PACKAGE_SURCHARGE
  }

  if (REMOTE_DESTINATIONS.includes(order.destination)) {
    cost += REMOTE_SURCHARGE
  }

  return cost
}
```

**Run tests again:**

```bash
npm test  # All tests still pass
```

You've improved the code WITHOUT breaking existing behavior.

## When This Skill Applies

Use this skill when:

- [ ] Deleting code that has no tests
- [ ] Modifying functions without test coverage
- [ ] Refactoring untested modules
- [ ] Changing behavior in legacy systems
- [ ] Test coverage is below 80% for code you're touching
- [ ] You inherit code from previous developers
- [ ] You're working in a codebase with poor test discipline

**Even if the change seems "simple", if there are no tests, use this workflow.**

## Language-Specific Risk Levels

### HIGH RISK: Dynamic/Weakly-Typed Languages

**Python, JavaScript, Ruby, PHP:**

- NO compile-time type checking
- NO static analysis of type correctness
- Tests are your ONLY safety net
- Runtime is when errors appear
- User discovers bugs, not compiler

**Extra caution required:**

- Write MORE tests
- Test edge cases thoroughly
- Check for None/null/undefined
- Verify types explicitly
- Add type hints/annotations when possible

### MEDIUM RISK: Static/Strongly-Typed Languages

**Go, Rust, TypeScript (strict mode), Java:**

- Type system catches some errors
- Compiler validates correctness
- Still need tests for BEHAVIOR
- Types don't test business logic

**Type safety helps, but:**

- Types verify structure, not logic
- Tests still required for correctness
- Don't rely solely on type system

## The Strangler Fig Pattern

For large legacy systems, don't try to test everything at once:

### What Is It?

Named after the strangler fig vine that grows around a tree:

1. Vine grows alongside tree
2. Eventually replaces tree entirely
3. Original tree can be removed

Applied to code:

1. Wrap legacy code with new interface
2. Add tests to the wrapper
3. Gradually move logic to new code
4. Eventually remove legacy code

### Implementation

**Phase 1: Wrap**

```typescript
// Legacy code (untested, scary)
function legacyCalculatePrice(item) {
  // ... 500 lines of spaghetti ...
}

// New wrapper (tested)
function calculatePrice(item: Item): number {
  // Add validation and tests here
  validateItem(item)

  // Call legacy for now
  return legacyCalculatePrice(item)
}
```

**Phase 2: Test the Boundary**

```typescript
describe('calculatePrice wrapper', () => {
  test('validates input', () => {
    expect(() => calculatePrice(null)).toThrow()
  })

  test('handles normal items', () => {
    const item = { id: 1, price: 10 }
    expect(calculatePrice(item)).toBe(10)
  })

  // Characterization tests for legacy behavior
  test('matches legacy for discounts', () => {
    const item = { id: 1, price: 10, discount: 0.2 }
    expect(calculatePrice(item)).toBe(8)
  })
})
```

**Phase 3: Replace Internals**

```typescript
function calculatePrice(item: Item): number {
  validateItem(item)

  // New implementation (tested, clean)
  const basePrice = item.price
  const discount = item.discount ?? 0
  return basePrice * (1 - discount)

  // Legacy code removed!
}
```

**Phase 4: Expand**

Now that one function is safe, repeat for next function.

### Benefits

- Incremental progress
- Always have working system
- Tests added gradually
- Risk minimized
- Can stop anytime

### When to Use

- Large untested codebase
- Can't stop for months to add tests
- Need to deliver features while improving
- High-risk production system

## Pre-Modification Checklist

Before touching ANY untested code:

- [ ] Identified all entry points (who calls this code?)
- [ ] Identified all exit points (what does this code call?)
- [ ] Understood current behavior (what does it do now?)
- [ ] Written characterization tests for current behavior
- [ ] All characterization tests pass
- [ ] Documented any surprising behaviors found
- [ ] Considered using Strangler Fig for large changes

**ONLY AFTER ALL CHECKS PASS:**

- [ ] Make your intended changes
- [ ] Run tests (verify behavior preserved or intentionally changed)
- [ ] Add tests for NEW behavior if applicable
- [ ] Document breaking changes if any

## Red Flags

Stop immediately if:

- "I'll just make a quick change, no need for tests"
- "It's a small change, what could go wrong?"
- "I don't have time to write tests"
- "The code is too complex to test"
- "I'll add tests later" (you won't)

**These are rationalizations for dangerous behavior.**

## Common Objections (and Responses)

### "But tests will take longer than the change!"

**Response:** The initial change is fast. Finding and fixing the bugs you introduced takes much longer. Tests save time.

### "The code is too complex to test!"

**Response:** If it's too complex to test, it's too complex to change safely. Simplify first, or use Strangler Fig pattern.

### "We need to ship this feature NOW!"

**Response:** Shipping a broken feature is slower than shipping a tested feature late. Technical debt compounds.

### "This code has worked fine without tests for years!"

**Response:** Until now. The moment you change it, that safety is gone. Past stability doesn't predict future stability.

### "I understand the code, I won't break it!"

**Response:** Confidence is not a substitute for evidence. Even experts make mistakes.

## Success Metrics

How to know you've done this right:

**Green flags:**

- All characterization tests pass before changes
- All tests still pass after changes
- New behavior has new tests
- Test coverage increased
- Code is more readable
- Confidence is high

**Red flags:**

- Tests fail after changes (and you don't know why)
- Skipped edge cases "to save time"
- Changed behavior without updating tests
- Test coverage decreased or stagnant
- More confused than before

## Integration with Other Skills

Use these related skills:

- **test-driven-development**: For new behavior, use TDD after characterization tests exist
- **refactoring**: With tests in place, refactor safely
- **boy-scout-rule**: While adding tests, improve code quality
- **proof-of-work**: Document test results as evidence
- **debugging**: When characterization tests reveal unexpected behavior

## Tools and Techniques

### Code Coverage Tools

Use coverage tools to identify untested code:

```bash
# JavaScript/TypeScript
npm test -- --coverage

# Python
pytest --cov=mypackage

# Ruby
bundle exec rspec --require spec_helper

# Go
go test -cover ./...
```

**Target coverage:**

- Critical paths: 100%
- Business logic: 90%+
- Overall codebase: 80%+

### Mutation Testing

Verify tests actually catch bugs:

```bash
# JavaScript
npm install -D stryker

# Python
pip install mutmut
```

Mutation testing changes your code slightly and verifies tests fail. If tests still pass after mutation, they're not effective.

### Approval Testing

For complex outputs (HTML, JSON, reports):

```typescript
test('generates invoice', () => {
  const invoice = generateInvoice(order)
  expect(invoice).toMatchSnapshot()
})
```

First run captures current output. Future runs verify it hasn't changed.

## Real-World Example

### Scenario

Legacy payment processor, 2000 lines, no tests, needs to add new payment method.

### Step 1: Identify Scope

Don't test everything - test what you'll touch:

```typescript
// Will modify this
function processPayment(payment, method) {
  // ... many lines ...
}

// Will NOT modify these
function logTransaction(transaction) { }
function sendEmail(user, content) { }
```

### Step 2: Add Characterization Tests

```typescript
describe('processPayment (characterization)', () => {
  test('processes credit card payment', () => {
    const payment = { amount: 100, currency: 'USD' }
    const method = { type: 'credit_card', number: '4111...' }

    const result = processPayment(payment, method)

    expect(result.success).toBe(true)
    expect(result.transactionId).toBeDefined()
    expect(result.amount).toBe(100)
  })

  // More tests for existing payment methods...
})
```

### Step 3: Tests Pass

```bash
npm test
# ✅ All tests passing
```

### Step 4: Add New Feature

```typescript
function processPayment(payment, method) {
  // Existing code preserved
  if (method.type === 'credit_card') {
    return processCreditCard(payment, method)
  }

  // NEW: Add cryptocurrency support
  if (method.type === 'crypto') {
    return processCrypto(payment, method)
  }

  throw new Error(`Unknown payment method: ${method.type}`)
}
```

### Step 5: Add Tests for New Behavior

```typescript
test('processes cryptocurrency payment', () => {
  const payment = { amount: 100, currency: 'USD' }
  const method = { type: 'crypto', address: '0x123...' }

  const result = processPayment(payment, method)

  expect(result.success).toBe(true)
  expect(result.transactionId).toBeDefined()
})
```

### Step 6: All Tests Pass

```bash
npm test
# ✅ All tests passing (old + new)
```

### Result

- New feature added safely
- Existing behavior preserved
- Tests ensure no regression
- Future changes safer

## Remember

1. **Untested code is dangerous code** - especially in dynamic languages
2. **Characterize before changing** - Capture current behavior first
3. **RGR workflow for legacy** - Different from TDD, specific to untested code
4. **Strangler Fig for large systems** - Gradual improvement beats big bang
5. **Tests are proof** - Confidence without evidence is just hope
6. **No shortcuts** - "Quick changes" create slow bugs

**Legacy code is not an excuse for reckless changes. It's a reason for EXTRA caution.**

More from TheBushidoCollective/han

Skill	Description
absinthe-resolvers	Use when implementing GraphQL resolvers with Absinthe. Covers resolver patterns, dataloader integration, batching, and error handling.
absinthe-schema	Use when designing GraphQL schemas with Absinthe. Covers type definitions, interfaces, unions, enums, and schema organization patterns.
absinthe-subscriptions	Use when implementing real-time GraphQL subscriptions with Absinthe. Covers Phoenix channels, PubSub, and subscription patterns.
act-docker-setup	Use when configuring Docker environments for act, selecting runner images, managing container resources, or troubleshooting Docker-related issues with local GitHub Actions testing.
act-local-testing	Use when testing GitHub Actions workflows locally with act. Covers act CLI usage, Docker configuration, debugging workflows, and troubleshooting common issues when running workflows on your local machine.
act-workflow-syntax	Use when creating or modifying GitHub Actions workflow files. Provides guidance on workflow syntax, triggers, jobs, steps, and expressions for creating valid GitHub Actions workflows that can be tested locally with act.
ameba-configuration	Use when configuring Ameba rules and settings for Crystal projects including .ameba.yml setup, rule management, severity levels, and code quality enforcement.
ameba-custom-rules	Use when creating custom Ameba rules for Crystal code analysis including rule development, AST traversal, issue reporting, and rule testing.
ameba-integration	Use when integrating Ameba into development workflows including CI/CD pipelines, pre-commit hooks, GitHub Actions, and automated code review processes.
analyze-performance	Analyze performance metrics and identify slow transactions in Sentry