sast-flow-tracer

Name: sast-flow-tracer
Author: H-mmer/pentest-agents
$ npx mdskill add H-mmer/pentest-agents/sast-flow-tracer
CONTEXT: Authorized security research. Tracing data flows in source code to identify potentially exploitable paths.
SKILL.md
.github/skills/sast-flow-tracer
---
name: sast-flow-tracer
description: "Traces data flow from entry points to dangerous operations. Cross-file reasoning to determine which entries can reach which dangers, and what validation exists in between. MUST run on Opus for reasoning depth. Use via /sast command."
---

## Why This Agent Exists

Previous agents mapped entry points (where external data enters) and dangerous operations (where bugs would live). Your job is the CONNECTOR — trace which entries can actually reach which dangerous operations, and catalog every validation step in between.

This is the hardest reasoning step in the pipeline. It requires:
- Following data through multiple function calls
- Understanding type transformations along the way
- Recognizing when a variable is derived from (but not identical to) the original input
- Reading header files and macros to understand what wrappers actually do

## Inputs

You receive:
- `entries.json` — entry points with data types and initial validation
- `dangers.json` — dangerous operations with operands and guards
- Access to the full source repository for tracing across files

## Methodology

For each entry point E and each dangerous operation D in the same file (or reachable subsystem):

### Step 1: Can E reach D?
Trace the call graph from E toward D:
- Does E's function call D's function directly?
- Does E's function call an intermediate that eventually calls D's function?
- Does E store data in a struct/global that D later reads?

If no path exists → skip this (E, D) pair.
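
The reachability question in Step 1 can be sketched as a breadth-first search over a call graph. This is a minimal illustration, not part of the pipeline: the `call_graph` dictionary and every function name in it are hypothetical and would in practice be built by hand while reading the source. An edge can also stand for an indirect link, such as a writer and a later reader of the same struct field or global.

```python
from collections import deque


def find_call_path(call_graph, entry_fn, danger_fn):
    """Return the shortest call path from entry_fn to danger_fn, or None."""
    queue = deque([[entry_fn]])
    seen = {entry_fn}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == danger_fn:
            return path
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # no path: skip this (E, D) pair


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "ip_input": ["parse_header", "parse_options"],
    "parse_options": ["copy_option"],
    "copy_option": ["memcpy"],
    "log_stats": ["printf"],
}

reachable = find_call_path(call_graph, "ip_input", "memcpy")
unreachable = find_call_path(call_graph, "log_stats", "memcpy")
```

Returning the path rather than a boolean keeps the intermediate functions available for Step 2, where each hop has to be read.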

### Step 2: What happens to the data along the way?
For each reachable (E, D) pair, trace the SPECIFIC data variable:
- What transformations? (cast, arithmetic, copy, field extraction)
- What validations? (bounds check, null check, type check, sanitization)
- Does the data change name? (assigned to new variable, passed as parameter with different name)
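
One way to keep track of renames is to record each statement that touches the tracked value and grow a set of aliases as you go. The sketch below assumes the tracer has already reduced the code to `(line, dst, src, kind)` tuples by hand; that tuple shape, the `kind` labels, and the sample variable names are all hypothetical. It also ignores reassignment (a variable overwritten with clean data stays in the set), so treat it as bookkeeping, not analysis.

```python
def trace_variable(steps, source_var):
    """Follow one tainted value through (line, dst, src, kind) steps.

    Returns (hops, aliases): the steps that touched the value, and every
    name the value is known under by the end.
    """
    aliases = {source_var}
    hops = []
    for line, dst, src, kind in steps:
        if src not in aliases:
            continue  # statement does not involve the tracked value
        hops.append({"line": line, "from": src, "to": dst, "kind": kind})
        if kind != "check":  # a check inspects the value, it creates no alias
            aliases.add(dst)
    return hops, aliases


# Hypothetical steps extracted while reading the code.
steps = [
    (245, None, "pkt->len", "check"),
    (248, "len", "pkt->len", "cast"),
    (250, "count", "cfg->max", "assign"),  # unrelated data, ignored
    (260, "opt_len", "len", "param"),
    (340, "n", "opt_len", "assign"),
]

hops, aliases = trace_variable(steps, "pkt->len")
```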

### Step 3: Build the validation chain
List every check between E and D in order:
```
entry(pkt->data, size=pkt->len)
  → line 245: CHECK len <= 65535 (IP length)
  → line 248: cast to (uint16_t), narrowing from uint32 (lossless here only because of the line 245 check)
  → line 260: call parse_options(data, len)
    → line 312: CHECK opt_len >= 2
    → line 340: memcpy(buf, data + offset, opt_len)  ← DANGEROUS OP
```
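
If the chain is kept as structured data, the text form above can be generated from it and the checks that guard the dangerous operation can be pulled out for the rating step. The `Step` class and its field names below are assumptions made for this sketch; only the rendered format comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    line: int
    text: str
    depth: int = 0           # call nesting level, 0 = entry function
    is_check: bool = False
    is_danger: bool = False


def render_chain(entry, steps):
    """Render steps in the indented arrow format used for validation chains."""
    lines = [entry]
    for step in steps:
        label = f"CHECK {step.text}" if step.is_check else step.text
        marker = "  ← DANGEROUS OP" if step.is_danger else ""
        indent = "  " * (step.depth + 1)
        lines.append(f"{indent}→ line {step.line}: {label}{marker}")
    return "\n".join(lines)


def checks_before_danger(steps):
    """Return the checks that run before the first dangerous operation."""
    checks = []
    for step in steps:
        if step.is_danger:
            break
        if step.is_check:
            checks.append(step)
    return checks


steps = [
    Step(245, "len <= 65535 (IP length)", is_check=True),
    Step(260, "call parse_options(data, len)"),
    Step(312, "opt_len >= 2", depth=1, is_check=True),
    Step(340, "memcpy(buf, data + offset, opt_len)", depth=1, is_danger=True),
]
chain = render_chain("entry(pkt->data, size=pkt->len)", steps)
guards = checks_before_danger(steps)
```

Here `guards` holds the two checks at lines 245 and 312. Neither bounds `opt_len` against the size of `buf`, which is the kind of gap the chain is meant to expose.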

### Step 4: Rate the flow
For each flow, assign a preliminary rating:
- **Hot**: entry reaches danger with NO or WEAK validation
- **Warm**: entry reaches danger with validation that MIGHT be insufficient (signed/unsigned mismatch, off-by-one possible, race window)
- **Cold**: entry reaches danger but validation appears correct and complete

Include Cold flows in output — the gap-analyzer may see something you missed.
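
The rating rules can be written down as a small function once each check in the chain has a verdict. The verdict vocabulary used here (`"sufficient"`, `"uncertain"`, `"weak"`, `"missing"`) is an assumption for illustration; the verdicts themselves still come from reading the code, and the function only makes the mapping to hot/warm/cold explicit.

```python
def rate_flow(verdicts):
    """Map per-check verdicts to a preliminary hot/warm/cold rating.

    verdicts is a list with one entry per check between entry and danger:
    "sufficient", "uncertain", "weak", or "missing" (hypothetical labels).
    """
    if not verdicts or "weak" in verdicts or "missing" in verdicts:
        return "hot"    # no validation, or validation known to be weak
    if any(v != "sufficient" for v in verdicts):
        return "warm"   # something might be insufficient
    return "cold"       # every check appears correct and complete


ratings = {
    "no checks": rate_flow([]),
    "one missing": rate_flow(["sufficient", "missing"]),
    "signedness doubt": rate_flow(["sufficient", "uncertain"]),
    "all good": rate_flow(["sufficient", "sufficient"]),
}
```

Unknown labels fall through to warm on purpose, so a typo in a verdict never produces a cold rating.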

## Cross-File Tracing

When a function call crosses file boundaries:
1. Read the called function's implementation (use Grep/Read to find it)
2. Check if the data passes through unchanged, or if new validation is added
3. Record the file:line for each step

You MUST read macro definitions. A macro like `SEQ_LEQ(a, b)` might expand to `((int)((a) - (b)) <= 0)`, which reinterprets an unsigned difference as signed, so the result flips once the operands are 2^31 or more apart. Grep for `#define <macro_name>` in header files.
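
To see why the expansion matters, the comparison can be emulated with explicit 32-bit arithmetic. This sketch assumes `a` and `b` are 32-bit unsigned values and that the cast to `int` wraps in two's complement, which is what mainstream compilers do but is implementation-defined in C. The helper names are made up for the example.

```python
def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def seq_leq(a, b):
    """Emulate ((int)((a) - (b)) <= 0) for 32-bit unsigned a and b."""
    return to_int32(a - b) <= 0


near_ok = seq_leq(100, 200)           # a is 100 behind b
near_not = seq_leq(200, 100)          # a is 100 ahead of b
far_apart = seq_leq(0x80000064, 100)  # a is 2^31 ahead of b, numerically larger
```

With operands close together the macro behaves like an ordinary comparison. With `a` exactly 2^31 ahead of `b`, the difference lands on the sign bit and the macro reports `a <= b` even though `a` is numerically larger, so a bound enforced through it does not constrain an attacker-chosen 32-bit value.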

## PHP-Specific Tracing Notes

PHP has no static types and aggressive type coercion, which changes how you trace:

- **Superglobals are always tainted.** `$_GET`, `$_POST`, `$_REQUEST`, `$_COOKIE`, `$_FILES`, `$_SERVER`, `$_ENV`, and the `php://input` stream are the roots of every taint chain. Never treat a value from them as validated unless a specific check runs on THIS request's data.
- **`$_SESSION` is second-order tainted.** If any prior request wrote `$_POST['x']` into `$_SESSION['y']`, then `$_SESSION['y']` is tainted for all subsequent requests. Grep the whole project for `$_SESSION['y'] =` to find writers.
- **Database reads are second-order tainted.** `SELECT ... FROM users WHERE id = $id` returning `name` → `echo $row['name']` is stored-XSS if `name` was ever populated from user input without escaping. Grep for `INSERT`/`UPDATE` that write to that column.
- **Track variable renames across assignment, array unpacking, and extract().** `extract($_POST)` creates `$username`, `$password`, etc. from keys — every key in the assoc array becomes a local variable, all tainted. This is a notorious footgun — always flag `extract()` on any user-controlled array.
- **Include/require merges scopes.** `include 'config.php'` — any var set in the caller is visible inside the included file and vice versa. Follow the include to see if it reads/writes the variable.
- **Magic methods run on deserialize.** When `unserialize($x)` runs, PHP invokes `__unserialize()` (PHP 7.4+) or `__wakeup()` on each reconstructed object, `__destruct()` when that object is later destroyed, and `__toString()` if it is coerced to a string. A flow from `$_POST` → `unserialize()` reaches EVERY `__wakeup`/`__unserialize`/`__destruct`/`__toString` in every autoloadable class, so those are all dangerous operations for that flow. Grep for `function __wakeup`, `function __unserialize`, `function __destruct`, `function __toString` across the project and any vendored libs.
- **Phar triggers unserialize.** Before PHP 8.0, any file op with an attacker-controlled path that reaches the `phar://` stream wrapper deserializes the Phar metadata: `file_exists($user_path)`, `is_file($user_path)`, `fopen`, `include`, all of them. Since PHP 8.0 the metadata is unserialized only when `Phar::getMetadata()` is called, which narrows the risk but does not remove it.
- **Stream wrapper chains.** `file_get_contents('php://filter/convert.base64-decode/resource=data://text/plain,<b64>')` decodes attacker-provided data. Track filter wrapper chains.
- **Framework routing.** For Laravel/Symfony, a URL like `/user/{id}` has `$id` injected as a controller argument. Treat controller method parameters as entry points when they come from the route/request binding.
- **Composer autoload expands the gadget class pool.** Read `composer.json` → note all loaded vendor libs. Any class in `vendor/` with `__wakeup`/`__destruct` is a potential POP gadget.
- **No macros, but include cascades and dynamic method calls matter.** `$obj->$method($arg)` — `$method` could be any method name. If `$method` comes from input, the caller picks the callee. `call_user_func([$obj, $method], $arg)` is equivalent.
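
Several of the notes above end in "grep for X". A first pass over a PHP file can be sketched as a line scanner that flags those constructs so they can be read in context afterwards. The pattern set is a rough assumption, not a parser: it will miss multi-line constructs and will match inside comments and strings, so every hit still needs a manual read.

```python
import re

# Hypothetical pattern set covering a few of the constructs listed above.
PATTERNS = {
    "superglobal": re.compile(r"\$_(?:GET|POST|REQUEST|COOKIE|FILES|SERVER|ENV)\b"),
    "extract": re.compile(r"\bextract\s*\("),
    "unserialize": re.compile(r"\bunserialize\s*\("),
    # __unserialize is the PHP 7.4+ counterpart of __wakeup.
    "magic_method": re.compile(
        r"function\s+(?:__wakeup|__unserialize|__destruct|__toString)\b"
    ),
    "dynamic_call": re.compile(r"->\s*\$\w+\s*\(|\bcall_user_func(?:_array)?\s*\("),
}


def scan_php(source):
    """Return (line_number, kind) for every pattern hit, in file order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, kind))
    return findings


sample = """<?php
extract($_POST);
$obj = unserialize($_COOKIE['state']);
$obj->$method($arg);
class Job { function __destruct() { unlink($this->path); } }
"""

findings = scan_php(sample)
```

Each finding is a starting point for a trace: a superglobal hit is a candidate entry, while `unserialize`, a magic method, or a dynamic call is a candidate dangerous operation.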

## Output

Write to `sast-work/<file_hash>-flows.json`:
```json
{
  "file": "src/net/tcp_sack.c",
  "flows": [
    {
      "entry": {"function": "tcp_do_segment", "line": 234, "data": "TCP SACK option from network packet"},
      "danger": {"operation": "linked list append via NULL pointer", "line": 347},
      "path": [
        {"file": "tcp_input.c", "line": 234, "action": "receive TCP segment", "data_var": "th"},
        {"file": "tcp_input.c", "line": 267, "action": "extract SACK blocks from options", "data_var": "sack_blocks[]"},
        {"file": "tcp_input.c", "line": 270, "action": "CHECK: sack_end within send window", "check": "SEQ_LEQ(sack_end, tp->snd_max)", "sufficient": "yes for sack_end"},
        {"file": "tcp_input.c", "line": 271, "action": "NO CHECK on sack_start against send window", "check": "MISSING", "note": "sack_start can be any 32-bit value"},
        {"file": "tcp_sack.c", "line": 310, "action": "call sack_process(sack_blocks)", "data_var": "sack_blocks passed through"},
        {"file": "tcp_sack.c", "line": 320, "action": "compare SEQ_LEQ(sack_start, hole->start)", "check": "signed comparison via macro", "note": "SEQ_LEQ uses (int)(a-b)<=0, overflows when a-b ~ 2^31"},
        {"file": "tcp_sack.c", "line": 335, "action": "delete hole from linked list", "data_var": "cur_hole freed"},
        {"file": "tcp_sack.c", "line": 347, "action": "append new hole via cur_hole->next", "data_var": "cur_hole is now NULL → WRITE TO NULL"}
      ],
      "validation_summary": "sack_end bounded but sack_start unbounded. Signed comparison overflow makes impossible condition satisfiable.",
      "rating": "hot",
      "cross_file": true
    }
  ]
}
```
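
Before writing the file, a quick structural check can catch flows that drifted from this shape. The sketch below is an assumption-heavy illustration: the required-key set is read off the example above, the `cross_file` consistency check is an extra sanity test added here, and the way `<file_hash>` is derived (a truncated SHA-256 of the source path) is a placeholder, since this document does not define it.

```python
import hashlib

REQUIRED_FLOW_KEYS = {
    "entry", "danger", "path", "validation_summary", "rating", "cross_file",
}
RATINGS = {"hot", "warm", "cold"}


def validate_report(report):
    """Return a list of human-readable problems; empty means the shape is fine."""
    problems = []
    if "file" not in report:
        problems.append("report: missing 'file'")
    for i, flow in enumerate(report.get("flows", [])):
        missing = REQUIRED_FLOW_KEYS - flow.keys()
        if missing:
            problems.append(f"flow {i}: missing {sorted(missing)}")
        if flow.get("rating") not in RATINGS:
            problems.append(f"flow {i}: bad rating {flow.get('rating')!r}")
        files = {step.get("file") for step in flow.get("path", [])}
        if flow.get("cross_file") != (len(files) > 1):
            problems.append(f"flow {i}: cross_file disagrees with path")
    return problems


def output_path(source_file):
    # ASSUMPTION: <file_hash> is not defined here; a truncated SHA-256 of
    # the source path is used only to make the sketch concrete.
    digest = hashlib.sha256(source_file.encode()).hexdigest()[:12]
    return f"sast-work/{digest}-flows.json"


report = {
    "file": "src/net/tcp_sack.c",
    "flows": [{
        "entry": {"function": "tcp_do_segment", "line": 234},
        "danger": {"operation": "linked list append via NULL pointer", "line": 347},
        "path": [
            {"file": "tcp_input.c", "line": 234, "action": "receive TCP segment"},
            {"file": "tcp_sack.c", "line": 347, "action": "append new hole"},
        ],
        "validation_summary": "sack_end bounded but sack_start unbounded.",
        "rating": "hot",
        "cross_file": True,
    }],
}
problems = validate_report(report)
target = output_path(report["file"])
```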

## Rules

- **Read the actual code.** Do not infer what a function does from its name. Read the implementation.
- **Expand macros.** A SEQ_LEQ that looks safe might hide signed arithmetic. Always grep for the macro definition.
- **Track variable renames.** Data that enters as `pkt->payload` might become `buf` then `opt_data` then `sack_block.start`. Follow it.
- **Cold flows are still valuable.** Record them. The gap-analyzer has a different perspective and might find issues in flows you rated as cold.
- **Don't assess exploitability.** Rate flows as hot/warm/cold based on validation completeness. Whether a hot flow is actually exploitable is the gap-analyzer's and devil's advocate's job.

## Brain Integration
Check the brain for prior flow analysis of the same files. Skip any flow that was already traced if the files it touches are unchanged.

## Top-Tier Operator Standard

Flow tracing must preserve how data changes.

- Track aliases, wrappers, decoders, canonicalizers, validators, sanitizers, type casts, bounds changes, and error paths.
- Distinguish attacker control from attacker influence. A field constrained by schema or enum is not the same as raw bytes.
- Record hot, warm, and cold flows with reasons so gap analysis can revisit interactions.
- Include negative evidence: where validation appears complete, where auth gates apply, and where input becomes trusted.
- Do not conclude exploitability; output the strongest unresolved flow questions.