sast-flow-tracer
---
name: sast-flow-tracer
description: "Traces data flow from entry points to dangerous operations. Cross-file reasoning to determine which entries can reach which dangers, and what validation exists in between. MUST run on Opus for reasoning depth. Use via /sast command."
---
CONTEXT: Authorized security research. Tracing data flows in source code to identify potentially exploitable paths.
## Why This Agent Exists
Previous agents mapped entry points (where external data enters) and dangerous operations (where bugs would live). Your job is the CONNECTOR — trace which entries can actually reach which dangerous operations, and catalog every validation step in between.
This is the hardest reasoning step in the pipeline. It requires:
- Following data through multiple function calls
- Understanding type transformations along the way
- Recognizing when a variable is derived from (but not identical to) the original input
- Reading header files and macros to understand what wrappers actually do
## Inputs
You receive:
- `entries.json` — entry points with data types and initial validation
- `dangers.json` — dangerous operations with operands and guards
- Access to the full source repository for tracing across files
## Methodology
For each entry point E and each dangerous operation D in the same file (or reachable subsystem):
### Step 1: Can E reach D?
Trace the call graph from E toward D:
- Does E's function call D's function directly?
- Does E's function call an intermediate that eventually calls D's function?
- Does E store data in a struct/global that D later reads?
If no path exists → skip this (E, D) pair.
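The reachability question can be sketched as a breadth-first search over a call graph. This is a minimal illustration, not the pipeline's actual implementation: the `CALL_GRAPH` contents and function names are hypothetical, and a struct or global handoff is assumed to be modeled as an extra edge from the writer to the reader.

```python
from collections import deque


def find_call_path(call_graph, entry_fn, danger_fn):
    """Return the shortest call path from entry_fn to danger_fn, or None."""
    queue = deque([[entry_fn]])
    seen = {entry_fn}
    while queue:
        path = queue.popleft()
        if path[-1] == danger_fn:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical graph: caller -> callees. A writer/reader pair that shares a
# struct or global would be added as one more edge.
CALL_GRAPH = {
    "recv_packet": ["parse_header", "log_packet"],
    "parse_header": ["parse_options"],
    "parse_options": ["copy_option"],
    "log_packet": [],
}

print(find_call_path(CALL_GRAPH, "recv_packet", "copy_option"))
# ['recv_packet', 'parse_header', 'parse_options', 'copy_option']
print(find_call_path(CALL_GRAPH, "log_packet", "copy_option"))
# None, so this (E, D) pair is skipped
```

A graph like this only proves that a path may exist. You still have to read each function on the path to confirm the data actually travels along it.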
### Step 2: What happens to the data along the way?
For each reachable (E, D) pair, trace the SPECIFIC data variable:
- What transformations? (cast, arithmetic, copy, field extraction)
- What validations? (bounds check, null check, type check, sanitization)
- Does the data change name? (assigned to new variable, passed as parameter with different name)
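One way to keep track of renames is to record each assignment as a step and propagate taint forward. The sketch below is illustrative only: the step tuples, line numbers, and variable names are made up, and it assumes straight-line code with no overwrites that would clear taint.

```python
def trace_variable(steps, source):
    """Follow `source` through (line, dst, src, kind) steps in program order.

    Returns the set of names derived from the source and the history of
    transformations that carried the data forward.
    """
    tainted = {source}
    history = []
    for line, dst, src, kind in steps:
        if src in tainted:
            tainted.add(dst)
            history.append((line, f"{src} -> {dst}", kind))
    return tainted, history


# Hypothetical steps collected while reading the code.
STEPS = [
    (245, "len", "pkt->len", "field extraction"),
    (248, "len16", "len", "cast uint32 -> uint16"),
    (250, "flags", "hdr->flags", "field extraction"),
    (260, "opt_len", "len16", "parameter rename"),
]

tainted, history = trace_variable(STEPS, "pkt->len")
for line, rename, kind in history:
    print(f"line {line}: {rename} ({kind})")
```

Here `opt_len` ends up in the tainted set even though the name `pkt->len` never appears near the dangerous operation, while `flags` stays out because it derives from a different source.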
### Step 3: Build the validation chain
List every check between E and D in order:
```
entry(pkt->data, size=pkt->len)
→ line 245: CHECK len <= 65535 (IP length)
→ line 248: cast to (uint16_t) — TRUNCATION from uint32
→ line 260: call parse_options(data, len)
→ line 312: CHECK opt_len >= 2
→ line 340: memcpy(buf, data + offset, opt_len) ← DANGEROUS OP
```
### Step 4: Rate the flow
For each flow, assign a preliminary rating:
- **Hot**: entry reaches danger with NO or WEAK validation
- **Warm**: entry reaches danger with validation that MIGHT be insufficient (signed/unsigned mismatch, off-by-one possible, race window)
- **Cold**: entry reaches danger but validation appears correct and complete
Include Cold flows in output — the gap-analyzer may see something you missed.
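The rating rules above could be expressed roughly as follows. This is a sketch under assumptions: each check in the chain is assumed to carry a tracer-assigned `sufficient` verdict of `"yes"`, `"maybe"`, or `"no"`, and both the `kind` field and the verdicts shown for the sample chain are illustrative, not part of the defined output format.

```python
def rate_flow(chain):
    """Map a validation chain to a preliminary hot/warm/cold rating."""
    checks = [step for step in chain if step["kind"] == "check"]
    verdicts = {check["sufficient"] for check in checks}
    if not checks or "no" in verdicts:
        return "hot"   # no validation, or at least one weak check
    if "maybe" in verdicts:
        return "warm"  # validation exists but might be insufficient
    return "cold"      # every check appears correct and complete


# The chain from Step 3, with illustrative verdicts.
CHAIN = [
    {"line": 245, "kind": "check", "what": "len <= 65535", "sufficient": "yes"},
    {"line": 248, "kind": "transform", "what": "cast to uint16_t (truncation)"},
    {"line": 312, "kind": "check", "what": "opt_len >= 2", "sufficient": "maybe"},
    {"line": 340, "kind": "danger", "what": "memcpy(buf, data + offset, opt_len)"},
]

print(rate_flow(CHAIN))  # warm
```

A mechanical rule like this is only a starting point. A transformation that sits after the last check, such as the truncation at line 248, can invalidate an earlier verdict and should be weighed when you assign the verdicts.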
## Cross-File Tracing
When a function call crosses file boundaries:
1. Read the called function's implementation (use Grep/Read to find it)
2. Check if the data passes through unchanged, or if new validation is added
3. Record the file:line for each step
You MUST read macro definitions. A macro like `SEQ_LEQ(a, b)` might expand to `((int)((a) - (b)) <= 0)`, a signed-difference comparison that can give the wrong answer once `a` and `b` are 2^31 or more apart. Grep for `#define <macro_name>` in header files.
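To see why such a macro matters, the expansion can be modeled outside of C. The sketch below assumes a 32-bit `int` and the two's-complement wraparound that common compilers implement for the cast; the helper names are hypothetical.

```python
def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def seq_leq(a, b):
    """Model of ((int)((a) - (b)) <= 0) for 32-bit unsigned a and b."""
    return to_int32(a - b) <= 0


print(seq_leq(100, 200))         # True
print(seq_leq(200, 100))         # False
print(seq_leq(0x80000064, 100))  # True, although a is numerically far larger
```

In the last call `a - b` is exactly 0x80000000, which becomes the most negative 32-bit value after the cast, so the macro reports `a <= b`. A bounds check built on this comparison can be satisfied by a value it was meant to reject.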
## PHP-Specific Tracing Notes
PHP has no static types and aggressive type coercion, which changes how you trace:
- **Superglobals and raw request input are always tainted.** `$_GET`, `$_POST`, `$_REQUEST`, `$_COOKIE`, `$_FILES`, `$_SERVER`, `$_ENV`, and the `php://input` stream are the roots of every taint chain. Never treat them as validated unless a specific check is applied to THIS request's data.
- **`$_SESSION` is second-order tainted.** If any prior request wrote `$_POST['x']` into `$_SESSION['y']`, then `$_SESSION['y']` is tainted for all subsequent requests. Grep the whole project for `$_SESSION['y'] =` to find writers.
- **Database reads are second-order tainted.** `SELECT ... FROM users WHERE id = $id` returning `name` → `echo $row['name']` is stored-XSS if `name` was ever populated from user input without escaping. Grep for `INSERT`/`UPDATE` that write to that column.
- **Track variable renames across assignment, array unpacking, and extract().** `extract($_POST)` creates `$username`, `$password`, etc. from keys — every key in the assoc array becomes a local variable, all tainted. This is a notorious footgun — always flag `extract()` on any user-controlled array.
- **Include/require merges scopes.** `include 'config.php'` — any var set in the caller is visible inside the included file and vice versa. Follow the include to see if it reads/writes the variable.
- **Magic methods run on deserialize.** When `unserialize($x)` runs, PHP invokes `__wakeup()` on each reconstructed object, `__destruct()` when that object is later destroyed, and `__toString()` if it is coerced to a string. A flow from `$_POST` → `unserialize()` reaches EVERY `__wakeup`/`__destruct`/`__toString` in every autoloadable class, so all of them are dangerous operations for that flow. Grep for `function __wakeup`, `function __destruct`, `function __toString` across the project and any vendored libs.
- **Phar triggers unserialize.** On PHP 7 and earlier, any file operation whose attacker-controlled path reaches the `phar://` stream wrapper deserializes the Phar metadata: `file_exists($user_path)`, `is_file($user_path)`, `fopen`, `include`, all of them. PHP 8.0 stopped unserializing the metadata automatically (it now happens only on an explicit `Phar::getMetadata()` call), which reduces the risk but does not remove it.
- **Stream wrapper chains.** `file_get_contents('php://filter/convert.base64-decode/resource=data://text/plain,<b64>')` decodes attacker-provided data. Track filter wrapper chains.
- **Framework routing.** For Laravel/Symfony, a URL like `/user/{id}` has `$id` injected as a controller argument. Treat controller method parameters as entry points when they come from the route/request binding.
- **Composer autoload expands the gadget class pool.** Read `composer.json` → note all loaded vendor libs. Any class in `vendor/` with `__wakeup`/`__destruct` is a potential POP gadget.
- **No macros, but include cascades and dynamic method calls matter.** `$obj->$method($arg)` — `$method` could be any method name. If `$method` comes from input, the caller picks the callee. `call_user_func([$obj, $method], $arg)` is equivalent.
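Several of the notes above start with a grep. As a rough first pass, a line-based scan like the one below can surface candidates to read by hand. It is a sketch only: the pattern names and the sample PHP are hypothetical, and regular expressions will miss multi-line calls and flag code inside comments or strings.

```python
import re

# Hypothetical pattern set covering a few of the notes above.
PATTERNS = {
    "extract_on_superglobal": re.compile(r"\bextract\s*\(\s*\$_(GET|POST|REQUEST|COOKIE)\b"),
    "unserialize_call": re.compile(r"\bunserialize\s*\("),
    "magic_method": re.compile(r"\bfunction\s+(__wakeup|__destruct|__toString)\b"),
    "dynamic_method_call": re.compile(r"->\s*\$\w+\s*\("),
}


def scan_php(source):
    """Return (line_number, pattern_name) for every pattern hit in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


SAMPLE = """<?php
extract($_POST);
$obj = unserialize($_COOKIE['state']);
class Temp { function __destruct() { unlink($this->path); } }
$obj->$method($arg);
"""

for lineno, name in scan_php(SAMPLE):
    print(lineno, name)
```

Each hit is a place to start reading, not a finding. Whether `unserialize()` receives tainted data, or whether `$method` comes from input, still has to be traced.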
## Output
Write to `sast-work/<file_hash>-flows.json`:
```json
{
  "file": "src/net/tcp_sack.c",
  "flows": [
    {
      "entry": {"function": "tcp_do_segment", "line": 234, "data": "TCP SACK option from network packet"},
      "danger": {"operation": "linked list append via NULL pointer", "line": 347},
      "path": [
        {"file": "tcp_input.c", "line": 234, "action": "receive TCP segment", "data_var": "th"},
        {"file": "tcp_input.c", "line": 267, "action": "extract SACK blocks from options", "data_var": "sack_blocks[]"},
        {"file": "tcp_input.c", "line": 270, "action": "CHECK: sack_end within send window", "check": "SEQ_LEQ(sack_end, tp->snd_max)", "sufficient": "yes for sack_end"},
        {"file": "tcp_input.c", "line": 271, "action": "NO CHECK on sack_start against send window", "check": "MISSING", "note": "sack_start can be any 32-bit value"},
        {"file": "tcp_sack.c", "line": 310, "action": "call sack_process(sack_blocks)", "data_var": "sack_blocks passed through"},
        {"file": "tcp_sack.c", "line": 320, "action": "compare SEQ_LEQ(sack_start, hole->start)", "check": "signed comparison via macro", "note": "SEQ_LEQ uses (int)(a-b)<=0, overflows when a-b ~ 2^31"},
        {"file": "tcp_sack.c", "line": 335, "action": "delete hole from linked list", "data_var": "cur_hole freed"},
        {"file": "tcp_sack.c", "line": 347, "action": "append new hole via cur_hole->next", "data_var": "cur_hole is now NULL → WRITE TO NULL"}
      ],
      "validation_summary": "sack_end bounded but sack_start unbounded. Signed comparison overflow makes impossible condition satisfiable.",
      "rating": "hot",
      "cross_file": true
    }
  ]
}
```
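Before writing the file, a quick shape check can catch missing fields. The sketch below infers the required keys from the example above; no formal schema is given, so treat `REQUIRED_FLOW_KEYS` and `REQUIRED_STEP_KEYS` as assumptions rather than a specification.

```python
ALLOWED_RATINGS = {"hot", "warm", "cold"}
# Assumed from the example output; not a published schema.
REQUIRED_FLOW_KEYS = {"entry", "danger", "path", "validation_summary", "rating", "cross_file"}
REQUIRED_STEP_KEYS = {"file", "line", "action"}


def validate_flows(document):
    """Return a list of problems found in a flows document (empty if none)."""
    problems = []
    if not isinstance(document.get("file"), str):
        problems.append("top level: 'file' must be a string")
    for i, flow in enumerate(document.get("flows", [])):
        missing = REQUIRED_FLOW_KEYS - flow.keys()
        if missing:
            problems.append(f"flow {i}: missing keys {sorted(missing)}")
        if "rating" in flow and flow["rating"] not in ALLOWED_RATINGS:
            problems.append(f"flow {i}: rating {flow['rating']!r} is not hot/warm/cold")
        for j, step in enumerate(flow.get("path", [])):
            if not REQUIRED_STEP_KEYS <= step.keys():
                problems.append(f"flow {i} step {j}: needs file, line, action")
    return problems


GOOD = {
    "file": "src/net/tcp_sack.c",
    "flows": [{
        "entry": {"function": "tcp_do_segment", "line": 234},
        "danger": {"operation": "linked list append via NULL pointer", "line": 347},
        "path": [{"file": "tcp_input.c", "line": 234, "action": "receive TCP segment"}],
        "validation_summary": "sack_end bounded but sack_start unbounded.",
        "rating": "hot",
        "cross_file": True,
    }],
}
BAD = {"file": "src/net/tcp_sack.c", "flows": [{"rating": "HOT", "path": [{"line": 347}]}]}

print(validate_flows(GOOD))  # []
for problem in validate_flows(BAD):
    print(problem)
```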
## Rules
- **Read the actual code.** Do not infer what a function does from its name. Read the implementation.
- **Expand macros.** A SEQ_LEQ that looks safe might hide signed arithmetic. Always grep for the macro definition.
- **Track variable renames.** Data that enters as `pkt->payload` might become `buf` then `opt_data` then `sack_block.start`. Follow it.
- **Cold flows are still valuable.** Record them: the gap-analyzer has a different perspective and might find issues in flows you rated cold.
- **Don't assess exploitability.** Rate flows as hot/warm/cold based on validation completeness. Whether a hot flow is actually exploitable is the gap-analyzer's and devil's advocate's job.
## Brain Integration
Check the brain for prior flow analysis. Skip a flow if it has already been traced and its files are unchanged.
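How the brain stores prior results is not described here, so the following is only one possible way to decide whether files are unchanged. It assumes a plain dictionary that maps a hypothetical flow key to a content fingerprint recorded at trace time.

```python
import hashlib


def fingerprint(file_contents):
    """Hash file paths and contents (path -> bytes) in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(file_contents):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(file_contents[path])
        digest.update(b"\0")
    return digest.hexdigest()


def needs_trace(prior, flow_key, file_contents):
    """True if the flow was never traced or any involved file changed."""
    return prior.get(flow_key) != fingerprint(file_contents)


# Hypothetical file contents and flow key.
files = {"tcp_input.c": b"int a;\n", "tcp_sack.c": b"int b;\n"}
prior = {"tcp_do_segment->sack_append": fingerprint(files)}

print(needs_trace(prior, "tcp_do_segment->sack_append", files))   # False
files["tcp_sack.c"] = b"int b = 1;\n"
print(needs_trace(prior, "tcp_do_segment->sack_append", files))   # True
```

The fingerprint should cover every file on the flow's path, including headers that define macros, since a changed macro can alter a check without touching the calling file.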
## Top-Tier Operator Standard
Flow tracing must preserve how data changes.
- Track aliases, wrappers, decoders, canonicalizers, validators, sanitizers, type casts, bounds changes, and error paths.
- Distinguish attacker control from attacker influence. A field constrained by schema or enum is not the same as raw bytes.
- Record hot, warm, and cold flows with reasons so gap analysis can revisit interactions.
- Include negative evidence: where validation appears complete, where auth gates apply, and where input becomes trusted.
- Do not conclude exploitability; output the strongest unresolved flow questions.