sast-entry-mapper

$ npx mdskill add H-mmer/pentest-agents/sast-entry-mapper

CONTEXT: Authorized security research. Analyzing source code for vulnerability entry points.

SKILL.md
.github/skills/sast-entry-mapper
---
name: sast-entry-mapper
description: "Maps entry points where untrusted data enters a source file. Lists every function that receives external input with data type, size constraints, and initial validation. Use via /sast command."
---

## Task

Read the assigned source file and identify every point where data from outside the process enters. This is a **reading comprehension task** — list what you see, do not speculate about vulnerabilities.

## What Counts as an Entry Point

### Direct external input
- Network: `recv()`, `read()` on sockets, HTTP handler params, RPC arguments
- File: `fread()`, `read()` on file descriptors, config parsing, media decoding
- IPC: shared memory reads, pipe reads, D-Bus message handlers, signal handlers
- User: `scanf()`, `getenv()`, command-line args (`argv`), GUI input callbacks
- Hardware: `ioctl()` handlers, device register reads, DMA buffer access

### Indirect external input
- Function parameters where callers pass external data (trace one level up)
- Struct fields populated by callers with external data
- Global/static buffers filled elsewhere from external sources
- Callback functions invoked with external data
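
To make the direct/indirect distinction concrete, here is a minimal Python sketch. The function names `read_request` and `parse_header` are hypothetical and not taken from any real codebase. `read_request` would be recorded as a direct entry point because it reads from a stream itself; `parse_header` would be recorded as indirect because its caller, one level up, passes it external data.

```python
import io


def parse_header(line: str) -> tuple[str, str]:
    # Indirect entry point: `line` is handed over by read_request,
    # which obtained it from outside the process.
    name, _, value = line.partition(":")
    return name.strip().lower(), value.strip()


def read_request(stream: io.TextIOBase) -> tuple[str, str]:
    # Direct entry point: external data enters the process here.
    line = stream.readline()
    return parse_header(line)
```

Calling `read_request(io.StringIO("Host: example.org\n"))` returns `("host", "example.org")`. In the mapping output, both functions would be listed, with `parse_header` carrying the note that `read_request` is its visible caller.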

### Language-specific
- **Rust**: `unsafe` blocks reading from raw pointers, FFI function args, `std::io::Read`
- **Java**: `HttpServletRequest` params, `ObjectInputStream.readObject()`, JNDI lookups
- **Python**: `request.args`, `sys.stdin`, `pickle.loads()` input, `eval()` input
- **Go**: `http.Request` fields, `io.Reader` implementations, CGo pointer args
- **PHP** — ALL of these are tainted by default:
  - **Superglobals**: `$_GET`, `$_POST`, `$_REQUEST`, `$_COOKIE`, `$_FILES`, `$_SERVER` (note: `$_SERVER['HTTP_*']`, `$_SERVER['PHP_SELF']`, `$_SERVER['QUERY_STRING']`, and `$_SERVER['REQUEST_URI']` are attacker-controlled; `$_SERVER['REMOTE_ADDR']` is semi-trusted because it is the address of the connecting peer, which may be a proxy rather than the client)
  - **Raw body**: `file_get_contents('php://input')`, `fopen('php://input')`, `$HTTP_RAW_POST_DATA` (legacy; removed in PHP 7.0)
  - **Session**: `$_SESSION[...]` — tainted if any prior request wrote user input into it (second-order)
  - **Env**: `$_ENV[...]`, `getenv(...)` — tainted in web contexts if set from request headers
  - **Framework request objects** (treat as direct input):
    - Laravel: `$request->input()`, `$request->get()`, `$request->all()`, `$request->only()`, `$request->query()`, `$request->json()`, `$request->header()`, `$request->cookie()`, `$request->file()`, `request()->...`, route parameters (`$id` in `Route::get('/u/{id}', fn($id)=>...)`)
    - Symfony: `$request->get()`, `$request->query->get()`, `$request->request->get()`, `$request->headers->get()`, `$request->files->get()`, `ParamConverter` bound entities
    - CodeIgniter: `$this->input->get()`, `$this->input->post()`, `$this->input->cookie()`, `$this->input->server()`
    - WordPress: `$_REQUEST` in `admin-ajax.php` handlers, `$_POST` in `wp_ajax_*` callbacks, shortcode/hook callback args
    - Drupal: `\Drupal::request()->query->get()`, `\Drupal::request()->request->get()`, form state values
    - Yii: `Yii::$app->request->get()`, `$this->request->post()`
  - **Database-sourced taint (second-order)**: Any `SELECT` that returns data originally written from a superglobal (stored XSS/SQLi). Mark as `indirect` and name the write-path function if visible.
  - **File upload metadata**: `$_FILES['x']['name']` (attacker-chosen filename; never trust the extension from this), `$_FILES['x']['type']` (client-provided MIME type; forgeable), `$_FILES['x']['tmp_name']` (server-generated, trusted), `$_FILES['x']['size']` (measured by PHP, trusted)
  - **Deserialization surface**: Any call to `unserialize(...)`, and any file operation whose path can carry an attacker-supplied `phar://` URL, is an entry point for POP-chain attacks. Track classes that define `__wakeup`, `__destruct`, `__toString`, or `__call`.

  PHP note on "trusted" values: **nothing from the client is trusted**. `$_SERVER['HTTP_HOST']` is attacker-controlled unless a reverse proxy strips it. `$_SERVER['SERVER_NAME']` depends on `UseCanonicalName` in Apache. When in doubt, mark as tainted.
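
As a first pass over a PHP file, a line-level scan for the superglobals listed above can help locate candidate entries. The following is a minimal sketch, not part of the skill itself: the regex and the helper name `find_superglobals` are assumptions made for illustration. It also matches names inside comments and string literals and does not see framework request objects, so it supplements reading the code rather than replacing it.

```python
import re

# Superglobals named in the list above; the pattern itself is an
# illustrative assumption, not a complete PHP tokenizer.
SUPERGLOBAL_RE = re.compile(
    r"\$_(GET|POST|REQUEST|COOKIE|FILES|SERVER|SESSION|ENV)\b"
)


def find_superglobals(php_source: str) -> list[tuple[int, str]]:
    """Return (line_number, superglobal) pairs, 1-indexed, in source order."""
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for match in SUPERGLOBAL_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = """<?php
$id = $_GET['id'];
$name = $_FILES['doc']['name'];
echo $_SERVER['HTTP_HOST'];
"""
```

For the sample above, `find_superglobals(sample)` returns `[(2, "$_GET"), (3, "$_FILES"), (4, "$_SERVER")]`. Each hit still needs to be read in context to fill in the function name, size, and validation details.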

## For Each Entry Point, Record

1. **Function name and line number**
2. **Data source**: network / file / IPC / user / hardware / indirect
3. **Data type**: raw bytes / string / struct / integer / mixed
4. **Size**: fixed / bounded (by what?) / unbounded / unknown
5. **Immediate validation**: what checks happen in the SAME function before the data is used further? Be specific: "length checked against MAX_BUF (128) at line 52" not just "length checked"
6. **Where it goes**: what functions/operations receive this data next?
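
One way to keep these six items together while reading a file is a small record type. The sketch below assumes Python and uses a dataclass whose field names mirror the keys of the JSON example in the Output section; the class name `EntryPoint` is hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class EntryPoint:
    function: str    # item 1: function name
    line: int        # item 1: line number
    source: str      # item 2: network / file / IPC / user / hardware / indirect
    data_type: str   # item 3: raw bytes / string / struct / integer / mixed
    size: str        # item 4: fixed / bounded (by what?) / unbounded / unknown
    validation: list[str] = field(default_factory=list)  # item 5, with line numbers
    flows_to: list[str] = field(default_factory=list)    # item 6


entry = EntryPoint(
    function="tcp_do_segment",
    line=234,
    source="network",
    data_type="raw bytes (TCP segment)",
    size="bounded by IP packet size (65535) but length field is attacker-controlled",
    validation=["line 240: th->th_off checked >= 5 (minimum TCP header)"],
    flows_to=["tcp_sack_option_process(line 312)"],
)
record = asdict(entry)  # plain dict, ready for json.dumps
```

Defaulting `validation` and `flows_to` to empty lists makes "no checks seen in this function" an explicit, recordable state rather than a missing key.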

## Output

Write to `sast-work/<file_hash>-entries.json`:
```json
{
  "file": "src/net/tcp_input.c",
  "entry_points": [
    {
      "function": "tcp_do_segment",
      "line": 234,
      "source": "network",
      "data_type": "raw bytes (TCP segment)",
      "size": "bounded by IP packet size (65535) but length field is attacker-controlled",
      "validation": [
        "line 240: th->th_off checked >= 5 (minimum TCP header)",
        "line 245: total length vs IP length consistency check"
      ],
      "flows_to": ["tcp_sack_option_process(line 312)", "tcp_reass(line 458)"]
    }
  ]
}
```
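
A minimal sketch of producing this file from Python follows. The skill does not say how `<file_hash>` is computed, so the SHA-256 digest of the file contents used here is an assumption, as is the helper name `write_entries`.

```python
import hashlib
import json
from pathlib import Path


def write_entries(source_path: str, entry_points: list[dict],
                  work_dir: str = "sast-work") -> Path:
    """Write the entry map for one source file and return the output path."""
    # Assumption: <file_hash> is the SHA-256 hex digest of the file contents.
    file_hash = hashlib.sha256(Path(source_path).read_bytes()).hexdigest()
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{file_hash}-entries.json"
    payload = {"file": source_path, "entry_points": entry_points}
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path
```

Hashing the contents rather than the path means an edited file gets a new output name, which keeps stale maps from being mistaken for current ones.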

## Rules

- **Only record what you can see in the code.** Do not speculate about what callers might pass.
- For indirect entry points (function params), note "indirect — caller passes external data" and name the caller if visible.
- If you're unsure whether something is an entry point, include it with a note. False positives are cheap; missed entries are expensive.
- Do NOT analyze whether anything is vulnerable. That's not your job. Just map the entries.

## Brain Integration
Check the brain for a prior analysis of this file. Skip the file if it is already mapped and unchanged.
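
The skip rule can be sketched as a content-hash comparison. How the brain stores prior analyses is not described here, so the sketch below models it as a plain dictionary from file path to the hash recorded at the last mapping; that storage shape and both helper names are assumptions.

```python
import hashlib


def needs_mapping(path: str, content: bytes, brain: dict[str, str]) -> bool:
    """Return True when the file has no recorded mapping or has changed."""
    digest = hashlib.sha256(content).hexdigest()
    return brain.get(path) != digest


def record_mapped(path: str, content: bytes, brain: dict[str, str]) -> None:
    """Remember the content hash of a file once its entries are mapped."""
    brain[path] = hashlib.sha256(content).hexdigest()
```

Recording the hash only after mapping completes, rather than inside `needs_mapping`, avoids marking a file as done when the mapping step is interrupted.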

## Top-Tier Operator Standard

Entry mapping should maximize recall without inventing attacker control.

- Classify entries by trust source: HTTP, CLI, file, archive, parser callback, network packet, env var, database row, queue message, webhook, plugin, template, test harness, or public API.
- Record the path from external input to function parameter when visible.
- Mark privilege and preconditions: auth required, admin only, local-only, feature flag, config option, or test-only.
- Include indirect entries with caller evidence; exclude pure internal helpers unless a caller imports external data.
- Do not assess vulnerability. Your output is the source inventory for flow tracing.
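
These annotations can be checked mechanically before the inventory is handed to flow tracing. The sketch below is illustrative only: the keys `trust_source`, `preconditions`, and `caller` are hypothetical additions to each entry and are not defined by the output schema above. The label sets are copied from the list.

```python
# Labels copied from the list above.
TRUST_SOURCES = {
    "HTTP", "CLI", "file", "archive", "parser callback", "network packet",
    "env var", "database row", "queue message", "webhook", "plugin",
    "template", "test harness", "public API",
}
PRECONDITIONS = {
    "auth required", "admin only", "local-only",
    "feature flag", "config option", "test-only",
}


def annotation_problems(entry: dict) -> list[str]:
    """Return human-readable problems with an entry's annotations."""
    problems = []
    # Hypothetical key: trust_source
    if entry.get("trust_source") not in TRUST_SOURCES:
        problems.append(f"unknown trust_source: {entry.get('trust_source')!r}")
    # Hypothetical key: preconditions
    for label in entry.get("preconditions", []):
        if label not in PRECONDITIONS:
            problems.append(f"unknown precondition: {label!r}")
    # Indirect entries need caller evidence (hypothetical key: caller).
    if entry.get("source") == "indirect" and not entry.get("caller"):
        problems.append("indirect entry without caller evidence")
    return problems
```

An entry such as `{"trust_source": "HTTP", "preconditions": ["auth required"], "source": "network"}` yields an empty list, while an indirect entry with no named caller is flagged.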