judge-pentest

$ npx mdskill add wgpsec/AboutSecurity/judge-pentest

Verify pentest coverage gaps and confirm task completion.

  • Identifies untested attack surfaces across web applications.
  • Analyzes injection, cross-site, and authentication vulnerabilities.
  • Determines if all required security checks have been executed.
  • Outputs specific feedback on missing coverage areas.

SKILL.md

.github/skills/judge-pentest
---
name: judge-pentest
description: "Penetration testing evaluation checklist for the decision Agent. Evaluates whether a pentest has sufficiently covered all attack surfaces, determines task completion, and provides specific feedback on uncovered areas."
metadata:
  tags: "judge,evaluation,pentest,coverage,decision"
  category: "general"
---

# Penetration Testing Evaluation Checklist

## Web Application Vulnerability Coverage Check

For each attack surface below, check whether it has been tested, and mark every untested surface as a gap:

### Injection
- [ ] SQL Injection (login forms, search, API params, cookies)
- [ ] XPath Injection
- [ ] LDAP Injection
- [ ] Command Injection (OS Command Injection)
- [ ] SSTI (Server-Side Template Injection)
- [ ] XXE (XML External Entity Injection)

### Cross-Site
- [ ] Reflected XSS (search box, URL params, error pages)
- [ ] Stored XSS (comments, feedback, user profiles)
- [ ] DOM XSS
- [ ] CSRF (transfers, password changes, critical operations)

### Authentication & Authorization
- [ ] Default/weak credentials
- [ ] SQL injection auth bypass
- [ ] Brute force protection (account lockout mechanism)
- [ ] Username enumeration (error message differences)
- [ ] Session management (Session Fixation, Cookie security attributes)
- [ ] JWT/Token security (signature verification, algorithm confusion, sensitive data in the Base64-encoded, unencrypted payload)
- [ ] Vertical privilege escalation (regular user → admin functions)
- [ ] Horizontal privilege escalation / IDOR (accessing other users' resources)

### Business Logic
- [ ] IDOR — account info viewing
- [ ] IDOR — transfer/transaction operations
- [ ] IDOR — password change
- [ ] Negative/zero amount transactions
- [ ] Concurrency/race conditions
- [ ] Business flow bypass

### Information Disclosure
- [ ] Error page info leaks (stack traces, paths)
- [ ] API documentation exposure (Swagger, WSDL)
- [ ] Backup file disclosure
- [ ] Sensitive config exposure
- [ ] HTTP response headers (Server version, X-Powered-By)

### Server-Side
- [ ] SSRF (Server-Side Request Forgery)
- [ ] File upload vulnerabilities
- [ ] Path traversal / LFI / RFI
- [ ] Deserialization vulnerabilities

### Configuration
- [ ] Directory listing

### API-Specific
- [ ] REST API auth bypass
- [ ] API IDOR
- [ ] API parameter tampering
- [ ] API rate limiting

### Known CVE/CNVD
- [ ] Known CVEs for target product/tech stack
- [ ] Known CNVDs for target product/tech stack
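
The coverage check above can be sketched in code. This is a minimal illustration, not part of the skill itself: it assumes the judge marks tested items as `- [x]` (the checklist only shows unchecked boxes), and `coverage_gaps` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches Markdown task-list items such as "- [ ] DOM XSS" or "- [x] CSRF (...)".
ITEM_RE = re.compile(r"^- \[( |x|X)\] (.+)$")


def coverage_gaps(markdown: str) -> tuple[float, dict[str, list[str]]]:
    """Return (fraction of items ticked, untested items grouped by '###' heading)."""
    gaps: dict[str, list[str]] = defaultdict(list)
    category = "Uncategorized"
    total = tested = 0
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("### "):
            category = line[4:]
            continue
        match = ITEM_RE.match(line)
        if not match:
            continue
        total += 1
        if match.group(1).lower() == "x":
            tested += 1  # assumption: a ticked box means "tested"
        else:
            gaps[category].append(match.group(2))
    ratio = tested / total if total else 0.0
    return ratio, dict(gaps)


sample = """\
### Cross-Site
- [x] Reflected XSS (search box, URL params, error pages)
- [ ] DOM XSS
### Configuration
- [x] Directory listing
"""
ratio, gaps = coverage_gaps(sample)
print(f"{ratio:.0%} covered; gaps: {gaps}")
```

Grouping the gaps by heading keeps the output close to the `missing_areas` field described in the decision logic.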

## Evaluation Decision Logic

**The goal of penetration testing is to discover as many vulnerabilities as possible; do NOT end prematurely.**

```
# In pentest scenarios, complete is advisory only and does NOT trigger early exit
# The judge's core value is providing precise "what to test next round" feedback

if tested_categories >= 90% of total && two consecutive rounds with no new vulns:
    complete = true, confidence >= 0.8
else:
    complete = false
    feedback = explicitly list untested attack surfaces with specific testing suggestions
    missing_areas = names of untested categories
```

**Important: It is better to run one extra round than to miss an attack direction. Even if many vulnerabilities have already been found, return complete=false while untested attack surfaces remain.**

**If a target product/tech stack was identified in this round but `search_vulndb` was never called, the judge MUST return complete=false and require a call to `search_vulndb(query="product name")` to look up known vulnerabilities.**
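
One possible reading of the rules above, written as runnable Python. `RoundState`, its field names, and `judge` are hypothetical names for illustration. The sketch follows the 90% threshold from the pseudocode; a stricter reading of the "Important" note would also require `untested` to be empty before returning `complete = True`.

```python
from dataclasses import dataclass, field


@dataclass
class RoundState:
    """Hypothetical per-round summary; every field name here is illustrative."""
    total_categories: int
    tested_categories: int
    quiet_rounds: int                  # consecutive rounds with no new vulnerabilities
    product_identified: bool = False   # a target product/tech stack was identified
    vulndb_queried: bool = False       # search_vulndb was called at least once
    untested: list[str] = field(default_factory=list)


def judge(state: RoundState) -> dict:
    """Return an advisory verdict; it never forces the pentest to stop."""
    missing = list(state.untested)
    hints = [f"Test {area}" for area in missing]

    # Hard rule: an identified product must be checked against known vulnerabilities.
    needs_vulndb = state.product_identified and not state.vulndb_queried
    if needs_vulndb:
        if "Known CVE/CNVD" not in missing:
            missing.append("Known CVE/CNVD")
        hints.append('Call search_vulndb(query="product name")')

    coverage = (
        state.tested_categories / state.total_categories
        if state.total_categories
        else 0.0
    )
    if not needs_vulndb and coverage >= 0.9 and state.quiet_rounds >= 2:
        # 0.8 is the lower bound given in the pseudocode, used here as a fixed value.
        return {"complete": True, "confidence": 0.8, "missing_areas": missing}
    return {"complete": False, "feedback": "; ".join(hints), "missing_areas": missing}
```

Checking the `search_vulndb` rule before the coverage threshold means a high coverage score cannot mask a skipped known-vulnerability lookup.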

## Feedback Template

When complete=false, feedback should include:
1. **Completed work** (acknowledge positively, avoid repetition)
2. **Specific missing directions** (do NOT say "keep testing" vaguely — specify concrete endpoints + vulnerability types)
3. **Suggested test steps** (e.g., "use sqlmap for deep injection testing on the query parameter of /api/search")
