judge-pentest
$ npx mdskill add wgpsec/AboutSecurity/judge-pentest
Verify pentest coverage gaps and confirm task completion.
- Identifies untested attack surfaces across web applications.
- Analyzes injection, cross-site, and authentication vulnerabilities.
- Determines if all required security checks have been executed.
- Outputs specific feedback on missing coverage areas.
SKILL.md
.github/skills/judge-pentest
---
name: judge-pentest
description: "Penetration testing evaluation checklist for the decision Agent. Evaluates whether a pentest has sufficiently covered all attack surfaces, determines task completion, and provides specific feedback on uncovered areas."
metadata:
  tags: "judge,evaluation,pentest,coverage,decision"
  category: "general"
---
# Penetration Testing Evaluation Checklist
## Web Application Vulnerability Coverage Check
For each attack surface below, check whether it has been tested and mark any untested ones as gaps:
### Injection
- [ ] SQL Injection (login forms, search, API params, cookies)
- [ ] XPath Injection
- [ ] LDAP Injection
- [ ] Command Injection (OS Command Injection)
- [ ] SSTI (Server-Side Template Injection)
- [ ] XXE (XML External Entity Injection)
### Cross-Site
- [ ] Reflected XSS (search box, URL params, error pages)
- [ ] Stored XSS (comments, feedback, user profiles)
- [ ] DOM XSS
- [ ] CSRF (transfers, password changes, critical operations)
### Authentication & Authorization
- [ ] Default/weak credentials
- [ ] SQL injection auth bypass
- [ ] Brute force protection (account lockout mechanism)
- [ ] Username enumeration (error message differences)
- [ ] Session management (Session Fixation, Cookie security attributes)
- [ ] JWT/Token security (signature verification, algorithm confusion, sensitive data in encoded but unencrypted payloads)
- [ ] Vertical privilege escalation (regular user → admin functions)
- [ ] Horizontal privilege escalation / IDOR (accessing other users' resources)
### Business Logic
- [ ] IDOR — account info viewing
- [ ] IDOR — transfer/transaction operations
- [ ] IDOR — password change
- [ ] Negative/zero amount transactions
- [ ] Concurrency/race conditions
- [ ] Business flow bypass
### Information Disclosure
- [ ] Error page info leaks (stack traces, paths)
- [ ] API documentation exposure (Swagger, WSDL)
- [ ] Backup file disclosure
- [ ] Sensitive config exposure
- [ ] HTTP response headers (Server version, X-Powered-By)
### Server-Side
- [ ] SSRF (Server-Side Request Forgery)
- [ ] File upload vulnerabilities
- [ ] Path traversal / LFI / RFI
- [ ] Deserialization vulnerabilities
### Configuration
- [ ] Directory listing
### API-Specific
- [ ] REST API auth bypass
- [ ] API IDOR
- [ ] API parameter tampering
- [ ] API rate limiting
### Known CVE/CNVD
- [ ] Known CVEs for target product/tech stack
- [ ] Known CNVDs for target product/tech stack
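One way to make the gap check above mechanical is to hold the checklist as data and compare it against the set of items already tested. The following is a minimal sketch, not part of the skill itself: the `CHECKLIST` dictionary, the `find_gaps` helper, and the shortened item names are all hypothetical, and only three of the categories are shown.

```python
# Hypothetical, abbreviated representation of the checklist:
# category name -> list of attack surfaces in that category.
CHECKLIST = {
    "Injection": ["SQL Injection", "XPath Injection", "LDAP Injection",
                  "Command Injection", "SSTI", "XXE"],
    "Cross-Site": ["Reflected XSS", "Stored XSS", "DOM XSS", "CSRF"],
    "Configuration": ["Directory listing"],
}


def find_gaps(checklist, tested):
    """Return {category: [untested items]} for every category that has gaps."""
    gaps = {}
    for category, items in checklist.items():
        missing = [item for item in items if item not in tested]
        if missing:
            gaps[category] = missing
    return gaps


# Example: all cross-site items and one injection item have been tested.
tested = {"SQL Injection", "Reflected XSS", "Stored XSS", "DOM XSS", "CSRF"}
gaps = find_gaps(CHECKLIST, tested)
```

Categories that are fully covered do not appear in the result, so the keys of `gaps` can serve directly as the list of categories that still need attention.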
## Evaluation Decision Logic
**The goal of penetration testing is to discover as many vulnerabilities as possible; do NOT end prematurely.**
```
# In pentest scenarios, complete is advisory only and does NOT trigger early exit
# The judge's core value is providing precise "what to test next round" feedback
if tested_categories >= 90% of total and two consecutive rounds with no new vulns:
    complete = true, confidence >= 0.8
else:
    complete = false
    feedback = explicitly list untested attack surfaces with specific testing suggestions
    missing_areas = names of untested categories
```
**Important: It is better to run one extra round than to miss an attack direction. Even if many vulnerabilities have already been found, return complete=false while untested attack surfaces remain.**
**If a target product/tech stack was identified in this round but `search_vulndb` was never called, the judge MUST return complete=false and require a call to `search_vulndb(query="product name")` to look up known vulnerabilities.**
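The decision rules above can be sketched as a small Python function. This is an illustration under stated assumptions rather than the skill's actual implementation: the `Verdict` type, the `judge` function, and its parameter names are hypothetical. The `search_vulndb` tool is not called here; the sketch only tracks whether it was called. The source gives no confidence value for the incomplete case, so the sketch leaves it as `None`.

```python
from dataclasses import dataclass, field
from typing import Optional

# Category names taken from the checklist headings.
CATEGORIES = [
    "Injection", "Cross-Site", "Authentication & Authorization",
    "Business Logic", "Information Disclosure", "Server-Side",
    "Configuration", "API-Specific", "Known CVE/CNVD",
]


@dataclass
class Verdict:
    complete: bool
    confidence: Optional[float]  # None when the source gives no value
    missing_areas: list = field(default_factory=list)
    feedback: str = ""


def judge(tested, all_categories, rounds_without_new_vulns,
          product_identified, vulndb_queried, threshold=0.9):
    """Apply the decision logic; assumes all_categories is non-empty."""
    missing = [c for c in all_categories if c not in tested]
    coverage = (len(all_categories) - len(missing)) / len(all_categories)

    # Rule: an identified product without a vulnerability DB lookup
    # always means the task is incomplete.
    if product_identified and not vulndb_queried:
        return Verdict(False, None, missing,
                       'Call search_vulndb(query="product name") first.')

    # Rule: enough coverage plus two quiet rounds means complete (advisory).
    if coverage >= threshold and rounds_without_new_vulns >= 2:
        return Verdict(True, 0.8, missing)

    return Verdict(False, None, missing,
                   "Untested categories: " + ", ".join(missing))


verdict = judge(
    tested=set(CATEGORIES) - {"Server-Side"},
    all_categories=CATEGORIES,
    rounds_without_new_vulns=2,
    product_identified=False,
    vulndb_queried=False,
)
```

With nine categories, a single untested category gives 8/9 coverage, which is below the 90% threshold, so `verdict.complete` is false in this example. Passing `threshold=1.0` would model the stricter reading in which any untested attack surface blocks completion.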
## Feedback Template
When complete=false, feedback should include:
1. **Completed work** (acknowledge positively, avoid repetition)
2. **Specific missing directions** (do NOT say "keep testing" vaguely — specify concrete endpoints + vulnerability types)
3. **Suggested test steps** (e.g., "use sqlmap for deep injection testing on the query parameter of /api/search")
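The three-part template above could be assembled as follows. This is a rough sketch only: the `build_feedback` helper, its section labels, and the tuple layout for gaps are hypothetical choices, and the example reuses the `/api/search` suggestion from the template.

```python
def build_feedback(completed, gaps):
    """Render feedback text.

    completed: list of short descriptions of work already done.
    gaps: list of (endpoint, vulnerability_type, suggested_step) tuples.
    """
    lines = ["Completed work:"]
    lines += [f"- {item}" for item in completed]
    lines.append("Missing directions:")
    lines += [f"- {endpoint}: {vuln}" for endpoint, vuln, _ in gaps]
    lines.append("Suggested test steps:")
    lines += [f"- {step}" for _, _, step in gaps]
    return "\n".join(lines)


feedback = build_feedback(
    completed=["Reflected XSS on the search box"],
    gaps=[(
        "/api/search",
        "SQL Injection",
        "use sqlmap for deep injection testing on the query parameter "
        "of /api/search",
    )],
)
print(feedback)
```

Requiring an endpoint and a vulnerability type for every gap makes it hard to emit vague "keep testing" feedback, since each missing direction has to name a concrete target.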