document-scanning
$
npx mdskill add Community-Access/accessibility-agents/document-scanningScan folders to extract document metadata for accessibility audits.
- Enables inventory building for Office and PDF files.
- Integrates with git diff to detect document changes.
- Executes platform-specific discovery commands on Windows or macOS.
- Delivers structured property data including title and author.
SKILL.md
.github/skills/document-scanningView on GitHub ↗
---
name: document-scanning
description: Document discovery, inventory building, and metadata extraction for accessibility audits. Use when scanning folders for Office documents (.docx, .xlsx, .pptx) and PDFs, building file inventories, detecting changes via git diff, or extracting document properties like title, author, and language.
---
# Document Scanning
## Supported File Types
| Extension | Type | Sub-Agent |
|-----------|------|-----------|
| .docx | Word document | word-accessibility |
| .xlsx | Excel workbook | excel-accessibility |
| .pptx | PowerPoint presentation | powerpoint-accessibility |
| .pdf | PDF document | pdf-accessibility |
## File Discovery Commands
### PowerShell (Windows)
```powershell
# Non-recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf
# Recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
Where-Object { $_.Name -notlike '~$*' -and $_.Name -notlike '*.tmp' -and $_.Name -notlike '*.bak' } |
Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|__pycache__|\.vscode)[\\/]' }
```
### Bash (macOS)
```bash
# Non-recursive scan
find "<folder>" -maxdepth 1 -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) ! -name "~\$*"
# Recursive scan
find "<folder>" -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) \
! -name "~\$*" ! -name "*.tmp" ! -name "*.bak" \
! -path "*/.git/*" ! -path "*/node_modules/*" ! -path "*/__pycache__/*" ! -path "*/.vscode/*"
```
## Delta Detection
### Git-based
```bash
# Files changed since last commit
git diff --name-only HEAD~1 HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'
# Files changed since a specific tag
git diff --name-only <tag> HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'
# Files changed in the last N days
git log --since="N days ago" --name-only --diff-filter=ACMR --pretty="" -- '*.docx' '*.xlsx' '*.pptx' '*.pdf' | sort -u
```
### Timestamp-based (PowerShell)
```powershell
# Files modified since a specific date
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
Where-Object { $_.LastWriteTime -gt [datetime]"2025-01-01" }
```
## Files to Skip
Always exclude these patterns during scanning:
- `~$*` - Office lock/temp files (created when a document is open)
- `*.tmp` - Temporary files
- `*.bak` - Backup files
- Files inside `.git/`, `node_modules/`, `.vscode/`, `__pycache__/` directories
## Scan Configuration Files
| File | Purpose |
|------|---------|
| `.a11y-office-config.json` | Rule enable/disable for Word, Excel, PowerPoint |
| `.a11y-pdf-config.json` | Rule enable/disable for PDF scanning |
### Scan Profiles
| Profile | Rules | Severities | Use Case |
|---------|-------|------------|----------|
| Strict | All | Error, Warning, Tip | Public-facing, legally required documents |
| Moderate | All | Error, Warning | Most organizations |
| Minimal | All | Error only | Triaging large document libraries |
## Context Passing Format
When delegating to a sub-agent, always provide this context block:
```text
## Document Scan Context
- **File:** [full path]
- **Scan Profile:** [strict | moderate | minimal]
- **Severity Filter:** [error, warning, tip]
- **Disabled Rules:** [list or "none"]
- **User Notes:** [any specifics]
- **Part of Batch:** [yes/no - if yes, indicate X of Y]
```
More from Community-Access/accessibility-agents
- Accessibility LeadAccessibility team lead and orchestrator. Use on EVERY task that involves web UI code, HTML, JSX, CSS, React components, web pages, or any user-facing web content. This agent coordinates the accessibility specialist team and ensures no accessibility requirement is missed. Runs the final review before any UI code is considered complete. Applies to any web framework or vanilla HTML/CSS/JS.
- Accessibility Regression DetectorDetects accessibility regressions by comparing audit results across commits/branches. Tracks score trends and validates previous fixes.
- Accessibility Statement GeneratorGenerates conformance/accessibility statements following W3C or EU model templates. Maps audit results to conformance claims, known limitations, and contact information.
- Accessibility Tool BuilderExpert in building accessibility scanning tools, rule engines, document parsers, report generators, and audit automation. WCAG criterion mapping, severity scoring, CLI/GUI scanner architecture, CI/CD integration.
- Accessibility TrackerTrack accessibility improvements across VS Code and any configured repos -- get summaries, deep dives, workspace reports, WCAG cross-references, and proactive alerts on a11y changes.
- accessibility-rulesCross-format document accessibility rule reference with WCAG 2.2 mapping. Use when looking up accessibility rules for Word (DOCX-*), Excel (XLSX-*), PowerPoint (PPTX-*), or PDF (PDFUA.*, PDFBP.*, PDFQ.*) documents, or when mapping findings to WCAG success criteria for compliance reporting.
- Actions ManagerGitHub Actions command center -- view workflow runs, read logs, re-run failed jobs, manage workflows, and debug CI failures entirely from the editor. Bypasses the deeply nested, visually-dependent Actions UI that is largely inaccessible to screen readers.
- Alt Text & HeadingsAlternative text and heading structure specialist for web applications. Use when building or reviewing any page with images, icons, SVGs, videos, figures, charts, or heading hierarchies. Covers meaningful vs decorative images, complex image descriptions, heading levels, document outline, and landmark structure. Can analyze images visually, compare existing alt text against image content, and interactively suggest appropriate alternatives. Applies to any web framework or vanilla HTML/CSS/JS.
- Analytics & InsightsYour GitHub analytics command center -- team velocity, review turnaround, issue resolution metrics, contribution activity, bottleneck detection, and code churn analysis with dual markdown + HTML reports.
- ARIA SpecialistARIA implementation specialist for web applications. Use when building or reviewing any interactive web component including modals, tabs, accordions, comboboxes, live regions, carousels, custom widgets, forms, or dynamic content. Also use when reviewing ARIA usage for correctness. Applies to any web framework or vanilla HTML/CSS/JS.