oma-pdf

Name: oma-pdf
Author: first-fluke/oh-my-agent

$npx mdskill add first-fluke/oh-my-agent/oma-pdf

Convert PDFs to structured Markdown with preserved layout.

Extracts text, tables, headings, and images for AI consumption.
Depends on opendataloader-pdf for reliable document parsing.
Selects output format based on user request or defaults to markdown.
Delivers clean Markdown files ready for RAG or LLM processing.

SKILL.md

.github/skills/oma-pdfView on GitHub ↗

---
name: oma-pdf
description: >
Convert PDF files to Markdown using opendataloader-pdf. Extracts text, tables, headings,
lists, and images with correct reading order. Use for PDF parsing, PDF to Markdown conversion,
document extraction, and AI-ready data preparation.
---

# PDF Skill - PDF to Markdown Conversion

## Scheduling

### Goal
Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.

### Intent signature
- User asks to convert, parse, read, extract, or transform a PDF.
- User needs PDF text, headings, lists, tables, or images prepared for AI consumption.
- User mentions "PDF to markdown", "parse PDF", "read this PDF", or equivalent wording.

### When to use
- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content such as tables, headings, lists, images, footnotes, or hyperlinks
- Preparing PDF data for AI consumption
- Checking whether a PDF has a text layer before choosing OCR

### When NOT to use
- Generating or creating PDFs -> use document-generation tools
- Editing existing PDFs -> out of scope
- Reading an already-text file -> use direct file reading
- Processing HWP, HWPX, DOCX, XLSX, or slide decks -> use the matching document skill

### Expected inputs
- `input_path`: PDF file or folder path
- `output_dir`: optional target directory
- `format`: optional output format, default `markdown`
- `ocr_languages`: optional OCR language list for scanned or image-based PDFs
- `extraction_options`: optional flags for tagged structure, image extraction, or hybrid conversion

### Expected outputs
- Markdown, text, JSON, HTML, or combined extraction output
- Normalized Markdown when Markdown is produced
- A short report with output path, page count, and conversion issues

### Dependencies
- `uvx opendataloader-pdf` for standard conversion
- `uvx opendataloader-pdf-hybrid` for OCR or hybrid conversion
- `uvx mdformat` for Markdown normalization
- Local filesystem access to input and output paths
- Optional OCR runtime via the hybrid server

### Control-flow features
- Branches on text-layer quality, tagged PDF availability, scan/OCR needs, and user-requested output format
- Calls external CLI tools through `uvx`
- Reads local files and writes local extraction outputs
- Uses a hybrid server only when OCR or complex extraction needs justify it

## Structural Flow

### Entry
1. Confirm that the input path exists and is a PDF file, PDF folder, or supported batch input.
2. Check file size and warn when the input is large enough to risk slow conversion or memory pressure.
3. Resolve `output_dir` and the expected output filename.

### Scenes
1. **PREPARE**: Validate the input path, output target, and requested extraction options.
2. **ACQUIRE**: Assess whether the PDF has a readable text layer by extracting a text preview.
3. **ACT**: Convert using standard mode, tagged-structure mode, or hybrid OCR mode.
4. **VERIFY**: Run `mdformat` for Markdown output and inspect the result for readable structure.
5. **FINALIZE**: Report output path, page count, format, and any extraction quality issues.

### Transitions
- If the preview text is readable, use standard conversion.
- If the PDF is tagged and standard output is garbled, retry with `--use-struct-tree`.
- If the PDF is scanned or image-based, start or reuse the hybrid OCR server and convert with hybrid mode.
- If conversion fails because the PDF is encrypted, stop and ask for the password or an unlocked copy.
- If conversion hits memory or size limits, process smaller page ranges or batches.

### Failure and recovery
| Failure | Recovery |
|---------|----------|
| `uvx` unavailable | Ask user to install `uv` before conversion |
| Password-protected PDF | Ask for password or unlocked PDF |
| Garbled output | Retry with tagged structure or hybrid mode |
| Missing tables | Retry with hybrid mode for complex or borderless tables |
| OCR language mismatch | Retry with explicit OCR languages, for example `ko,en` |
| Large file or memory pressure | Split into page ranges or batch smaller inputs |

### Exit
- Success: output file exists, Markdown is formatted when applicable, and extracted structure is readable.
- Partial success: output exists but quality issues are reported explicitly.
- Failure: no reliable output is produced and the blocking cause is reported.

## Logical Operations

### Actions
| Action | SSL primitive | Evidence |
|--------|---------------|----------|
| Validate path and options | `VALIDATE` | Input preflight in execution protocol |
| Probe text layer | `READ` | Text preview extraction |
| Choose conversion strategy | `SELECT` | Standard, tagged, or hybrid mode decision |
| Run converter | `CALL_TOOL` | `uvx opendataloader-pdf` |
| Start OCR server | `CALL_TOOL` | `uvx opendataloader-pdf-hybrid` |
| Write output artifact | `WRITE` | Markdown, text, JSON, or HTML output |
| Normalize Markdown | `CALL_TOOL` | `uvx mdformat` |
| Inspect extraction quality | `VALIDATE` | Structure/readability verification |
| Report result | `NOTIFY` | Final user-facing summary |

### Tools and instruments
- `opendataloader-pdf`: primary PDF extraction CLI
- `opendataloader-pdf-hybrid`: hybrid OCR and complex extraction path
- `mdformat`: Markdown normalization
- Filesystem commands such as `file`, `wc`, or `pdfinfo` may be used for preflight when available

### Canonical command path
```bash
uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}"
uvx mdformat "{output_path}"
```

For scanned/image-based PDFs, start OCR first and then convert through hybrid mode:
```bash
uvx opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "{languages}"
uvx opendataloader-pdf --hybrid docling-fast "{input_path}" --format markdown --output-dir "{output_dir}"
```

### Resource scope
| Scope | Resource target |
|-------|-----------------|
| `LOCAL_FS` | Input PDFs and generated output files |
| `PROCESS` | `uvx` subprocesses and optional hybrid server |
| `MEMORY` | Extracted previews and validation notes |
| `OTHER` | OCR model/runtime behavior inside hybrid mode |

### Preconditions
- The input PDF path exists and is readable.
- The output location is writable or can be created.
- Required CLIs are available through `uvx`.
- OCR is only attempted when hybrid mode is available or can be started.

### Effects and side effects
- Creates or overwrites extraction output depending on configuration and user intent.
- May start a local hybrid OCR server on the configured port.
- May consume significant CPU, memory, or time for large or scanned PDFs.
- Does not intentionally modify the source PDF.

### Guardrails
1. Do not invent missing content when extraction is incomplete.
2. Always report garbled text, missing tables, OCR uncertainty, or partial extraction.
3. Prefer standard conversion first when the text layer is readable.
4. Use OCR only when the PDF is scanned, image-based, or standard extraction quality is insufficient.
5. Keep detailed command sequences in `resources/execution-protocol.md` rather than duplicating every variant here.

## References
- Execution protocol: `resources/execution-protocol.md`
- Configuration: `config/pdf-config.yaml`
- Context loading: `../_shared/core/context-loading.md`
- Quality principles: `../_shared/core/quality-principles.md`

More from first-fluke/oh-my-agent

Skill	Description
oma-architecture	Architecture specialist for software/system design, module and service boundaries, tradeoff analysis, and stakeholder synthesis. Uses context-aware methods such as diagnostic routing, design-twice comparison, ATAM-style risk analysis, CBAM-style prioritization, and ADR-style decision records.
oma-backend	Backend specialist for APIs, databases, authentication with clean architecture (Repository/Service/Router pattern). Use for API, endpoint, REST, database, server, migration, and auth work.
oma-brainstorm	Design-first ideation that explores user intent, constraints, and approaches before any planning or implementation. Use for brainstorming, ideation, exploring concepts, and evaluating approaches.
oma-coordination	Guide for coordinating PM, Frontend, Backend, Mobile, and QA agents on complex projects via CLI. Use for manual step-by-step coordination and workflow guidance.
oma-db	Database specialist for SQL, NoSQL, and vector database modeling, schema design, normalization, indexing, transactions, integrity, concurrency control, backup, capacity planning, data standards, anti-pattern review, and compliance-aware database design. Use for database, schema, ERD, table design, document model, vector index design, RAG retrieval architecture, migration, query tuning, glossary, capacity estimation, backup strategy, database anti-pattern remediation work, and ISO 27001, ISO 27002, or ISO 22301-aware database recommendations.
oma-debug	Bug diagnosis and fixing specialist - analyzes errors, identifies root causes, provides fixes, and writes regression tests. Use for bug, debug, error, crash, traceback, exception, and regression work.
oma-design	>
oma-dev-workflow	Use when setting up or optimizing developer workflows in a monorepo, managing mise tasks, git hooks, CI/CD pipelines, database migrations, or release automation. Invoke for development environment setup, build automation, testing workflows, and release coordination.
oma-frontend	Frontend specialist for React, Next.js, TypeScript with FSD-lite architecture, shadcn/ui, and design system alignment. Use for UI, component, page, layout, CSS, Tailwind, and shadcn work.
oma-hwp	>