ocrmypdf-api

Name: ocrmypdf-api
Author: partme-ai/full-stack-skills

$ npx mdskill add partme-ai/full-stack-skills/ocrmypdf-api

Enables programmatic OCR on PDFs via Python API and supports alternative OCR engines through plugins.

Helps automate PDF text extraction and processing in Python applications.
Integrates with OCRmyPDF library and plugins like EasyOCR, PaddleOCR, and AppleOCR.
Uses configurable parameters such as language and optimization to tailor OCR operations.
Returns exit codes and processed PDF files for integration into workflows.


---
name: ocrmypdf-api
description: OCRmyPDF Python API and plugin skill — use OCRmyPDF programmatically from Python, integrate with applications, and extend with plugins (EasyOCR, PaddleOCR, AppleOCR). Use when the user needs to call OCRmyPDF from Python code, build OCR pipelines, or use alternative OCR engines.
---

# OCRmyPDF — Python API & Plugins Guide

## Overview

OCRmyPDF provides a Python API for programmatic use and a plugin interface for extending or replacing OCR engines. This skill covers the Python API, integration patterns, and the plugin ecosystem.

For CLI usage, see the **ocrmypdf** skill. For batch scripting, see **ocrmypdf-batch**.

## Python API

### Basic usage

```python
import ocrmypdf

# Basic OCR
exit_code = ocrmypdf.ocr('input.pdf', 'output.pdf')

# With options
exit_code = ocrmypdf.ocr(
    'input.pdf',
    'output.pdf',
    language='eng+fra',
    deskew=True,
    rotate_pages=True,
    skip_text=True,
    optimize=2,
    jobs=4,
)
```

### Return codes

```python
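import ocrmypdf

# From Python, most failures are raised as exceptions derived from
# ocrmypdf.ExitCodeException instead of being returned as exit codes.
try:
    result = ocrmypdf.ocr('input.pdf', 'output.pdf')
except ocrmypdf.PriorOcrFoundError:
    # Python-side counterpart of ExitCode.already_done_ocr
    print("PDF already has OCR text")
except ocrmypdf.InputFileError:
    # Python-side counterpart of ExitCode.input_file
    print("Input file issue")
except ocrmypdf.ExitCodeException as exc:
    print(f"Error: {exc.exit_code.name}")
else:
    if result == ocrmypdf.ExitCode.ok:
        print("OCR completed successfully")
    else:
        # Output was written but failed a final check (e.g. PDF/A validation)
        print(f"Completed with warning: {result.name}")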
```

### Common API parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `language` | str | Tesseract language(s), e.g. `'eng+fra'` |
| `deskew` | bool | Straighten crooked pages |
| `rotate_pages` | bool | Auto-rotate pages |
| `skip_text` | bool | Skip pages that already have text |
| `force_ocr` | bool | Force OCR on all pages |
| `redo_ocr` | bool | Replace existing OCR |
| `optimize` | int | Optimization level (0-3) |
| `output_type` | str | `'pdfa'`, `'pdf'`, `'auto'`, `'none'` |
| `jobs` | int | Number of parallel workers |
| `sidecar` | str | Path for sidecar text file |
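| `image_dpi` | int | DPI to assume for image inputs that lack resolution metadata |
| `clean` | bool | Clean pages with unpaper before OCR; the cleaned image is not used in the output |
| `clean_final` | bool | Clean pages with unpaper and keep the cleaned image in the output |
| `remove_background` | bool | Remove noisy backgrounds |
| `oversample` | int | Oversample low-resolution images to at least this DPI before OCR |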
| `pages` | str | Page range, e.g. `'1,3,5-10'` |
| `title` | str | Output PDF title |
| `author` | str | Output PDF author |

### Integration example: Flask web service

```python
import io
import tempfile
from pathlib import Path

import ocrmypdf
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route('/ocr', methods=['POST'])
def ocr_endpoint():
    """OCR a PDF via HTTP POST."""
    if 'file' not in request.files:
        return {'error': 'No file uploaded'}, 400

    uploaded = request.files['file']
    # Work in a temporary directory so input and output are cleaned up together.
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / 'input.pdf'
        out_path = Path(tmp) / 'output.pdf'
        uploaded.save(in_path)

        try:
            ocrmypdf.ocr(
                in_path, out_path,
                language='eng',
                skip_text=True,
                optimize=2,
            )
        except ocrmypdf.ExitCodeException as exc:
            # The Python API signals failures by raising, not by return value.
            return {'error': f'OCR failed: {exc}'}, 500

        # Load the result into memory before the directory is removed.
        data = io.BytesIO(out_path.read_bytes())

    return send_file(data, mimetype='application/pdf', as_attachment=True,
                     download_name='ocr_output.pdf')

if __name__ == '__main__':
    app.run(port=5000)
```

### Streamlit web UI

OCRmyPDF provides an optional Streamlit-based web UI:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "ocrmypdf[webservice]"
# See OCRmyPDF docs for launching the web service
```

## Plugin Ecosystem

OCRmyPDF's plugin interface allows replacing the OCR engine. Available plugins:

### OCRmyPDF-EasyOCR

Replaces Tesseract with [EasyOCR](https://github.com/JaidedAI/EasyOCR) (PyTorch-based). GPU strongly recommended.

```bash
pip install ocrmypdf-easyocr

# Usage (language codes are assumed to follow Tesseract's three-letter style, e.g. eng)
ocrmypdf --plugin ocrmypdf_easyocr -l eng input.pdf output.pdf
```

### OCRmyPDF-PaddleOCR

Replaces Tesseract with [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR). Powerful GPU-accelerated engine.

```bash
pip install ocrmypdf-paddleocr

# Usage
ocrmypdf --plugin ocrmypdf_paddleocr input.pdf output.pdf
```

### OCRmyPDF-AppleOCR

Replaces Tesseract with Apple Vision Framework. macOS only.

```bash
pip install ocrmypdf-appleocr

# Usage
ocrmypdf --plugin ocrmypdf_appleocr input.pdf output.pdf
```

### paperless-ngx Integration

[paperless-ngx](https://docs.paperless-ngx.com/) uses OCRmyPDF internally for searchable document management. See paperless-ngx docs for configuration.

## Custom Plugins

Create a custom OCR plugin by implementing the OCRmyPDF plugin interface:

```python
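# my_ocr_plugin.py
# Method names follow the abstract OcrEngine class in ocrmypdf.pluginspec;
# check the plugin documentation for the version you have installed.
from ocrmypdf import hookimpl
from ocrmypdf.pluginspec import OcrEngine, OrientationConfidence


class MyOcrEngine(OcrEngine):
    """Custom OCR engine implementation."""

    @staticmethod
    def version():
        return "1.0.0"

    @staticmethod
    def creator_tag(options):
        return "MyOCR"

    def __str__(self):
        return "MyOCR 1.0.0"

    @staticmethod
    def languages(options):
        return {"eng"}  # language codes this engine accepts

    @staticmethod
    def get_orientation(input_file, options):
        return OrientationConfidence(angle=0, confidence=0.0)  # no rotation detected

    @staticmethod
    def get_deskew(input_file, options):
        return 0.0  # no skew correction

    @staticmethod
    def generate_hocr(input_file, output_hocr, output_text, options):
        raise NotImplementedError("Write hOCR and sidecar text here")

    @staticmethod
    def generate_pdf(input_file, output_pdf, output_text, options):
        raise NotImplementedError("Write a text-only PDF and sidecar text here")


@hookimpl
def get_ocr_engine():
    return MyOcrEngine()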
```

```bash
# Use custom plugin
ocrmypdf --plugin my_ocr_plugin input.pdf output.pdf
```
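
A plugin can also be selected from Python instead of the command line, because `ocrmypdf.ocr()` accepts a `plugins` argument that mirrors `--plugin`. The sketch below is illustrative only: `ocr_with_plugin` is a hypothetical wrapper, the default module name assumes the EasyOCR plugin is installed, and the import is deferred so the module loads even where OCRmyPDF is absent.

```python
def ocr_with_plugin(input_pdf, output_pdf, plugin="ocrmypdf_easyocr", **options):
    """Run OCR with an alternative engine plugin selected from Python."""
    import ocrmypdf  # deferred so this module imports without OCRmyPDF present

    # `plugins` takes module names or paths to .py files, like --plugin on the CLI.
    return ocrmypdf.ocr(input_pdf, output_pdf, plugins=[plugin], **options)
```

For the custom engine above, the call would be `ocr_with_plugin('input.pdf', 'output.pdf', plugin='my_ocr_plugin')`, provided `my_ocr_plugin` is importable from the current environment.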

## Quick Reference

| Task | Code / Command |
|------|----------------|
| Python API basic | `ocrmypdf.ocr('in.pdf', 'out.pdf')` |
| With options | `ocrmypdf.ocr('in.pdf', 'out.pdf', language='eng', deskew=True)` |
| Check result | `if result == ocrmypdf.ExitCode.ok: ...` |
| EasyOCR plugin | `ocrmypdf --plugin ocrmypdf_easyocr in.pdf out.pdf` |
| PaddleOCR plugin | `ocrmypdf --plugin ocrmypdf_paddleocr in.pdf out.pdf` |
| AppleOCR plugin | `ocrmypdf --plugin ocrmypdf_appleocr in.pdf out.pdf` |

## Troubleshooting

- **Import error**: Run `pip install ocrmypdf` in the Python environment your application uses.
- **Plugin not found**: Check that the plugin is installed in the same environment (for example, `pip install ocrmypdf-easyocr`).
- **GPU not used (EasyOCR/PaddleOCR)**: Ensure CUDA/GPU drivers are installed.
- **Memory issues**: Use `jobs=1` for large files; process in batches.
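
The memory advice above can be sketched as a sequential batch loop that runs one file at a time with a single worker. This is a minimal illustration under assumptions: `pending_pairs` and `run_batch` are hypothetical helpers, the directory layout is invented for the example, and failures are collected via `ocrmypdf.ExitCodeException` so one bad file does not stop the batch.

```python
from pathlib import Path


def pending_pairs(input_dir, output_dir):
    """Return (source, destination) pairs for PDFs that have no output yet."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for source in sorted(input_dir.glob("*.pdf")):
        destination = output_dir / source.name
        if not destination.exists():
            pairs.append((source, destination))
    return pairs


def run_batch(input_dir, output_dir, **options):
    """OCR pending files one at a time; return a list of (source, error)."""
    import ocrmypdf  # deferred so this module imports without OCRmyPDF present

    options.setdefault("jobs", 1)  # single worker keeps peak memory low
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failures = []
    for source, destination in pending_pairs(input_dir, output_dir):
        try:
            ocrmypdf.ocr(source, destination, **options)
        except ocrmypdf.ExitCodeException as exc:
            failures.append((source, exc))  # keep going; report at the end
    return failures
```

Because files that already have an output are skipped, an interrupted batch can be rerun without repeating finished work.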

## References

- [OCRmyPDF API Reference](https://ocrmypdf.readthedocs.io/en/latest/api.html)
- [OCRmyPDF Plugin Interface](https://ocrmypdf.readthedocs.io/en/latest/plugins.html)
- [OCRmyPDF-EasyOCR](https://github.com/ocrmypdf/OCRmyPDF-EasyOCR)
- [OCRmyPDF-PaddleOCR](https://github.com/clefru/ocrmypdf-paddleocr)
- [OCRmyPDF-AppleOCR](https://github.com/mkyt/ocrmypdf-AppleOCR)
- [paperless-ngx](https://docs.paperless-ngx.com/)
