ocrmypdf-optimize

$npx mdskill add partme-ai/full-stack-skills/ocrmypdf-optimize

Compress PDFs, configure PDF/A archival output, and apply JBIG2 encoding for file size reduction and optimization.

  • Helps reduce PDF file size, create archival PDF/A documents, and optimize OCR output.
  • Integrates with OCRmyPDF tool for compression, PDF/A conversion, and JBIG2 encoding.
  • Uses command-line options like --optimize levels and --output-type to specify compression and archival formats.
  • Delivers results by generating optimized PDF files based on user-specified parameters.

SKILL.md

.github/skills/ocrmypdf-optimize
---
name: ocrmypdf-optimize
description: OCRmyPDF optimization skill — compress PDFs, configure PDF/A output, JBIG2 encoding, and lossless optimization. Use when the user needs to reduce PDF file size, create archival PDF/A files, or optimize OCR output.
---

# OCRmyPDF — Optimization Guide

## Overview

[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) provides extensive optimization options to reduce file size, create PDF/A archival documents, and configure output quality.

For core OCR functionality, see the **ocrmypdf** skill. For image processing (deskew, rotate, clean), see **ocrmypdf-image**. For batch/Docker/scripting, see **ocrmypdf-batch**.

## Compression Levels

```bash
# Level 0: no optimization (fastest)
ocrmypdf --optimize 0 input.pdf output.pdf

# Level 1: lossless optimization (default)
ocrmypdf --optimize 1 input.pdf output.pdf

# Level 2: lossy optimization (JPEG recompression, PNG quantization)
ocrmypdf --optimize 2 input.pdf output.pdf

# Level 3: lossy, more aggressive than level 2 (smallest files, lowest image quality)
ocrmypdf --optimize 3 input.pdf output.pdf
```
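How much each level saves depends heavily on the input, so it can help to measure rather than guess. The following is a minimal sketch of one way to do that, assuming `ocrmypdf` is on the `PATH`; `percent_saved`, `file_size`, and `compare_levels` are hypothetical helper names, not part of OCRmyPDF.

```bash
# Hypothetical helpers for comparing --optimize levels; not part of OCRmyPDF.

# Print the whole-number percentage saved going from $1 bytes to $2 bytes.
# A negative result means the output grew.
percent_saved() {
    [ "$1" -gt 0 ] || return 1
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Print a file's size in bytes.
file_size() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Run every optimization level on one input and report the saving.
# Levels where ocrmypdf exits with an error are skipped.
compare_levels() {
    for level in 0 1 2 3; do
        output="${1%.pdf}.o${level}.pdf"
        ocrmypdf --optimize "$level" "$1" "$output" || continue
        saved=$(percent_saved "$(file_size "$1")" "$(file_size "$output")")
        echo "level $level: ${saved}% smaller"
    done
}
```

Calling `compare_levels input.pdf` would write `input.o0.pdf` through `input.o3.pdf` next to the input and print one line per level.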

## PDF/A Output

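PDF/A is the ISO-standardized archival profile of PDF. Conforming files embed their fonts and declare their color spaces so that they render the same way on any future viewer. Select the PDF/A part with `--output-type`: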

```bash
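# PDF/A (default; recent releases write PDF/A-2b)
ocrmypdf --output-type pdfa input.pdf output.pdf

# PDF/A-1b (oldest part, no transparency)
ocrmypdf --output-type pdfa-1 input.pdf output.pdf

# PDF/A-2b (allows transparency)
ocrmypdf --output-type pdfa-2 input.pdf output.pdf

# PDF/A-3b (additionally allows embedded file attachments)
ocrmypdf --output-type pdfa-3 input.pdf output.pdf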

# Standard PDF (no archival)
ocrmypdf --output-type pdf input.pdf output.pdf
```
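After a run it is sometimes useful to confirm which PDF/A part the output declares. The sketch below reads the `pdfaid:part` value from the file's XMP metadata. It assumes the XMP packet is stored uncompressed (usual for PDF/A) and that `grep` supports `-a` and `-o`; `pdfa_claimed_part` is a hypothetical helper name. It only reports what the file claims and is no substitute for a real validator.

```bash
# Print the PDF/A part number a file declares in its XMP metadata.
# Returns non-zero if no declaration is found. This reads the claim only;
# it does not validate conformance.
pdfa_claimed_part() {
    # XMP may store the value as an element (<pdfaid:part>2</pdfaid:part>)
    # or as an attribute (pdfaid:part="2"); the pattern matches both forms.
    part=$(LC_ALL=C grep -a -o "pdfaid:part[>=\"']*[0-9]" "$1" \
        | head -n 1 | tr -cd '0-9')
    [ -n "$part" ] && echo "$part"
}
```

For example, `pdfa_claimed_part output.pdf` should print `1`, `2`, or `3` for a PDF/A file and print nothing for a standard PDF.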

## JBIG2 Encoding

JBIG2 provides excellent compression for monochrome (1-bit) images:

```bash
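# Lossless JBIG2 needs no flag: with jbig2enc installed it is applied to
# 1-bit images at --optimize 1 or higher
ocrmypdf --optimize 1 input.pdf output.pdf

# Lossy JBIG2: smaller, but can substitute similar-looking glyphs.
# Availability varies by release; check `ocrmypdf --help`
ocrmypdf --jbig2-lossy input.pdf output.pdf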
```

**Requirements**:

```bash
# Debian/Ubuntu
apt install jbig2enc

# macOS
brew install jbig2enc
```
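Because the optimizers are optional external programs, a script can check for them before relying on the extra compression. A minimal sketch follows, assuming the encoder installed by jbig2enc is the `jbig2` binary; `have_tool` and `missing_tools` are hypothetical helper names.

```bash
# Succeed if the named program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print each named program that is NOT on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        have_tool "$tool" || echo "$tool"
    done
}
```

Running `missing_tools jbig2 pngquant` before a batch job prints whichever of the two optimizers still needs to be installed, and prints nothing when both are present.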

## PNG Optimization

Optimize embedded PNG images:

```bash
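# Lossy: --optimize 2 or 3 quantizes PNGs with pngquant;
# --png-quality (1-100, 0 = default) sets the quality target
ocrmypdf --optimize 2 --png-quality 60 input.pdf output.pdf

# Lossless: --optimize 1 recompresses images without altering pixels
ocrmypdf --optimize 1 input.pdf output.pdf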
```

## Ghostscript Options

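Control how the OCR text layer is rendered and how Ghostscript compresses images during PDF/A conversion:

```bash
# Select the text-layer renderer (auto, hocr, or sandwich)
ocrmypdf --pdf-renderer sandwich input.pdf output.pdf

# Keep images lossless when Ghostscript converts to PDF/A
ocrmypdf --output-type pdfa --pdfa-image-compression lossless input.pdf output.pdf
```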

## Sidecar Text

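Write the recognized text to a plain-text sidecar file, with or without producing an output PDF:

```bash
# Sidecar only: no PDF is written, so the output argument is "-"
ocrmypdf --output-type none --sidecar text.txt input.pdf -

# Sidecar plus PDF; --force-ocr rasterizes and re-OCRs every page so the
# sidecar also covers pages that already contain text
ocrmypdf --sidecar text.txt --force-ocr input.pdf output.pdf
```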
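A sidecar file also gives a cheap sanity check that OCR produced something. The sketch below counts the words in the sidecar; `sidecar_word_count` and `sidecar_has_text` are hypothetical helper names. A page that OCRmyPDF skips may still contribute a placeholder line, so a non-zero count does not prove every page was recognized.

```bash
# Count whitespace-separated words in a sidecar text file.
sidecar_word_count() {
    wc -w < "$1" | tr -d '[:space:]'
}

# Succeed only if the sidecar holds at least one word.
sidecar_has_text() {
    [ "$(sidecar_word_count "$1")" -gt 0 ]
}
```

A script could then run `sidecar_has_text text.txt || echo "no text recognized" >&2` after the `ocrmypdf` call.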

## Combined Recipes

### Maximum compression

```bash
# --optimize 3 already enables lossy PNG quantization via pngquant
ocrmypdf --optimize 3 --jbig2-lossy input.pdf small.pdf
```

### Archival PDF/A with compression

```bash
ocrmypdf --output-type pdfa --optimize 2 input.pdf archival.pdf
```

### Lossless output

```bash
# --optimize 1 is lossless for all image types; no extra PNG flag is needed
ocrmypdf --output-type pdf --optimize 1 input.pdf lossless.pdf
```
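When recipes like these are used repeatedly, they can be wrapped in a small dispatcher. This is a sketch only: the recipe names `smallest`, `archival`, and `lossless` and the functions `recipe_options` and `run_recipe` are hypothetical, and the option sets loosely mirror the recipes above using only `--optimize` and `--output-type`.

```bash
# Print the ocrmypdf options for a named recipe (hypothetical names).
recipe_options() {
    case "$1" in
        smallest) echo "--optimize 3" ;;
        archival) echo "--output-type pdfa --optimize 2" ;;
        lossless) echo "--output-type pdf --optimize 1" ;;
        *) echo "unknown recipe: $1" >&2; return 1 ;;
    esac
}

# Run ocrmypdf with the options for the chosen recipe.
# Usage: run_recipe RECIPE INPUT.pdf OUTPUT.pdf
run_recipe() {
    opts=$(recipe_options "$1") || return 1
    # $opts is intentionally unquoted so it splits into separate arguments.
    ocrmypdf $opts "$2" "$3"
}
```

For example, `run_recipe archival scan.pdf scan-archival.pdf` expands to `ocrmypdf --output-type pdfa --optimize 2 scan.pdf scan-archival.pdf`.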

## Quick Reference

| Task | Command |
|------|---------|
| No optimization | `--optimize 0` |
| Lossless (default) | `--optimize 1` |
| Lossy | `--optimize 2` |
| Most aggressive lossy | `--optimize 3` |
| PDF/A (default, PDF/A-2b) | `--output-type pdfa` |
| PDF/A-1b | `--output-type pdfa-1` |
| JBIG2 lossy | `--jbig2-lossy` |
| PNG quantization quality | `--png-quality N` |
| Sidecar text | `--sidecar text.txt` |

## Troubleshooting

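- **Large file size**: Try `--optimize 2` or `--optimize 3`; both enable lossy image recompression.
- **PDF/A validation fails**: Try a different PDF/A part with `--output-type pdfa-1` or `--output-type pdfa-3`, and check the result with an independent validator such as veraPDF.
- **Font issues**: If Ghostscript's PDF/A conversion mishandles fonts, `--output-type pdf` skips that conversion and writes a standard PDF.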
