OCR & Advanced Settings

OCR — Text Recognition

Add searchable text to scanned PDFs using your OS's built-in OCR engine.

Platform	Engine	Requirements
macOS	Vision Framework	macOS 10.15+
Windows	Windows.Media.Ocr	Windows 10 1809+
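The requirements table amounts to a per-OS version check. The sketch below is a hypothetical illustration, not PulpPDF's code. The function name `select_ocr_engine` and the engine labels it returns are assumptions. It relies on the fact that Windows 10 version 1809 is OS build 17763.

```python
import platform
from typing import Optional


def select_ocr_engine(system: str, version: str) -> Optional[str]:
    """Return a (hypothetical) engine label for this OS, or None if unsupported.

    `system` is a platform.system() value such as "Darwin" or "Windows".
    `version` is platform.mac_ver()[0] on macOS (e.g. "10.15.7") or
    platform.version() on Windows (e.g. "10.0.17763").
    """
    # Keep only the numeric components so tuples compare element by element.
    parts = tuple(int(p) for p in version.split(".") if p.isdigit())
    if system == "Darwin":
        # Vision text recognition requires macOS 10.15 or later.
        return "Vision" if parts >= (10, 15) else None
    if system == "Windows":
        # Windows 10 version 1809 corresponds to build 10.0.17763.
        return "Windows.Media.Ocr" if parts >= (10, 0, 17763) else None
    return None


if __name__ == "__main__":
    system = platform.system()
    version = platform.mac_ver()[0] if system == "Darwin" else platform.version()
    print(select_ocr_engine(system, version))
```

Newer releases such as macOS 11 or Windows 11 (build 22000+) pass the same comparison.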

How It Works

Each page is rendered at 300 DPI
The native OCR engine recognizes the text and its position on the page
An invisible text layer is added to the page
Text becomes searchable and copy-pasteable
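The third step, adding an invisible text layer, means converting each recognized word's box from image pixels into PDF points and drawing the word with text rendering mode 3 (`3 Tr`, neither filled nor stroked). The sketch below is a minimal illustration under stated assumptions, not PulpPDF's implementation: it assumes the engine reports boxes as (left, top, width, height) in pixels from the top-left of the 300 DPI render (engines differ; Vision, for example, uses normalized coordinates), that the page is US Letter (792 pt tall), that words are plain ASCII, and that `/F1` is a hypothetical font resource already on the page.

```python
def pixel_box_to_points(box, dpi=300, page_height_pt=792.0):
    """Convert a word box (left, top, width, height) in pixels, measured from
    the top-left of the rendered image, into PDF points measured from the
    bottom-left of the page."""
    left, top, width, height = box
    scale = 72.0 / dpi  # one PDF point is 1/72 inch
    x = left * scale
    y = page_height_pt - (top + height) * scale  # flip the y axis
    return x, y, width * scale, height * scale


def escape_pdf_string(text):
    # Backslash and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_op(word, box, dpi=300, page_height_pt=792.0, font="/F1"):
    """Content-stream fragment that places `word` invisibly over its box.

    Simplifications: font size equals box height and the baseline sits at
    the bottom of the box; no horizontal stretching to the box width.
    """
    x, y, _width, height = pixel_box_to_points(box, dpi, page_height_pt)
    return (
        f"BT {font} {height:.2f} Tf 3 Tr "
        f"1 0 0 1 {x:.2f} {y:.2f} Tm ({escape_pdf_string(word)}) Tj ET"
    )


print(invisible_text_op("Hello", (300, 300, 600, 60)))
```

A fuller text layer would usually also stretch each word to its box width (for example with the `Tz` horizontal-scaling operator) so selections line up with the scanned glyphs.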

Smart Detection

PulpPDF checks each page individually for existing text and reports one of three states:

✓ OCR — all pages have text (OCR skipped)
OCR 5/10 — partial (only scanned pages get OCR)
✗ OCR — scanned document (all pages get OCR)
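Per-page detection reduces to a simple classification. This is a hypothetical sketch: how PulpPDF decides that a page "has text" is not described here, so the function takes one boolean per page as input, and the name `plan_ocr` and its return labels are assumptions.

```python
def plan_ocr(page_has_text):
    """Classify a document and list the 0-based pages that need OCR.

    `page_has_text` holds one boolean per page: True if the page already
    carries text (detection method not specified; assumed done elsewhere).
    """
    needs_ocr = [i for i, has_text in enumerate(page_has_text) if not has_text]
    if not needs_ocr:
        return "skip", needs_ocr      # every page has text: OCR skipped
    if len(needs_ocr) == len(page_has_text):
        return "full", needs_ocr      # fully scanned: every page gets OCR
    return "partial", needs_ocr       # only the scanned pages get OCR


print(plan_ocr([True, False, True, False]))
```

Skipping pages that already have text avoids layering a second, possibly less accurate, text layer over existing text.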

The OCR toggle persists between sessions.

DPI Control (Ultra Mode)

DPI	Best for
72	Minimal size
96	Screen reading
150	General purpose (default)
200	Detailed documents
300	Print quality

Higher DPI gives sharper images but larger files. 150 DPI is a good balance for on-screen reading; 300 DPI suits print.
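To see why higher DPI costs size, consider the pixel dimensions of a US Letter page (612 × 792 points, 8.5 × 11 in) at each setting. PDF points are 1/72 inch, so pixels = points / 72 × DPI, and the pixel count grows with the square of the DPI: going from 150 to 300 DPI means four times as many pixels to encode. The actual file-size increase depends on page content and compression. The helper name `render_size` is illustrative only.

```python
def render_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page (given in PDF points) rendered at `dpi`."""
    # One PDF point is 1/72 inch, so pixels = points / 72 * dpi.
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


if __name__ == "__main__":
    for dpi in (72, 96, 150, 200, 300):
        w, h = render_size(612, 792, dpi)  # US Letter
        print(f"{dpi:>3} DPI: {w} x {h} px ({w * h / 1e6:.1f} MP)")
```

For US Letter this gives 1275 × 1650 px (about 2.1 MP) at 150 DPI and 2550 × 3300 px (about 8.4 MP) at 300 DPI.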

Under the Hood

Structural Optimization

Every preset (except None) runs a structural optimization pass that generates object streams, compresses all streams, removes unused objects, and forces PDF 1.7 compatibility.
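Two of those steps, generating object streams and compressing streams, can be illustrated with the standard library alone. An object stream (`/Type /ObjStm`, PDF 1.5+, so compatible with a 1.7 target) packs many small non-stream objects into one Flate-compressed stream: a header of "object-number offset" pairs, followed by the objects, with `/First` giving the byte offset of the first object. This is a sketch of the file-format technique, not PulpPDF's code; the function names are hypothetical, and a complete writer would also need a cross-reference stream pointing into the object stream, plus reference tracing to drop unused objects, both omitted here. For comparison, the qpdf command-line tool exposes similar operations as `--object-streams=generate`, `--compress-streams=y`, and `--force-version=1.7`; nothing here implies PulpPDF uses qpdf.

```python
import zlib


def flate(data: bytes) -> bytes:
    """Compress a stream body with zlib, the format /FlateDecode expects."""
    return zlib.compress(data, 9)


def build_object_stream(objects: dict[int, bytes]) -> tuple[str, bytes]:
    """Pack non-stream, generation-0 objects into one /ObjStm.

    Returns the stream dictionary and the compressed stream data.
    """
    pairs, body = [], b""
    for num in sorted(objects):
        pairs.append(f"{num} {len(body)}")  # offset is relative to /First
        body += objects[num] + b"\n"
    header = (" ".join(pairs) + "\n").encode("ascii")
    data = flate(header + body)
    stream_dict = (
        f"<< /Type /ObjStm /N {len(objects)} /First {len(header)} "
        f"/Filter /FlateDecode /Length {len(data)} >>"
    )
    return stream_dict, data


catalog = b"<< /Type /Catalog /Pages 2 0 R >>"
pages = b"<< /Type /Pages /Kids [] /Count 0 >>"
print(build_object_stream({1: catalog, 2: pages})[0])
```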

Image Recompression

For the High Quality, Balanced, and Maximum presets, embedded images are decoded, optionally downscaled, and re-encoded as JPEG. An image is replaced only if the re-encoded version is smaller.
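The "keep it only if smaller" rule is easy to express. When to downscale is not specified above, so this sketch assumes a hypothetical cap on effective resolution (`max_dpi`): an image's effective DPI is its pixel width divided by its displayed width in inches (points / 72). Decoding and JPEG encoding are left to a caller-supplied `encode_jpeg` function, also hypothetical. The sketch assumes uniform scaling, so only the horizontal resolution is checked.

```python
from typing import Callable


def effective_dpi(px_width: int, shown_width_pt: float) -> float:
    """Resolution of an image as placed on the page (points are 1/72 inch)."""
    return px_width / (shown_width_pt / 72)


def downscale_target(px_w, px_h, shown_w_pt, max_dpi):
    """New pixel dimensions if the image exceeds max_dpi, else None."""
    dpi = effective_dpi(px_w, shown_w_pt)
    if dpi <= max_dpi:
        return None
    scale = max_dpi / dpi
    return max(1, round(px_w * scale)), max(1, round(px_h * scale))


def recompress(original: bytes, encode_jpeg: Callable[[], bytes]) -> bytes:
    """Re-encode an image and keep whichever version is smaller."""
    candidate = encode_jpeg()
    return candidate if len(candidate) < len(original) else original


# A 2550 x 3300 px scan shown at full US Letter width (612 pt) is 300 DPI.
print(downscale_target(2550, 3300, 612, 150))
```

Comparing sizes after re-encoding matters because JPEG can make some images larger, for example ones that were already well compressed.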

Two-Pass Pipeline

Flatten pass (unpack objects for image access)
Image recompression (decode, resize, re-encode)
Optimization pass (repack with object streams)
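The ordering above can be sketched as a small stage planner. The preset names come from this page; the stage names and `run_pipeline` are hypothetical. One assumption is flagged in the code: presets outside the image-recompression list are treated as needing only the optimization pass, since the flatten pass exists for image access.

```python
from typing import Callable, Dict, List

IMAGE_PRESETS = {"High Quality", "Balanced", "Maximum"}


def stages_for(preset: str) -> List[str]:
    """Ordered pipeline stages for a preset (stage names are illustrative)."""
    if preset == "None":
        return []  # no structural or image processing
    if preset in IMAGE_PRESETS:
        return ["flatten", "recompress_images", "optimize"]
    # Assumption: presets without image recompression skip the flatten pass.
    return ["optimize"]


def run_pipeline(doc, preset: str, stages: Dict[str, Callable]):
    """Apply each stage function to the document, in order."""
    for name in stages_for(preset):
        doc = stages[name](doc)
    return doc
```

Running the flatten pass first matters because images packed inside object streams are hard to reach; the final optimization pass then repacks everything.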