OCR & Advanced Settings

How PulpPDF's OCR works, how to control DPI, and what happens under the hood.

OCR — Text Recognition

Add searchable text to scanned PDFs using your OS's built-in OCR engine.

Platform   Engine              Requirements
macOS      Vision Framework    macOS 10.15+
Windows    Windows.Media.Ocr   Windows 10 1809+

How It Works

  1. Each page is rendered at 300 DPI
  2. The native OCR engine recognizes text and positions
  3. An invisible text layer is added to the page
  4. Text becomes searchable and copy-pasteable
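
To make steps 1 to 3 concrete, the arithmetic below shows how a page size in PDF points maps to a 300 DPI raster, and how a recognized word's pixel box could be mapped back to page coordinates for the invisible text layer. This is a sketch, not PulpPDF's code: the function names are hypothetical, and it assumes the OCR engine reports boxes in pixels with a top-left origin while the page uses points (1/72 inch) with a bottom-left origin.

```python
POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def raster_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


def pixel_box_to_points(box, page_height_pt, dpi=300):
    """Map an OCR box (left, top, width, height) in pixels, top-left origin,
    to (x, y, width, height) in points, bottom-left origin (assumed layout)."""
    left, top, width, height = box

    def to_pt(px):
        return px * POINTS_PER_INCH / dpi

    x = to_pt(left)
    # Flip the y axis: distance from the page bottom to the box's bottom edge.
    y = page_height_pt - to_pt(top + height)
    return x, y, to_pt(width), to_pt(height)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(raster_size(612, 792))                            # (2550, 3300)
print(pixel_box_to_points((300, 600, 900, 75), 792))    # (72.0, 630.0, 216.0, 18.0)
```

A word found 1 inch from the left edge and 2 inches from the top of the raster lands at x = 72 pt, with its bottom edge 630 pt above the bottom of the page.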

Smart Detection

PulpPDF checks each page individually:

  • ✓ OCR — all pages have text (OCR skipped)
  • OCR 5/10 — partial (only scanned pages get OCR)
  • ✗ OCR — scanned document (all pages get OCR)
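
The per-page check above can be sketched as a small function over one flag per page. The names are hypothetical, and the sketch assumes the fraction in the partial label counts pages that already contain text; the page does not say which way the count runs.

```python
def ocr_badge(page_has_text: list[bool]) -> str:
    """Return a status label for a document, given one flag per page."""
    total = len(page_has_text)
    with_text = sum(page_has_text)
    if total > 0 and with_text == total:
        return "✓ OCR"  # every page already has text: OCR is skipped
    if with_text == 0:
        return "✗ OCR"  # scanned document: every page gets OCR
    # Assumption: the numerator is the number of pages that already have text.
    return f"OCR {with_text}/{total}"


def pages_needing_ocr(page_has_text: list[bool]) -> list[int]:
    """1-based numbers of the pages that would be sent to the OCR engine."""
    return [n for n, has_text in enumerate(page_has_text, start=1) if not has_text]


flags = [True, False] * 5          # ten pages, every second one is a scan
print(ocr_badge(flags))            # OCR 5/10
print(pages_needing_ocr(flags))    # [2, 4, 6, 8, 10]
```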

The OCR toggle persists between sessions.

DPI Control (Ultra Mode)

DPI    Best for
72     Minimal size
96     Screen reading
150    General purpose (default)
200    Detailed documents
300    Print quality

Higher DPI gives sharper output but larger files. 150 is a good balance for on-screen reading; use 300 for print.
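
The size cost grows faster than the DPI number suggests, because pixel count scales with the square of the resolution. The sketch below works this out for a US Letter page, assuming the DPI setting is the resolution at which page content is rasterized. The function name is hypothetical, and the figures are raw pixel data before compression, so real file sizes will be much smaller and depend on content.

```python
PRESET_DPIS = [72, 96, 150, 200, 300]  # the DPI values offered in Ultra Mode


def raster_stats(dpi: int, width_in: float = 8.5, height_in: float = 11.0):
    """Pixel dimensions and raw 24-bit RGB size of one page at the given DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * 3  # 3 bytes per pixel, uncompressed
    return width_px, height_px, raw_bytes


for dpi in PRESET_DPIS:
    w, h, raw = raster_stats(dpi)
    print(f"{dpi:>3} DPI  {w:>4} x {h:>4} px  {raw / 1e6:5.1f} MB raw")
```

Going from 150 to 300 DPI doubles each dimension (1275 x 1650 to 2550 x 3300 pixels) and so quadruples the raw data, from about 6.3 MB to about 25.2 MB per page.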

Under the Hood

Structural Optimization

Every preset (except None) runs a structural optimization pass that generates object streams, compresses all streams, removes unused objects, and forces PDF 1.7 compatibility.
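
For readers who want to try a comparable pass outside PulpPDF, the sketch below builds a command line for the qpdf tool. This is a stand-in chosen for illustration: this page does not say how PulpPDF implements the pass, the function names are hypothetical, and running it requires the qpdf binary to be installed separately.

```python
import subprocess


def qpdf_optimize_args(src: str, dst: str) -> list[str]:
    """Build a qpdf command line approximating the pass described above."""
    return [
        "qpdf",
        "--object-streams=generate",  # pack objects into object streams
        "--compress-streams=y",       # compress stream data
        "--force-version=1.7",        # write the file as PDF 1.7
        src,
        dst,
    ]


def optimize(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if qpdf reports a failure."""
    subprocess.run(qpdf_optimize_args(src, dst), check=True)


print(" ".join(qpdf_optimize_args("in.pdf", "out.pdf")))
```

No extra flag is listed for unused objects because qpdf writes only the objects reachable from the document, so unreferenced ones are left out of the rewritten file.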

Image Recompression

For High Quality, Balanced, and Maximum: embedded images are decoded, optionally downscaled, and re-encoded as JPEG. The original image is replaced only if the re-encoded version is smaller.
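
The two decisions in this step, whether to downscale and whether to keep the result, can be sketched without any imaging library. Everything named here is hypothetical: the `max_dim` limit is an assumed downscaling rule (this page only says images are "optionally downscaled"), and `encode_jpeg` stands in for whatever decoder and JPEG encoder is actually used.

```python
from typing import Callable, Optional


def target_size(width_px: int, height_px: int, max_dim: Optional[int]) -> tuple[int, int]:
    """Scale so the longer side is at most max_dim; never upscale."""
    longest = max(width_px, height_px)
    if max_dim is None or longest <= max_dim:
        return width_px, height_px  # downscaling is optional
    scale = max_dim / longest
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def recompress(original: bytes, encode_jpeg: Callable[[bytes], bytes]) -> bytes:
    """Re-encode an image stream and keep the result only if it is smaller."""
    candidate = encode_jpeg(original)
    return candidate if len(candidate) < len(original) else original


print(target_size(4000, 3000, 2000))  # (2000, 1500)
print(target_size(800, 600, 2000))    # (800, 600)
```

The size comparison in `recompress` is what keeps the step safe: an image that is already well compressed stays untouched instead of growing.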

Two-Pass Pipeline

Two structural passes bracket the image recompression step:

  1. Flatten pass (unpack objects for image access)
  2. Image recompression (decode, resize, re-encode)
  3. Optimization pass (repack with object streams)