OCR & Advanced Settings
How PulpPDF's OCR works, how to control DPI, and what happens under the hood.
OCR — Text Recognition
Add searchable text to scanned PDFs using your OS's built-in OCR engine.
| Platform | Engine | Requirements |
|---|---|---|
| macOS | Vision Framework | macOS 10.15+ |
| Windows | Windows.Media.Ocr | Windows 10 1809+ |
How It Works
- Each page is rendered at 300 DPI
- The native OCR engine recognizes the text and its position on the page
- An invisible text layer is added to the page
- Text becomes searchable and copy-pasteable
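To give a sense of what rendering at 300 DPI means in pixels: PDF page sizes are measured in points (1/72 inch), so the bitmap dimensions follow from a simple scale. The sketch below is illustrative only; `render_size_px` is a hypothetical helper, not a PulpPDF function.

```python
def render_size_px(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi` (1 pt = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter is 612 x 792 pt.
print(render_size_px(612, 792))  # (2550, 3300)
```

A Letter-size page therefore becomes a bitmap of roughly 8.4 megapixels before the OCR engine sees it.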
Smart Detection
PulpPDF checks each page individually:
- ✓ OCR — all pages have text (OCR skipped)
- OCR 5/10 — partial (only scanned pages get OCR)
- ✗ OCR — scanned document (all pages get OCR)
The OCR toggle persists between sessions.
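The three status labels above could be derived from per-page flags along these lines. This is a minimal sketch, not PulpPDF's actual code: the function name is hypothetical, and it assumes the numerator in a label like `OCR 5/10` counts pages that already contain text, which the label itself does not spell out.

```python
def ocr_status(pages_have_text: list[bool]) -> str:
    """Build a status label from per-page 'already has text' flags.

    Assumption: in "OCR n/m", n is the number of pages that already have text.
    """
    total = len(pages_have_text)
    with_text = sum(1 for has_text in pages_have_text if has_text)
    if with_text == total:
        return "✓ OCR"  # every page has text, so OCR is skipped
    if with_text == 0:
        return "✗ OCR"  # fully scanned document, every page gets OCR
    return f"OCR {with_text}/{total}"  # partial: only scanned pages get OCR


print(ocr_status([True] * 5 + [False] * 5))  # OCR 5/10
```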
DPI Control (Ultra Mode)
| DPI | Best for |
|---|---|
| 72 | Minimal size |
| 96 | Screen reading |
| 150 | General purpose (default) |
| 200 | Detailed documents |
| 300 | Print quality |
Higher DPI produces sharper images but larger files. 150 DPI is a good balance for on-screen use; choose 300 DPI for print.
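The size cost grows faster than the DPI number suggests, because pixel count scales with the square of the resolution. A rough sketch of that arithmetic follows; `relative_pixels` is a hypothetical helper, and actual file sizes also depend on image content and compression.

```python
def relative_pixels(dpi: int, baseline_dpi: int = 150) -> float:
    """Pixel count at `dpi` relative to the same image at `baseline_dpi`."""
    return (dpi / baseline_dpi) ** 2


for dpi in (72, 96, 150, 200, 300):
    print(f"{dpi} DPI: {relative_pixels(dpi):.2f}x the pixels of 150 DPI")
```

Doubling from 150 to 300 DPI quadruples the number of pixels to store.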
Under the Hood
Structural Optimization
Every preset (except None) runs a structural optimization pass that generates object streams, compresses all streams, removes unused objects, and forces PDF 1.7 compatibility.
Image Recompression
For the High Quality, Balanced, and Maximum presets, embedded images are decoded, optionally downscaled, and re-encoded as JPEG. An image is replaced only if the re-encoded version is smaller than the original.
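The "replace only if smaller" rule amounts to a simple size guard. The sketch below shows the idea on raw byte strings; `choose_image_bytes` is a hypothetical name, and the decoding and JPEG encoding steps are left out.

```python
def choose_image_bytes(original: bytes, recompressed: bytes) -> bytes:
    """Keep the re-encoded image only when it is strictly smaller."""
    if len(recompressed) < len(original):
        return recompressed
    return original  # re-encoding did not help, so leave the image untouched


print(len(choose_image_bytes(b"x" * 1000, b"y" * 400)))  # 400
```

This guard keeps recompression from ever increasing the size of an individual image, for example one that was already stored efficiently.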
Two-Pass Pipeline
- Flatten pass: unpack objects so embedded images can be accessed
- Image recompression: decode, resize, and re-encode (runs between the two passes)
- Optimization pass: repack with object streams
