PDF Compression Methods

Technical reference: structural optimization vs image recompression vs full-page rasterization.

Overview

PulpPDF uses three compression strategies. Which of them run depends on the selected preset.
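As a rough summary of the "Used by" lines in this reference, the preset-to-strategy mapping can be written as a lookup table. The preset names come from this document. The strategy identifiers and the strategies_for helper are hypothetical, and this is not PulpPDF's actual configuration format.

```python
# Which strategies each preset runs, per the "Used by" lines in this reference.
# Preset names are from the doc; the identifiers and table shape are illustrative.
PRESET_STRATEGIES: dict[str, tuple[str, ...]] = {
    "None": (),
    "High Quality": ("image_recompression", "structural_optimization"),
    "Balanced": ("image_recompression", "structural_optimization"),
    "Maximum": ("image_recompression", "structural_optimization"),
    "Ultra": ("full_page_rasterization", "structural_optimization"),
}


def strategies_for(preset: str) -> tuple[str, ...]:
    """Return the strategies a preset applies, in the order they run."""
    try:
        return PRESET_STRATEGIES[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
```

Structural optimization appears last in every non-empty entry because both the image and rasterization paths finish with an optimization pass.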

Structural Optimization

Reorganizes the internal PDF structure without touching visual content.

Operation                      Effect
Object stream generation       Groups small objects into compressed streams
Stream compression             Applies zlib/deflate to all streams
Unreferenced object removal    Strips orphaned objects
PDF version normalization      Forces PDF 1.7 output
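Object stream generation and stream compression can be sketched together. In the PDF 1.5+ object stream format (/Type /ObjStm), the stream data begins with a header of "object-number offset" pairs, followed by the serialized objects, and the whole stream is then Flate-compressed. This minimal sketch assumes the objects are already serialized, are not themselves streams, and have generation number 0 (the conditions for storing an object in an object stream). The function name build_object_stream is hypothetical, and writing the matching cross-reference stream is omitted.

```python
import zlib


def build_object_stream(objects: dict[int, bytes]) -> tuple[bytes, bytes]:
    """Pack serialized non-stream objects into one Flate-compressed /ObjStm.

    Returns (stream_dictionary, compressed_payload).
    """
    pairs = []            # "objnum offset" entries for the header
    body = bytearray()    # concatenated object bodies
    for num, serialized in objects.items():
        pairs.append(f"{num} {len(body)}")  # offset is relative to /First
        body += serialized + b"\n"
    header = (" ".join(pairs) + "\n").encode("ascii")
    # Deflate header and objects together: the "stream compression" step.
    payload = zlib.compress(header + bytes(body), 9)
    stream_dict = (
        f"<< /Type /ObjStm /N {len(objects)} /First {len(header)} "
        f"/Filter /FlateDecode /Length {len(payload)} >>"
    ).encode("ascii")
    return stream_dict, payload
```

Outside an object stream, dictionaries and other non-stream objects are stored uncompressed, so packing them is what lets deflate reach them at all.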

Used by: All presets except None.

Savings: 5-20%, depending on how well the original PDF was optimized.
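Unreferenced object removal amounts to a reachability walk: start from the trailer's roots (typically /Root and /Info), follow every indirect reference of the form "N G R", and drop whatever was never reached. This sketch is hypothetical and deliberately naive. It scans serialized object bodies with a regular expression, so text inside strings or stream data could produce false matches; a real implementation would walk the parsed object tree instead.

```python
import re

# Matches an indirect reference such as "12 0 R".
INDIRECT_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def remove_unreferenced(objects: dict[int, bytes], roots: list[int]) -> dict[int, bytes]:
    """Keep only objects reachable from `roots` by following indirect references."""
    reached: set[int] = set()
    stack = list(roots)
    while stack:
        num = stack.pop()
        if num in reached or num not in objects:
            continue  # already visited, or a dangling reference
        reached.add(num)
        for match in INDIRECT_REF.finditer(objects[num]):
            stack.append(int(match.group(1)))
    return {num: body for num, body in objects.items() if num in reached}
```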

Image Recompression

Decodes embedded raster images, optionally resizes them, and re-encodes them as JPEG.

Supported input formats

Filter        Format
DCTDecode     JPEG
FlateDecode   Raw/zlib compressed (PNG-like)
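Decoding a FlateDecode image means inflating the zlib data and, when /DecodeParms sets /Predictor to 10 or higher, undoing the per-row PNG filters (None, Sub, Up, Average, Paeth). This is a minimal sketch, assuming /Columns equals the image width and /Colors equals the number of color components. The helper names are hypothetical, and the TIFF predictor (2) is not handled.

```python
import zlib


def _paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def decode_flate_image(stream: bytes, width: int, height: int, colors: int,
                       bpc: int = 8, predictor: int = 1) -> bytes:
    """Inflate a FlateDecode image and undo PNG row filters (Predictor >= 10)."""
    raw = zlib.decompress(stream)
    row_len = (width * colors * bpc + 7) // 8  # bytes per row of samples
    if predictor == 1:  # no prediction: rows are stored as-is
        if len(raw) < row_len * height:
            raise ValueError("image data too short")
        return raw[:row_len * height]
    if predictor < 10:
        raise NotImplementedError("only PNG predictors (10-15) are handled")
    if len(raw) < (row_len + 1) * height:
        raise ValueError("image data too short")
    bpp = max(1, colors * bpc // 8)  # distance to the "left" byte
    out = bytearray()
    prev = bytearray(row_len)        # the row above the first row is all zeros
    for r in range(height):
        start = r * (row_len + 1)
        ftype = raw[start]           # each row starts with a filter-type byte
        row = bytearray(raw[start + 1:start + 1 + row_len])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0  # already reconstructed
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = left
            elif ftype == 2:
                pred = up
            elif ftype == 3:
                pred = (left + up) // 2
            elif ftype == 4:
                pred = _paeth(left, up, up_left)
            else:
                raise ValueError(f"bad PNG filter type {ftype}")
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```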

Supported color spaces

  • DeviceRGB
  • DeviceGray
  • DeviceCMYK (converted to RGB before encoding)
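This reference does not say how DeviceCMYK is converted to RGB. A common naive formula is R = 255 * (1 - C) * (1 - K), with G and B computed the same way from M and Y, and each ink value scaled to 0..1. The sketch below assumes that formula. It ignores ICC profiles and any /Decode array, so its colors can differ from a color-managed conversion. The function name is hypothetical.

```python
def cmyk_to_rgb(cmyk: bytes) -> bytes:
    """Naive CMYK -> RGB for interleaved 8-bit samples (no color management).

    Per channel: R = 255 * (1 - C) * (1 - K), likewise G from M and B from Y,
    computed in integer math and rounded to nearest.
    """
    if len(cmyk) % 4:
        raise ValueError("CMYK data length must be a multiple of 4")
    rgb = bytearray()
    for i in range(0, len(cmyk), 4):
        c, m, y, k = cmyk[i:i + 4]
        white = 255 - k  # how much light the black ink lets through
        rgb += bytes(((255 - ink) * white + 127) // 255 for ink in (c, m, y))
    return bytes(rgb)
```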

Process

  1. Decode image from PDF stream
  2. If the image's resolution exceeds the preset's DPI cap, downscale it using Lanczos3 interpolation
  3. Encode as JPEG at the preset's quality level
  4. Compare sizes: replace the original only if the re-encoded image is smaller
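Lanczos3 interpolation weights source samples with the kernel sinc(x) * sinc(x / 3) for |x| < 3. When downscaling, the kernel is stretched by the scale factor so that it also acts as a low-pass filter. The sketch below resamples one row of 8-bit samples and applies the 1D filter separably for a 2D resize. It is a plain-Python illustration of the technique, not the resampler PulpPDF uses, and the function names are hypothetical.

```python
import math


def lanczos3(x: float) -> float:
    """Lanczos kernel with a = 3: sinc(x) * sinc(x / 3) for |x| < 3, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def resample_row(src: list[int], dst_len: int) -> list[int]:
    """Resample one row of 8-bit samples to dst_len samples with Lanczos3."""
    scale = len(src) / dst_len        # > 1 when downscaling
    stretch = max(scale, 1.0)         # widen the kernel when shrinking (anti-aliasing)
    support = 3.0 * stretch
    out = []
    for i in range(dst_len):
        center = (i + 0.5) * scale - 0.5  # output pixel center in source coordinates
        acc = weight_sum = 0.0
        for j in range(math.floor(center - support) + 1, math.ceil(center + support)):
            w = lanczos3((j - center) / stretch)
            acc += w * src[min(max(j, 0), len(src) - 1)]  # clamp at the edges
            weight_sum += w
        out.append(min(255, max(0, round(acc / weight_sum))))  # normalize and clip
    return out


def resize_gray(rows: list[list[int]], new_w: int, new_h: int) -> list[list[int]]:
    """Separable 2D resize: resample each row, then each column.

    Rounding to integers between the two passes is a simplification.
    """
    horiz = [resample_row(row, new_w) for row in rows]
    cols = [resample_row([row[x] for row in horiz], new_h) for x in range(new_w)]
    return [[cols[x][y] for x in range(new_w)] for y in range(new_h)]
```

Normalizing by the summed weights keeps flat areas exactly flat; clipping to 0..255 absorbs the overshoot that Lanczos ringing can cause at sharp edges.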

Used by: High Quality (JPEG quality 85%, 300 DPI cap), Balanced (60%, 150 DPI cap), Maximum (35%, 72 DPI cap).
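This reference does not say exactly how an image is measured against the DPI cap. One common reading, assumed here, is that effective resolution is the pixel count divided by the image's displayed size in inches on the page (PDF user space has 72 points per inch). The preset values below come from the line above. The target_size helper is hypothetical, and how a quality percentage maps onto a particular JPEG encoder's scale is also an assumption.

```python
# Preset values from this reference: JPEG quality (%) and DPI cap.
PRESETS = {
    "High Quality": {"jpeg_quality": 85, "dpi_cap": 300},
    "Balanced": {"jpeg_quality": 60, "dpi_cap": 150},
    "Maximum": {"jpeg_quality": 35, "dpi_cap": 72},
}

POINTS_PER_INCH = 72.0  # PDF user space unit


def target_size(px_w: int, px_h: int, shown_w_pt: float, shown_h_pt: float,
                dpi_cap: float) -> tuple[int, int]:
    """Pixel size after applying dpi_cap, given the image's size on the page in points.

    Assumption: effective DPI = pixels / displayed inches, taking the higher of
    the two axes so that neither direction exceeds the cap.
    """
    eff_dpi = max(px_w * POINTS_PER_INCH / shown_w_pt,
                  px_h * POINTS_PER_INCH / shown_h_pt)
    if eff_dpi <= dpi_cap:
        return px_w, px_h  # already within the cap: keep original dimensions
    scale = dpi_cap / eff_dpi  # uniform scale preserves the aspect ratio
    return max(1, round(px_w * scale)), max(1, round(px_h * scale))
```

For example, a 3000x2000 image shown at 612x408 pt (8.5 x 5.67 in) has an effective resolution of about 353 DPI; under this reading, Balanced would downscale it to 1275x850.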

Skipped images

  • BitsPerComponent < 8
  • Images smaller than 4x4 pixels
  • Images that would grow after recompression
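The eligibility checks and the keep-if-smaller rule can be combined into a small filter. This sketch uses the filters and color spaces listed earlier in this section. It treats an image as too small when either side is below 4 pixels, which is one reading of the 4x4 rule and an assumption here; the function names are hypothetical.

```python
SUPPORTED_FILTERS = {"DCTDecode", "FlateDecode"}
SUPPORTED_COLOR_SPACES = {"DeviceRGB", "DeviceGray", "DeviceCMYK"}
MIN_SIDE_PX = 4  # assumption: skip if width OR height is below this


def is_candidate(filter_name: str, color_space: str, bits_per_component: int,
                 width: int, height: int) -> bool:
    """True if an image passes the input checks for recompression."""
    return (filter_name in SUPPORTED_FILTERS
            and color_space in SUPPORTED_COLOR_SPACES
            and bits_per_component >= 8
            and width >= MIN_SIDE_PX
            and height >= MIN_SIDE_PX)


def choose_stream(original: bytes, reencoded: bytes) -> bytes:
    """Keep the re-encoded JPEG only if it is strictly smaller."""
    return reencoded if len(reencoded) < len(original) else original
```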

Full-Page Rasterization

Renders each page as a bitmap and builds a new PDF from JPEG images.

Process

  1. Render each page at the target DPI as an RGBA bitmap
  2. Convert RGBA to RGB
  3. Encode as JPEG at 40% quality
  4. Build a new PDF with one image per page
  5. Run the structural optimization pass
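Three details of these steps can be made concrete. The bitmap size follows from the page size in points (72 per inch) and the target DPI; RGBA can be flattened to RGB by compositing onto white; and each output page can draw its single image by scaling the image's unit square to the page size with the cm operator. This is a hedged sketch: the reference does not state Ultra's target DPI or how PulpPDF treats alpha. It assumes straight (non-premultiplied) RGBA in R, G, B, A byte order, whereas some renderers produce BGRA or premultiplied data. The helper names and the /Im0 resource name are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user space


def page_pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Bitmap size for rendering a page of the given size (points) at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def rgba_to_rgb(rgba: bytes) -> bytes:
    """Composite straight 8-bit RGBA onto white: out = c * a + 255 * (1 - a)."""
    if len(rgba) % 4:
        raise ValueError("RGBA data length must be a multiple of 4")
    rgb = bytearray()
    for i in range(0, len(rgba), 4):
        r, g, b, a = rgba[i:i + 4]
        background = 255 * (255 - a)  # white, weighted by transparency
        rgb += bytes((c * a + background + 127) // 255 for c in (r, g, b))
    return bytes(rgb)


def page_content_stream(width_pt: float, height_pt: float, image_name: str = "Im0") -> bytes:
    """Content stream that draws one image XObject over the whole page.

    An image occupies the unit square in user space, so `cm` scales it to the page.
    """
    return f"q {width_pt:g} 0 0 {height_pt:g} 0 0 cm /{image_name} Do Q".encode("ascii")
```

If the renderer already fills a white background before drawing, every pixel is opaque and rgba_to_rgb reduces to dropping the alpha channel.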

Used by: Ultra preset only.

Trade-off: The highest compression of the three methods, but text is no longer selectable (use OCR to restore searchability).

Two-Pass Pipeline

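The standard presets (High Quality, Balanced, Maximum) use a two-pass approach, with image recompression running between the two structural passes: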

  1. Flatten pass: Unpack object streams for image access
  2. Image recompression: Iterate over image objects; decode, resize, and re-encode each one
  3. Optimization pass: Repack with object streams and stream compression
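The ordering can be illustrated with a toy in-memory model that does not follow real PDF semantics: objects listed in packed stand in for anything a per-object pass cannot reach until object streams are unpacked. All class and function names are hypothetical, and stream compression in the final pass is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyObject:
    data: bytes
    is_stream: bool = False
    is_image: bool = False


@dataclass
class ToyDoc:
    objects: dict[int, ToyObject]
    # Toy model: numbers in `packed` sit inside object streams and are
    # invisible to per-object passes.
    packed: set[int] = field(default_factory=set)

    def visible(self) -> list[tuple[int, ToyObject]]:
        return [(n, o) for n, o in self.objects.items() if n not in self.packed]


def flatten_pass(doc: ToyDoc) -> None:
    """Pass 1: unpack every object stream so all objects are addressable."""
    doc.packed.clear()


def image_pass(doc: ToyDoc, reencode: Callable[[bytes], bytes]) -> None:
    """Between passes: re-encode visible images, keeping results only if smaller."""
    for _, obj in doc.visible():
        if obj.is_image:
            candidate = reencode(obj.data)
            if len(candidate) < len(obj.data):
                obj.data = candidate


def optimize_pass(doc: ToyDoc) -> None:
    """Pass 2: repack every non-stream object into object streams."""
    doc.packed = {n for n, o in doc.objects.items() if not o.is_stream}


def run_pipeline(doc: ToyDoc, reencode: Callable[[bytes], bytes]) -> None:
    flatten_pass(doc)
    image_pass(doc, reencode)
    optimize_pass(doc)
```

Calling image_pass without flatten_pass first would skip any image still marked as packed, which is the dependency the flatten pass exists to remove.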