Understanding “Flavors” of PDF

Most people know that Acrobat files can contain a variety of types of information: text, images, and OCR’d information.

Each of these is a “flavor” of PDF with different capabilities and issues. PDF flavors are behind some oft-heard questions I receive such as:

Not all PDFs are created equal. Some PDFs are more usable or offer benefits that other types do not.

I’ll examine the different flavors below and make some recommendations.

Why does this matter?

If you choose the wrong flavor of PDF or compression, you may run into the following problems:

I’ve met with a great many law firms and have seen some pretty wacky methods of creating PDF. It is not uncommon to see someone print out a Word document and then scan it back in to create a PDF! Ack!

Flavors of PDF

Below are the four basic flavors of PDF:

PDF Normal
  What is it? Often called an “electronic PDF”, this type of PDF has never hit paper and was converted directly from an electronic source.
  Where does it come from? Produced directly from a software application by “printing” to PDF or by using the 1-button PDF creators supplied by Acrobat.
  Is it searchable? Yes, with 100% accuracy, since no OCR has taken place.
  Notes: Prints fastest. Prints at best quality. Smallest file size.

PDF Image Only
  What is it? An image in a PDF wrapper. It could be an image of a page of text, a JPEG, etc. inside a PDF.
  Where does it come from? Scanners, digital copy machines, and TIFFs converted to PDF.
  Is it searchable? No. It does not contain any searchable text.
  Notes: I recommend scanning at no more than 300 dpi. A good format to use in discovery when you don’t want to give the other side an advantage.

PDF Image+Text
  What is it? An image inside a PDF with an invisible layer of searchable text.
  Where does it come from? An image-only file that has been OCR’d using Acrobat Standard or Professional.
  Is it searchable? Yes, but OCR is not a perfect process. Do not expect 100% accuracy.
  Notes: The best way to make paper documents searchable.

Combined PDF
  What is it? Any of the types above, combined into one file.
  Where does it come from? Create from Multiple Files in Acrobat allows you to combine any kind of PDF.
  Is it searchable? It depends. If the combined PDFs are searchable, yes.
  Notes: Can contain multiple document sizes.

PDF Settings Affecting File Size

PDF Normal offers the best performance, the smallest file size, and the best searchability. These fully electronic files contain all the fonts needed for printing. If you have the option to create PDF Normal, always use it!

When creating PDFs from paper, carefully choose your compression and scanning resolution.

There are three common black & white compression algorithms used for scanned images:

File size, from larger to smaller:

  1. CCITT Group 4
  2. JBIG2 Lossless
  3. JBIG2 Lossy

If you choose Create PDF from Scanner in Acrobat, the default compression is JBIG2 Lossless. This offers a great balance between file size and quality.

Other hardware and software products that scan to PDF generally use CCITT Group 4 compression, which produces considerably larger files.

CCITT Group 4 compression was developed as a fax compression technology. The rudimentary processors of fax machines in the early 1980s had just enough power to decompress CCITT Group 4 files. Surprisingly, it is still widely used, but is an inefficient compression scheme.

Although it is rarely relevant in the legal market, Acrobat is intelligent enough to compress files selectively using Adaptive Compression. A color brochure may contain black text, a color image, and line art, and each of these can be compressed with a different scheme. If you need to scan color brochures and the like (perhaps in an intellectual property dispute), choose the Searchable Image-Compact option.
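Acrobat's exact rules for adaptive compression are not described here, so the Python sketch below is only a hypothetical illustration of the idea: examine a region of the page and pick a compression family that suits it. Pure black-and-white regions (text, black line art) go to a bilevel codec such as JBIG2. Regions with only a few flat colors go to lossless Flate compression so edges stay crisp. Continuous-tone photos go to JPEG. The function name and the 16-color cut-off are assumptions, and a real scan would need noise cleanup before a test this strict could work.

```python
def choose_compression(pixels, max_palette=16):
    """Pick a compression family for one region of a scanned page (hypothetical rule).

    pixels: iterable of (r, g, b) tuples with channels from 0 to 255.
    max_palette: assumed cut-off between flat line art and photos.
    """
    colors = set(pixels)
    if colors <= {(0, 0, 0), (255, 255, 255)}:
        return "JBIG2"  # bilevel: text or black line art
    if len(colors) <= max_palette:
        return "Flate"  # a few flat colors: lossless keeps edges sharp
    return "JPEG"  # continuous tone: small lossy errors are hard to see


text_region = [(0, 0, 0), (255, 255, 255)] * 50
logo_region = [(200, 0, 0), (255, 255, 255), (0, 0, 255)]
photo_region = [(v, v, v) for v in range(256)]
for region in (text_region, logo_region, photo_region):
    print(choose_compression(region))  # JBIG2, then Flate, then JPEG
```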

I’ve conducted several visual tests of JBIG2 Lossless versus JBIG2 Lossy. On good-quality scanned documents, it is difficult to detect any difference between the two. If you have good originals, go ahead and use JBIG2 Lossy.
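JBIG2 saves space on text pages by storing each distinct character shape once, in a symbol dictionary, and then recording where each copy appears on the page. The toy Python sketch below (hypothetical names; real encoders are far more sophisticated) shows why the lossy mode is smaller. With `max_diff=0`, only pixel-identical glyphs share a stored shape, a simplified stand-in for lossless coding (real lossless encoders may also match similar glyphs, but then store a correction so every pixel is reproduced exactly). With a positive `max_diff`, near-identical glyphs, such as the same mark with a speck of scanner noise, reuse one shape. A looser threshold merges more shapes, which is both the source of the extra savings and the reason the mode is called lossy.

```python
def hamming(a, b):
    """Count differing pixels between two glyph bitmaps of the same size."""
    return sum(p != q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def build_symbol_dictionary(glyphs, max_diff=0):
    """Map each glyph to a stored shape, in the style of a JBIG2 symbol dictionary (toy).

    glyphs: list of bitmaps, each a non-empty tuple of equal-length '0'/'1' strings.
    max_diff=0 shares a shape only between identical glyphs (simplified lossless);
    max_diff > 0 reuses a shape for near matches (lossy substitution).
    Returns (dictionary, indices); indices[i] is the shape used for glyphs[i].
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, shape in enumerate(dictionary):
            same_size = len(shape) == len(glyph) and len(shape[0]) == len(glyph[0])
            if same_size and hamming(shape, glyph) <= max_diff:
                indices.append(i)
                break
        else:  # no stored shape was close enough: store this one
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


plus = ("010", "111", "010")
noisy_plus = ("010", "111", "011")  # the same mark with one stray scanner pixel
page = [plus, plus, noisy_plus, plus]

lossless, _ = build_symbol_dictionary(page, max_diff=0)
lossy, _ = build_symbol_dictionary(page, max_diff=1)
print(len(lossless), len(lossy))  # 2 1
```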

File Size Comparison

The table below compares the file sizes of a typical 8.5″ by 11″ legal document for various flavors of PDF:

Single Page Legal Document – 200 DPI

PDF Flavor | Compression and Notes
PDF Normal | Fonts embedded, no tags
PDF Image Only 200 dpi | CCITT Group 4
PDF Image Only 200 dpi | JBIG2 Lossless
PDF Image Only 200 dpi | JBIG2 Lossy
PDF+Text 200 dpi | JBIG2 Lossy

Single Page Legal Document – 300 DPI

PDF Flavor | Compression and Notes
PDF Normal | Fonts embedded, no tags
PDF Image Only 300 dpi | CCITT Group 4
PDF Image Only 300 dpi | JBIG2 Lossless
PDF Image Only 300 dpi | JBIG2 Lossy
PDF+Text 300 dpi | JBIG2 Lossy
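Resolution drives these sizes, so it helps to see how much raw data a scanner produces before any compression. The small Python sketch below (the function name is illustrative) computes the uncompressed size of a 1-bit black-and-white 8.5″ by 11″ page at each resolution, ignoring file headers.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size, in bytes, of a 1-bit black-and-white scan.

    Each row of pixels is padded to a whole byte, which is how PDF and TIFF
    store 1-bit image rows. File headers are ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


for dpi in (200, 300):
    print(dpi, raw_bilevel_bytes(8.5, 11, dpi))
# 200 468600
# 300 1052700
```

The pixel count grows with the square of the resolution, (300/200)² = 2.25, so a 300 dpi page starts out with about 2.25 times as much data as a 200 dpi page. Compression shrinks both, but the 300 dpi file begins from a much larger base.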

Testing Protocol

NOTE: I did these tests back in the Acrobat 7 timeframe. Current versions of Acrobat offer more robust compression (Adaptive Compression in Acrobat X) and generally work better.

  1. The PDF Normal file was created by choosing the Adobe PDF print driver. [Note 1]
  2. The PDF Normal file was opened in Acrobat and saved as either 200 or 300 dpi uncompressed TIFFs.
  3. PDF Optimizer was used to target three types of compression: CCITT G4, JBIG2 Lossless and JBIG2 Lossy.
  4. All image and image+text PDFs were created using Acrobat 7 by choosing Recognize Text Using OCR.
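To repeat a comparison like this on your own files, a small script can report each PDF's size and how much smaller it is than a baseline file (for example, the CCITT G4 version). The Python sketch below is hypothetical: it assumes all the variants sit in one folder, and it only reads file sizes without opening or changing the PDFs. A negative percentage means a file is larger than the baseline.

```python
from pathlib import Path


def percent_smaller(baseline_bytes, other_bytes):
    """How much smaller `other_bytes` is than `baseline_bytes`, in percent."""
    return 100.0 * (baseline_bytes - other_bytes) / baseline_bytes


def size_report(folder, baseline_name):
    """List (file name, size in bytes, % smaller than baseline) for each PDF in `folder`."""
    folder = Path(folder)
    baseline = (folder / baseline_name).stat().st_size
    report = []
    for pdf in sorted(folder.glob("*.pdf")):
        size = pdf.stat().st_size
        report.append((pdf.name, size, round(percent_smaller(baseline, size), 1)))
    return report


# Hypothetical usage, with made-up folder and file names:
# for name, size, saved in size_report("test_scans", "page_ccitt_g4.pdf"):
#     print(f"{name:30} {size:>10,} bytes {saved:6.1f}% smaller")
```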


Here are my tips for making the best choices when working with PDF files:

  1. Where did that PDF come from? You need to know . . .
    Unless you scanned it in yourself using the Create PDF from Scanner option in Acrobat, your PDF file could most likely be made a lot smaller using the PDF Optimizer in Acrobat Professional. Chances are the image-only and image+text PDFs you get from outside your firm use old, inefficient CCITT Group 4 compression.
  2. Keep Electronic Documents Electronic
    Always convert electronic documents directly to PDF using the 1-button PDF creators that Acrobat installs in Office applications, or using the Adobe PDF print driver. You’ll get a considerably smaller file, and searchability is much better.
  3. Scan at 300 dpi, OCR, and then Downsample if Necessary
    You’ll get more accurate OCR at 300 dpi. Always do any downsampling and compression with the PDF Optimizer in Acrobat Professional after performing OCR, not before. Acrobat Professional can also downsample files in batches.
  4. Try JBIG2 Lossy Compression
    Although the word “lossy” sounds a bit scary, give this compression scheme a try. Documents still look good on-screen, and file sizes can be 50% smaller.
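One quick way to see how an incoming PDF was compressed is to look at the filter names on its image streams. The PDF specification names CCITT fax images /CCITTFaxDecode, JBIG2 images /JBIG2Decode, JPEG images /DCTDecode and JPEG 2000 images /JPXDecode. The Python sketch below is a rough heuristic rather than a PDF parser, and its function names are hypothetical. It simply counts those names in the raw file bytes. Stream objects, including images, cannot be stored inside compressed object streams, so these names are normally visible. However, inline images use abbreviations such as /CCF and /DCT that this sketch does not count. Checking the file in Acrobat or a real PDF library gives a definitive answer.

```python
import re
from collections import Counter
from pathlib import Path

# Image compression filter names defined in the PDF specification.
IMAGE_FILTERS = {
    b"CCITTFaxDecode": "CCITT fax (Group 3 or 4)",
    b"JBIG2Decode": "JBIG2",
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
}
FILTER_NAME = re.compile(rb"/(CCITTFaxDecode|JBIG2Decode|DCTDecode|JPXDecode)\b")


def count_image_filters(pdf_bytes):
    """Tally image filter names found in raw PDF bytes (a heuristic, not a parser)."""
    counts = Counter()
    for match in FILTER_NAME.finditer(pdf_bytes):
        counts[IMAGE_FILTERS[match.group(1)]] += 1
    return counts


def image_compression_summary(path):
    """Read a PDF from disk and tally the image compression filters it names."""
    return count_image_filters(Path(path).read_bytes())


sample = b"<< /Subtype /Image /Filter /CCITTFaxDecode >> << /Filter [/FlateDecode /DCTDecode] >>"
print(count_image_filters(sample))
```

A file whose images all report CCITT fax compression is a good candidate for the PDF Optimizer.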


1. Multiple-page PDF Normal files are considerably smaller than multiple-page image-only PDFs. A single-page PDF Normal file must contain all the fonts necessary to render the page, but this font information does not need to be duplicated for successive pages, so its cost is shared across the whole document.
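This reasoning can be written as a rough size model for an n-page document. The symbols are illustrative assumptions, not measurements: F is the embedded font data, t is the per-page data of a PDF Normal file (mostly text and drawing instructions), and i is the compressed image data per page of an image-only file.

```latex
S_{\text{Normal}}(n) \approx F + n\,t, \qquad S_{\text{Image}}(n) \approx n\,i,
\qquad
\frac{S_{\text{Image}}(n)}{S_{\text{Normal}}(n)} \approx \frac{n\,i}{F + n\,t}
\;\longrightarrow\; \frac{i}{t} \quad (n \to \infty)
```

The font overhead F is paid once, so its weight shrinks as pages are added, and the ratio approaches i/t, the per-page advantage of text over images.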