Metadata and PDF

Metadata is hidden information in a computer file that may be damaging or embarrassing, or that may lead to an accidental disclosure. In Office documents, there are many highly publicized instances of data hidden in files, such as Word’s Track Changes.

For the most part, PDF is immune to these issues. PDFs represent the visual display as it will be printed. Still, it is a good idea to understand what risks, if any, are associated with PDF and metadata.

Note: This article was written before Acrobat 8, which includes the “Examine Document” tool for powerful metadata removal. Acrobat X offers a “salt the earth” option called “Sanitize Document”.

PDF and Metadata Issues

PDF can support a wide variety of user-designated metadata through XMP, the Extensible Metadata Platform, using special tools that Adobe offers to developers.

Practically, however, use of other metadata structures is very unusual. In fact, I’ve never seen it in the Legal market, as it requires special development tools to insert the metadata in the PDF. No common desktop tool, including Office tools, offers the ability to insert anything other than title, subject, author and keywords.
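If you are curious whether a particular PDF carries an XMP packet at all, here is a minimal Python sketch that looks for one in the raw bytes of the file. It assumes the packet is stored uncompressed and unencrypted, which is common but not guaranteed, so an empty result is not proof that no XMP is present. The function name extract_xmp_packets is my own invention for illustration and is not part of any Adobe tool.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the packet sits in the file uncompressed and unencrypted.
_XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def extract_xmp_packets(pdf_bytes: bytes) -> list[str]:
    """Return every XMP packet found in the raw bytes, decoded as text.

    Hypothetical helper: a rough scan, not a real PDF parser.
    """
    return [
        packet.decode("utf-8", errors="replace")
        for packet in _XMP_PACKET.findall(pdf_bytes)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python xmp_scan.py document.pdf
    with open(sys.argv[1], "rb") as handle:
        for packet in extract_xmp_packets(handle.read()):
            print(packet)
```

Reading the printed packet is a quick way to see whether anything beyond title, subject, author and keywords has been filled in.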

Saving a PDF back to Word from Acrobat does not carry over your firm’s house styles (an unlikely but possible concern that some firms have shared with me), so there is no issue there.

So, realistically, here is what firms should be concerned about before sending out a PDF:

Are the Title, Subject, Author and Keyword fields empty?
This can be checked manually via Document Properties–>Description or in batch in Acrobat Professional.
Are there annotations on the document?
Annotations can be deleted using the Comments tab in Acrobat or in batch in Acrobat Professional.
Is there anything damaging in the document itself?
For example, was it created with Track Changes showing in the Word document? Are there items that need to be redacted?
You should use a good redaction tool such as Appligent’s Redax <http://www.appligent.com>
Has someone added unusual metadata?
Go to File–>Document Properties and click on the Additional Metadata button. You will see a number of default metadata options here, none of which are populated by Office apps other than Title, Subject, Author and Keywords. You can delete these placeholders if needed, but since there is no data there, it should not be a concern.
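For firms that want a quick second look outside of Acrobat, the first check in the list above can be roughly approximated with a few lines of Python. This is only a sketch under some loud assumptions: it scans the raw bytes of the file for the four description fields stored as plain literal strings, so it will miss values held in compressed or encrypted parts of the file, hex-encoded strings and Unicode-encoded text. It may also pick up bookmark titles, which use the same /Title key. The name find_info_fields is hypothetical.

```python
import re

# Matches the four description fields when stored as plain literal strings,
# e.g. /Author (Jane Doe). Assumption: the values are not compressed,
# encrypted, hex-encoded or nested inside unescaped parentheses.
_INFO_FIELD = re.compile(
    rb"/(Title|Subject|Author|Keywords)\s*\(((?:\\.|[^\\()])*)\)"
)


def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return the non-empty description fields found in the raw PDF bytes.

    Hypothetical helper: a crude heuristic, not a substitute for checking
    Document Properties in Acrobat.
    """
    found = {}
    for name, value in _INFO_FIELD.findall(pdf_bytes):
        text = value.decode("latin-1").strip()
        if text:  # empty fields such as /Subject () are not a concern
            found[name.decode("ascii")] = text
    return found


if __name__ == "__main__":
    import sys

    # Usage: python info_scan.py document.pdf
    with open(sys.argv[1], "rb") as handle:
        for field, text in find_info_fields(handle.read()).items():
            print(f"{field}: {text}")
```

An empty result here should be treated as "nothing obvious", not as a clean bill of health. The Document Properties dialog remains the authoritative check.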

When you really want to be sure: Flattening

You can eliminate all hidden data in a PDF file by “flattening” it to a TIFF file. Choose File–>Save As and choose TIFF. Acrobat will create a series of numbered pages. For example, if you have document.pdf with three pages, you will get:
– document-1.tif
– document-2.tif
– document-3.tif

You can also convert PDF files to TIFF using Batch Processing in Acrobat Professional.

Final Thoughts and Best Practices
Unlike with Office tools, metadata doesn’t get into a PDF accidentally. You have to try really hard to put it there.

As a best practice, it is a good idea to use a tool such as Payne Software’s Metadata Assistant to remove metadata in Word prior to producing a PDF.

One positive step you can take to limit the transfer of information to Acrobat is to modify the defaults of Acrobat’s one-button PDF Creators. In Word with no documents open, go to the Adobe PDF Menu and choose Change Conversion Settings. Deselect “Convert Document Information” and click OK. Quit Word to make the settings persistent. With this setting changed, the only piece of information you’ll ever see converted is the Title of the document.

One thing to think about is your firm’s risk tolerance. Are you going to mitigate risk or try to eliminate it? About the only way to eliminate risks associated with documents is to never send them outside your firm. Since that is not practical, Acrobat and PDF offer a proven, safe platform for reliably sharing documents with minimal risk, right out of the box.