Acrobat 8: New Examine Document Feature
Note: The Examine Document feature has been updated in Acrobat 9 so you can select and preview metadata more granularly. You may wish to watch this hour-long Redaction and Metadata Removal eSeminar. Acrobat X offers an even more powerful “Sanitize Document” command.
These days, lawyers and their firms often think twice before emailing a file.
Does the document contain hidden information (metadata) that could lead to an accidental disclosure?
Office documents can contain potentially damaging hidden information. Examples include Track Changes information in Word, comments, and the Title/Subject/Author properties. The list is long.
I should mention that the legal market defines metadata far more broadly than other industries do. To keep things simple, we’ll define metadata as anything you can’t see in the document that could get you in trouble.
Large firms often use products like Workshare Protect, Payne Metadata Assistant or iScrub to clean Office documents before sending them out. If you must send out raw Office documents, this is a very good practice. Unfortunately, these tools aren’t universally used, especially among solos and smaller firms.
Some firms take a “PDF First” approach, preferring to send out PDFs. PDFs are relatively benign compared to Office formats. Still, PDFs can contain “metadata” that could be an issue. For example, you might type damaging information in an Acrobat sticky note and send the document to opposing counsel.
There have been several instances of improperly redacted PDFs. In one document I examined, the author had used the Borders and Shading option in Word to cover up text and then converted the file to PDF. The text, of course, was still in the Word file and remained in the PDF after conversion.
A knowledgeable user’s document practices normally prevent these kinds of problems. However, even experienced users want a failsafe.
Fortunately, Acrobat 8 adds a new feature called Examine Document which eliminates hidden text and other metadata from PDFs. Examine Document is available in both Acrobat 8 Standard and Professional.
Read on for more information about metadata, ways to remove it from a single document, and batch metadata removal.
How does Metadata get in a PDF?
Metadata, using the broad legal definition, can manifest in a PDF in two ways:
Because a user did something in Acrobat:
- File Attachments
- Annotations and Comments
- Form Field Information or Actions
- OCR Text
- Hidden Layers
- Embedded Search Index
Because an authoring application added metadata or covered up information before the document was converted to PDF:
- Bookmarks generated from Word and WordPerfect
- Improper redactions in Word
- Metadata added from scanning applications
Regardless of how it got there, sometimes you will want to “nuke” the problems and move on.
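If you’d like a rough idea of which of these items a PDF contains before you open Examine Document, a short script can look for the PDF name keys associated with several of them. The sketch below is a hypothetical helper, not part of Acrobat: it searches the raw file bytes for keys defined in the PDF specification (/Annots, /EmbeddedFiles, /AcroForm, /OCProperties, /Outlines, /Metadata, /JavaScript, /Author). It is only a heuristic. Keys stored inside compressed object streams or in encrypted files won’t be seen, and it cannot detect OCR text or text hidden under a shape, so “no markers found” does not mean the file is clean.

```python
import re
import sys
from pathlib import Path

# PDF name keys (from the PDF specification) that often signal hidden content.
# A match means "look closer in Acrobat", not proof of a problem.
# Heuristic only: keys inside compressed object streams or encrypted PDFs are missed.
MARKERS = {
    b"/Annots": "annotations and comments",
    b"/EmbeddedFiles": "file attachments",
    b"/AcroForm": "form fields",
    b"/OCProperties": "layers (optional content)",
    b"/Outlines": "bookmarks",
    b"/Metadata": "XMP metadata stream",
    b"/JavaScript": "JavaScript actions",
    b"/Author": "Author property",
}


def scan_pdf_bytes(data: bytes) -> list[str]:
    """Return descriptions of the marker keys found in raw PDF bytes."""
    found = []
    for name, label in MARKERS.items():
        # The key must not be followed by another letter or digit,
        # so that, e.g., /Annot does not count as /Annots.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", data):
            found.append(label)
    return found


def main(folder: str) -> None:
    # glob("*.pdf") is case-sensitive on some systems; files named *.PDF are skipped.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        hits = scan_pdf_bytes(pdf.read_bytes())
        status = ", ".join(hits) if hits else "no markers found"
        print(f"{pdf.name}: {status}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved as, say, scan_markers.py (the name is just an example), it would be run as `python scan_markers.py path/to/folder`. Treat its output as a prompt to run Examine Document, not as a substitute for it.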
Basic use of the Examine Document Feature
- Choose Document–>Examine Document.
- The Examine Document window appears.
- Click the Check All checkbox.
- Click the Remove all Checked Items button.
Best Practices for Saving Clean Files
If you choose File–>Save, Acrobat will prompt you to rename your document. I suggest adding _clean or _ok to the filename (before the .pdf suffix) so you can easily identify clean files on the desktop.
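If you script any part of your file handling, the same naming convention is easy to apply consistently. Here’s a minimal Python sketch; `clean_name` is a hypothetical helper, and the default `_clean` tag simply follows the suggestion above. It only builds the new filename. It does not remove any metadata.

```python
from pathlib import Path


def clean_name(path: str, tag: str = "_clean") -> str:
    """Insert `tag` before the file's suffix, e.g. brief.pdf -> brief_clean.pdf."""
    p = Path(path)
    # Don't tag a file twice if it was already marked clean.
    if p.stem.endswith(tag):
        return str(p)
    return str(p.with_name(f"{p.stem}{tag}{p.suffix}"))


if __name__ == "__main__":
    for name in ["brief.pdf", "Reply.Brief.pdf", "memo_clean.pdf"]:
        print(name, "->", clean_name(name))
```

Using `Path.stem` and `Path.suffix` means a name with extra periods, such as Reply.Brief.pdf, becomes Reply.Brief_clean.pdf rather than being split at the first period.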
What Metadata Removal Breaks and Offers
As my friend Sherry Kappel of the document technology firm Microsystems says, metadata is a bit like cholesterol. There’s good cholesterol and bad cholesterol.
Indeed, Acrobat itself relies on metadata for some internal functions.
Metadata removal will disable Reader-extended rights such as commenting, the Typewriter tool and digital signatures. The PDFs can still be opened, but the tokens that unlock these functions in Adobe Reader are removed.
Metadata is used to track Headers/Footers and Bates Numbers in a PDF. If you remove metadata, you will no longer be able to go to Advanced–>Document Processing–>Bates Numbers and use the Remove option.
Since the Examine Document feature can remove hidden text, it can be used to remove the invisible text layer from an OCR’d Image+Text PDF. That is useful if you have several OCR’d PDFs that you need to send to the other side. Why give them the advantage of “free” OCR?
Batch Metadata Removal
Some features in the Examine Document tool are available via the PDF Optimizer. Batch metadata removal is possible because Acrobat Professional’s Batch Processing functionality can call the PDF Optimizer.
Sadly, Hidden Text removal is not possible in batch.
Creating a Batch Sequence for Metadata Removal
- In Acrobat Professional, choose Advanced–>Document Processing–>Batch Processing
- Click the New Sequence button.
- Give the sequence a name such as “Metadata Removal” and click OK
- Click the Output Options Button
- Check the PDF Optimizer and click the Settings button next to it.
- Select Discard User Data from the list at the left.
- Check all (or some) of the options.
- Select Discard Objects from the list at the left.
- Check all (or some) of the options.
- Click OK. Name the new PDF Optimizer Settings when prompted.
- Click OK three more times to exit.
Running the Batch Metadata Removal Sequence
- Choose Advanced–>Document Processing–>Batch Processing
- Select the sequence to run
- Click OK
- Select the folder to process, then click the Select button.
- Select the Output Folder