PDF/A: PDF for Archiving

It’s 2007. You have just created an important document for a client: a complex regulatory filing for the client’s new power plant.

Fast forward to 2027, twenty years from today. Will the document still open?

In the legal industry, document conversion problems are legion. Many attorneys started with WordPerfect and may have migrated to Word. Opening all of those old documents can be troublesome.

Will everything convert? Will it look the same?

The PDF format, designed to capture the printed intent of a document, is a great solution. With over half a billion copies of Adobe Reader installed, PDF has become a de facto standard. Adobe publishes the PDF specification, and over 1000 third-party products create, consume or work with PDF in one way or another.

However, government and industry need more assurances—they require de jure standards. A de jure standard is endorsed by an independent standards body such as the International Organization for Standardization (ISO).

Fortunately, we have PDF/A, an ISO standard.

Read on to learn more about PDF/A. In my next article, I’ll discuss how you can use Acrobat to create and convert PDF/A compliant files.

PDF/A Origins

PDF/A was developed by the PDF/A Joint Working Group, operating under the auspices of AIIM (Association for Information and Image Management).

AIIM has a rich history of developing standards for information management, document management and imaging.

The legal market was well represented in the working group. Chairing the group was Stephen Levenson, from the Administrative Office of the U.S. Courts. The courts, of course, need to safeguard important court decisions stored in electronic formats.

The PDF/A working group first met in mid-2002, and the specification was formally approved in May 2005. I’m told that is extraordinarily fast for a standards effort, which suggests strong demand for this standard in the document archiving community.

PDF/A Format

First things first: I’m not here to offer a complete technical overview of PDF/A.

To keep it simple, here’s what law firms need to know about PDF/A:

Do’s and Don’ts

The PDF/A specification notes that documents should be self-contained, device-independent, unfettered, self-documenting and tagged. What does that mean?

Self-Contained

Long-term predictability requires that documents not rely on outside elements to render properly, so it makes sense that PDF/A requires all fonts to be embedded in the document.

What fonts are used in your organization? Some fonts carry a “do not embed” licensing flag that prevents Adobe Acrobat from embedding them, so documents that rely on those fonts cannot be converted to compliant PDF/A as-is.

Fonts add considerably to the “weight” of electronic files, so expect PDF/A files to be larger than the same PDF without embedded fonts.
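If you want a rough sense of whether a PDF relies on fonts it does not carry, the Python sketch below scans a file for font descriptors that lack an embedded font program (the `/FontFile`, `/FontFile2` or `/FontFile3` keys from the PDF specification). The function name `unembedded_fonts` is hypothetical, and the approach is a heuristic resting on assumptions: it assumes object dictionaries are stored as plain text, as in PDF 1.4 files, and it will miss fonts that have no descriptor at all, such as the standard 14 fonts. It is no substitute for a real PDF/A validator.

```python
import re
import sys
from typing import List

# Heuristic only: assumes object dictionaries are stored uncompressed
# (true of PDF 1.4 files, on which PDF/A-1 is based), so they can be
# matched as plain bytes. This is not a real PDF parser.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
DESCRIPTOR_RE = re.compile(rb"/Type\s*/FontDescriptor\b")
FONT_NAME_RE = re.compile(rb"/FontName\s*/([^\s/\[\]<>()]+)")


def unembedded_fonts(pdf: bytes) -> List[str]:
    """Return the FontName of each font descriptor with no embedded font program."""
    missing = []
    for match in OBJ_RE.finditer(pdf):
        body = match.group(1)
        if not DESCRIPTOR_RE.search(body):
            continue  # not a font descriptor object
        if b"/FontFile" in body:
            continue  # /FontFile, /FontFile2 or /FontFile3 present: font is embedded
        name = FONT_NAME_RE.search(body)
        missing.append(name.group(1).decode("latin-1") if name else "<unnamed>")
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fonts.py some_file.pdf
    with open(sys.argv[1], "rb") as f:
        for font in unembedded_fonts(f.read()):
            print("Not embedded:", font)
```

An empty result only means this scan found nothing; it does not prove the file is compliant.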

A self-contained document should not rely on any outside media player or scripting system. PDF/A does not allow references to external content, embedded files, JavaScript or multimedia elements in documents.

These restrictions rule out certain kinds of documents. For example, a rich, cross-linked eBrief may not be PDF/A compatible.
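To see how such restrictions might be spotted mechanically, here is a minimal Python sketch that looks for a few PDF names associated with forbidden features: JavaScript, launch actions, embedded files and multimedia. The pattern list is an illustrative subset chosen for this example, not the full list from the PDF/A standard, and the name `flagged_features` is hypothetical. A byte-level scan like this can produce false positives (for instance, a name that happens to appear inside ordinary text) and will miss names hidden in compressed streams or written with `#xx` escapes.

```python
import re
from typing import List

# An illustrative subset of PDF names tied to features PDF/A-1 forbids:
# JavaScript, launch actions, embedded files and multimedia. This is NOT the
# standard's full list, and names written with #xx escapes (or hidden inside
# compressed streams) will slip past this byte-level scan.
FORBIDDEN_RE = re.compile(
    rb"/(JavaScript|JS|Launch|EmbeddedFiles?|Movie|Sound)"
    rb"(?=[\s/<>\[\]()%]|$)"  # a PDF name ends at whitespace or a delimiter
)


def flagged_features(pdf: bytes) -> List[str]:
    """Return the sorted, de-duplicated forbidden names found in the raw bytes."""
    return sorted({m.group(1).decode("ascii") for m in FORBIDDEN_RE.finditer(pdf)})
```

The lookahead keeps a name such as `/JSONish` from being mistaken for `/JS`, since PDF names end only at whitespace or a delimiter character.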

Device-Independent

The PDF/A spec demands that color is expressed in a device-independent manner.

If you’ve ever looked at the output from two different printers or monitors, you can easily detect subtle differences. For long-term archiving, wouldn’t you want to know what the color was really supposed to look like?

Put simply, device independence means using a known, standard color space. Software in the application or operating system can then translate the known space into the color space of the output device, such as your printer or monitor.

One color space I’d recommend for law firms is sRGB (standard RGB), which is supported by most digital cameras and across Adobe’s product line, including Photoshop.
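Because sRGB is mathematically defined (in the standard IEC 61966-2-1), any program can work out what an sRGB color means in device-independent terms. The Python sketch below converts an 8-bit sRGB value to CIE XYZ, a device-independent description of color, using the commonly published matrix for the D65 white point. It illustrates the “known, standard color space” idea only: real color management works through ICC profiles and adds further steps before mapping to a particular printer or monitor, and the function name `srgb_to_xyz` is just for illustration.

```python
from typing import Tuple


def srgb_to_xyz(r8: int, g8: int, b8: int) -> Tuple[float, float, float]:
    """Convert 8-bit sRGB values to CIE XYZ (D65 white point, white has Y = 1)."""

    def linearize(c8: int) -> float:
        # Undo the sRGB transfer curve ("gamma") defined in IEC 61966-2-1.
        c = c8 / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in (r8, g8, b8))
    # Standard linear-sRGB-to-XYZ matrix, rounded to four decimal places.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return x, y, z
```

Two devices that both honor this definition should agree on the XYZ value for a given sRGB color, which is the point of specifying color in a standard space.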

Unfettered

The PDF/A specification insists that documents be unencumbered: encryption and password protection of any kind are not allowed. Besides, who would remember a password twenty years from now?

Self-documenting

PDF/A documents require a metadata structure in the file. Metadata (information about documents) may be used to record items such as Title, Subject, Author, Keywords and so on. PDF/A-1 does not dictate that these fields be populated, but the metadata structure, stored in XMP (Extensible Metadata Platform) format, must be present in the document.

Metadata has a negative connotation in the legal market, but the intent with metadata in PDF/A is to allow future readers of documents to more easily search and classify material.
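A PDF/A file also uses its XMP metadata to declare which part and conformance level it claims, through the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below reads that declaration from the raw file. The name `pdfa_claim` is hypothetical, and the sketch assumes the XMP packet is stored uncompressed so it can be found as plain text. Keep in mind that a file declaring PDF/A is not necessarily compliant; only a validator can confirm that.

```python
import re
from typing import Optional

# PDF/A files identify themselves in their XMP metadata through the
# pdfaid:part and pdfaid:conformance properties. RDF allows both the
# attribute form (pdfaid:part="1") and the element form (<pdfaid:part>1</...>).
PART_RE = re.compile(rb'pdfaid:part(?:="|>)\s*(\d+)')
CONFORMANCE_RE = re.compile(rb'pdfaid:conformance(?:="|>)\s*([A-Za-z])')


def pdfa_claim(pdf: bytes) -> Optional[str]:
    """Return a label such as "PDF/A-1b" if the XMP declares PDF/A, else None."""
    if b"<x:xmpmeta" not in pdf:
        return None  # no XMP packet visible as plain text
    part = PART_RE.search(pdf)
    if part is None:
        return None  # XMP present, but no PDF/A identification
    conformance = CONFORMANCE_RE.search(pdf)
    level = conformance.group(1).decode("ascii").lower() if conformance else ""
    return f"PDF/A-{part.group(1).decode('ascii')}{level}"
```

The conformance letter is lowercased to match the usual written form (“PDF/A-1a”, “PDF/A-1b”), even though XMP stores it in upper case.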

Tagged

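Tagging is the logical structure (headings, paragraphs, lists, reading order) added to a document so that visually impaired readers, using tools such as screen readers, can more easily consume it. Strictly speaking, only the stricter PDF/A-1a conformance level requires tagging; PDF/A-1b does not.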

Tagging offers anybody reading documents on a computer screen a number of benefits, however.

See my article Understanding tagging for more information.

Who is adopting PDF/A?

The National Archives and Records Administration (US) and the Swedish National Archives have announced that they will accept submissions in PDF/A format.

I’ve also seen interest from some regulatory bodies such as public utility commissions.

What’s Next?

In my next article, I’ll cover how to create PDF/A files and convert existing files to PDF/A.

Suggested for Further Reading

AIIM PDF/A Working Group Page

Adobe White Paper on Archiving