Correcting OCR Errors
Optical Character Recognition, commonly referred to as OCR, is the process of converting scanned images of letters and words into electronic text. For example, you can use the Recognize Text feature in Acrobat DC to convert an image of a page into a searchable version in which you can select text, comment on it and even edit it.
OCR is an imperfect process. While some very good originals will process at or near 100% accuracy, feeding Acrobat a poor-quality document will produce poor results. So, yes, a fax of a fax of a fax is not going to OCR well. Scanned documents may also contain handwriting, which is seldom recognized as text.
OCR quality directly affects search quality, and that should concern legal professionals. Consider a contract that may be part of your case. Perhaps the only place your client’s name appears in the document is in the handwritten Name and Signature fields.
If you use Acrobat (or another tool) to search for your client’s name, no results will be returned. Because your client’s name is an important search term in most cases, you may want to correct the OCR text of key documents to improve search results.
Fortunately, Acrobat DC includes tools to help you audit OCR quality and correct OCR errors.
Auditing OCR Quality
Acrobat offers a feature in Preflight called “Make OCR Text Visible” which can help you audit OCR quality. Here’s how to use it:
- OCR the document or open a previously OCR’d document.
Tip: To OCR a document, choose the Enhance Scans option in the Right Hand Pane, then choose Recognize Text.
- In the Right Hand Pane, enter Preflight in the search field
- Click the Preflight tool
- The Preflight window opens.
- In the search field, enter Make OCR
- Select the Make OCR text visible fixup function
- Click Analyze and Fix
- Acrobat will ask you to rename the file. I suggest adding “_QA” to the file name.
Looking at the Results
To QA the document, first open the Layers Panel in the file: