Fixing Text Reflow Issues When You Copy and Paste Text from PDFs
You’ve just copied some text from a PDF using the selection tool and pasted it into your word processor. Ack! Why doesn’t the text reflow?
Although PDF is primarily intended as an archive format, Acrobat users often want to take passages of text in a PDF and reuse them in a word processor or in email. For example, you may wish to cite part of an important court decision in a brief.
One frustration is that text copied from a PDF may have hard line endings. Depending on how the PDF was created, each line may have a paragraph return at the end.
Text formatted with hard line endings doesn’t reflow properly and can take a lot of time to clean up:
Hard Line Endings
THIS AGREEMENT WITNESSETH that in consideration of the premises and mutual
covenants and agreements hereinafter contained, and for other good and valuable
consideration (the receipt and sufficiency of which is hereby acknowledged by the parties
Text that Reflows
THIS AGREEMENT WITNESSETH that in consideration of the premises and mutual covenants and agreements hereinafter contained, and for other good and valuable consideration (the receipt and sufficiency of which is hereby acknowledged by the parties hereto), it is agreed by and between the parties hereto as follows:
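To show what cleaning this up by hand amounts to, here is a minimal Python sketch (not a feature of Acrobat) that joins hard-wrapped lines back into paragraphs. The function name `unwrap_hard_line_endings` is hypothetical, and the sketch assumes that real paragraph breaks appear as blank lines. Text copied from a PDF often has no blank lines at all, in which case everything collapses into a single paragraph and you still have to split it by hand.

```python
import re


def unwrap_hard_line_endings(text: str) -> str:
    """Join the lines inside each paragraph with single spaces.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on blank lines (lines that are empty or whitespace only).
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Trim each line, drop empty ones, and rejoin with a single space.
        lines = [line.strip() for line in paragraph.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            cleaned.append(joined)

    # Keep one blank line between the rebuilt paragraphs.
    return "\n\n".join(cleaned)
```

Pasting the copied text through a function like this restores reflow only when the blank-line assumption holds, which is why a fix inside the PDF itself is more reliable.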
One workaround is to use Acrobat Standard or Professional, which can save PDFs to editable formats such as text files, rich text files, and Word documents. In doing so, the hard line endings are generally eliminated.
Acrobat 8 simplifies the process somewhat by offering a new Export button in the Acrobat toolbar.
However, this workflow means saving out the file, finding the correct passage in the Word file, then copying that to your working document. That’s a lot of work if you just want to copy a paragraph or two!
Fortunately, there is an easy way to eliminate hard line endings when copying text from a PDF.
Read on to learn how…
Accessible Files Reflow
Tagged or accessible PDFs have structure that allows screen reading software used by the visually impaired to properly traverse complex documents.
One benefit for every Acrobat user is that tagged PDFs also contain information about where paragraphs start and stop.
Unfortunately, a good portion of the PDFs that you receive won’t be tagged properly, but it is easy and painless to add tags.
For more background information on tags, read my Understanding Tags article.
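If you want a quick guess at whether a PDF is already tagged before opening it in Acrobat, a rough check can be scripted. The Python sketch below is only a heuristic under stated assumptions: it looks for the `/StructTreeRoot` key that a tagged PDF's document catalog uses to point at its structure tree. The function names are hypothetical. A PDF that stores its catalog in a compressed object stream may be tagged and still fail this check, and a match says nothing about how good the tags are.

```python
from pathlib import Path


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes mention a structure tree root.

    A missing key is not proof that the file is untagged, because the
    catalog may sit inside a compressed object stream.
    """
    return b"/StructTreeRoot" in pdf_bytes


def looks_tagged_file(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic."""
    return looks_tagged(Path(path).read_bytes())
```

For a definitive answer, rely on Acrobat itself; treat this only as a first pass over a folder of files.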
Adding Tags
To add tags to a PDF, choose Advanced > Accessibility > Add Tags to Document…
Acrobat will add tags to the document and open a Recognition Report, which offers useful information about tagging.
Generally speaking, Acrobat does a pretty good job of adding tags to indicate paragraphs, so I just close the Recognition Report window.
Save your document and you’re ready to copy and paste clean text that reflows properly!
Further Considerations
If you are scanning to PDF using Acrobat, Acrobat can automatically add tags so that copied text reflows. Make sure the Make Accessible option is selected in the Scan window. It’s on by default.
Image-only PDFs, such as those created from a scanner, may contain only a picture of the page and won’t have selectable text. Acrobat Standard and Professional can use optical character recognition (OCR) to make text selectable on these pages. Choose Document > Recognize Text using OCR…
You will still need to add tags to these OCR’d files as outlined in this article.
If you have a lot of PDFs that need tags added, use the Batch Processing feature in Acrobat Professional. You can learn more in my article on Batch OCR.
Creating tagged, accessible documents should be a best practice for everyone.
When you create a PDF using the one-button PDF creators (sometimes called PDF Makers) installed by Acrobat into Word, Excel, and PowerPoint, tagging is on by default. Note that printing to the Adobe PDF Print Driver does not create a tagged document. For this reason, I always recommend using the button or menu item if it is available in your application.
Unfortunately, I haven’t found a PDF creation solution from another vendor that creates properly tagged PDFs from common office formats.
I mention this not to disparage other products but to encourage you to ask for this feature from your vendors. That will make PDFs a lot more useful to all of us.