Print to Digital: Cleaning Up Your Word File

This week’s post is by Carla Douglas from Beyond Paper Editing.

Adapted. Originally posted at Beyond Paper Editing in August 2014.

If you have a print document that you’d like to self-publish, you can turn it into a digital file and convert it to an ebook.

The first step is to get it into MS Word using OCR software. Note that MS Word is your best friend right now. Editors use Word for a few reasons, and efficient cleanup and editing are high on that list.

Here’s what the file I’m working with looks like as a PDF (produced on a Macintosh Classic and a dot-matrix printer):

The manuscript has been marked up with pencil, and these marks are picked up by the OCR software, sometimes in unexpected ways. Here’s what the Word file looks like:

Two Kinds of Cleanup

There’s junk in the file: the stuff you can see, and the stuff you can’t. Sometimes, what’s hidden behind the scenes in Word is the cause of the junk you can see, such as garbled text and wonky formatting. Also, the pencil marks that haven’t been converted to text remain in the document as pictures, and will have to be deleted. Some random characters appear, too, and the text is all boldface. These are just a few of the things you can see.

To clean up this file, will a spritz of vinegar and water do, or will you need industrial-strength degreaser? The answer depends on what you plan to do with the file next. If you’re going to revise or edit the text, clean it up enough to continue working on it, and save the heavy-duty cleanup for later.
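For readers comfortable with a little scripting, the "spritz of vinegar" end of that range can be sketched in Python on text saved from Word as plain text. This is only an illustration under stated assumptions, not what any Word add-in actually does: the function name `light_cleanup` is hypothetical, and the rules it applies (collapse repeated spaces, trim line ends, squeeze extra blank lines) are chosen purely as examples of light cleanup.

```python
import re

def light_cleanup(text: str) -> str:
    """Apply a few conservative fixes to plain text exported from Word.

    Hypothetical helper for illustration; the rules are assumptions,
    not the feature list of any particular cleanup tool.
    """
    # Replace tabs and non-breaking spaces with ordinary spaces.
    text = text.replace("\t", " ").replace("\u00a0", " ")
    # Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Trim spaces at the start and end of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Squeeze three or more newlines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "The  quick\tbrown fox.  \n\n\n\nIt   jumped.\u00a0\n"
print(light_cleanup(sample))
```

A script like this only touches the characters themselves. Formatting problems such as unwanted boldface or leftover pictures live in the Word file and still need to be handled there.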

For Initial Cleanup

The story I’m working with here is just over 4,000 words, and it won’t be converted to an ebook any time soon. I’m going to do an initial cleanup using Jack Lyon’s FileCleaner. (Wiley Publishing has a free Word add-in with many similar features.)

FileCleaner is about US$30, but there’s a generous 45-day free trial available. It runs as a Word plug-in. Follow the directions on the site to download and install it. It will appear on the Add-ins pane in your Word ribbon. Here’s what it will do (you can select/unselect features):

Running FileCleaner cleaned up most of the junk in my story file—it’s now in a format I can continue to edit without too many distractions. Here’s what it looks like post-FileCleaner:

As you can see, FileCleaner didn’t catch the text that had been marked up with pencil. After trying a few ways to clean this up—including selecting the text and applying Normal style to it—I ended up having to repair it manually by deleting the picture and re-keying the sentence that’s squished together. Because my document is short, this wasn’t a problem, but in a longer document it could present a significant inconvenience. Here’s a last look at the cleaned-up text:

Other Cleanup Tips

At times, Word can be frustrating to work in—with extra page breaks and hidden formatting, it will do things you don’t want it to. For now, I’ve cleaned my file up well enough to do further editing.

If your Word document is really acting up, there are a few things to try. I’ve found that the best place to start is by using the show/hide feature on the Word ribbon. The post “How to Find the Hidden Formatting That Will Mess Up Your Ebook” shows you how.
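The idea behind show/hide, making normally invisible characters visible so you can spot the troublemakers, can also be tried on plain text outside Word. The sketch below is an illustration only: `reveal_hidden` is a hypothetical helper, and the marker symbols are chosen to loosely imitate the ones Word displays, not to reproduce them exactly.

```python
# Map invisible characters to visible stand-ins.
# The symbols are assumptions chosen for illustration.
MARKERS = {
    " ": "·",        # ordinary space
    "\u00a0": "°",   # non-breaking space
    "\t": "→",       # tab
    "\n": "¶\n",     # paragraph or line break (keep the break itself)
}

def reveal_hidden(text: str) -> str:
    """Return text with invisible characters replaced by visible markers."""
    return "".join(MARKERS.get(ch, ch) for ch in text)

print(reveal_hidden("Chapter\tOne\u00a0 \nText"))
```

Doubled spaces, stray tabs, and non-breaking spaces stand out immediately in the marked-up output, which is the same reason show/hide is a good first diagnostic in Word.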

Image by atomicjeep