GitHub - dpinney/ocrmypdfmac: Free Mac OCR for PDFs

Making PDFs Searchable

Tesseract is a great open-source library for optical character recognition (OCR). But it's a little tricky to use it to make a PDF of scanned images searchable, which is probably the biggest use case for OCR. Here's how to do that on a Mac.

(I tested all this and it worked as of 9 June 2015 on Mac OS X Yosemite 10.10.3. Your mileage may vary.)

Steps for the Command Line

Install Homebrew or update your copy to the latest version (brew update).
Get Ghostscript: brew install gs
Make sure you get the latest version of Tesseract: brew install --devel tesseract
Now, let's say you scanned a magazine to input.pdf. Convert it to a 300 dpi TIFF first: gs -sDEVICE=tiff32nc -r300 -o mag.tif input.pdf
Then OCR it: tesseract mag.tif output pdf

Then open the resulting output.pdf in Preview.app and start searching for some words. They should highlight in the same location they were in the images. Tada!
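The two conversion steps above can be wrapped in a small shell function. This is only a sketch, not the ocrmypdf.sh shipped in this repo: the names `ocr_pdf`, `run`, and `DRY_RUN` are made up for illustration, and the intermediate TIFF is named after the output instead of mag.tif. The gs and tesseract invocations are the same ones used in the steps.

```sh
#!/bin/sh
# Sketch only: hypothetical helper names, not the repo's ocrmypdf.sh.

# run: execute a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ocr_pdf INPUT.pdf [OUTPUT_BASE]
# Renders INPUT.pdf to a 300 dpi TIFF, then OCRs it into OUTPUT_BASE.pdf.
ocr_pdf() {
    input="$1"
    base="${2:-output}"
    if [ -z "$input" ]; then
        echo "usage: ocr_pdf INPUT.pdf [OUTPUT_BASE]" >&2
        return 2
    fi
    tif="${base}.tif"   # assumption: intermediate TIFF sits next to the output
    # Step 1: Ghostscript renders every page into one multi-page TIFF.
    run gs -sDEVICE=tiff32nc -r300 -o "$tif" "$input" &&
    # Step 2: Tesseract writes a searchable PDF named OUTPUT_BASE.pdf.
    run tesseract "$tif" "$base" pdf
}
```

After sourcing this, `ocr_pdf input.pdf output` should produce output.pdf (plus the intermediate output.tif) when both tools are installed. Setting `DRY_RUN=1` first only prints the two commands, which is a cheap way to check what would run.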

Shell Script Application

Or just use the .app in this repo by downloading it here. You still need to follow the Homebrew steps above to get your copy of gs and tesseract. The source .sh is also attached. I used Platypus to turn the shell script into an app.
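Since the app depends on gs and tesseract already being installed, a quick check before launching it can save some confusion. The snippet below is a minimal sketch; `missing_tools` is a hypothetical helper name and is not part of the repo's script.

```sh
#!/bin/sh
# Sketch only: missing_tools is a hypothetical helper, not part of ocrmypdf.sh.

# missing_tools NAME...
# Prints each command name that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools gs tesseract` prints nothing when both commands are on your PATH; any name it does print still needs to be installed with Homebrew.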

Thanks

I was inspired by ryanfb's instructions and this discussion on Stack Overflow about how to get Ghostscript to give pretty output. And I used Platypus.app, which is awesome.
