This Free Tool Can Help You Search and Copy (Nearly) Any PDF

13.02.2025 00:00

Lifehacker.com

There's nothing worse than opening a PDF and realizing you can't use the search function or even highlight text. This typically happens when a PDF was created by scanning a paper document, so the file is just a series of images. Most modern scanning software uses Optical Character Recognition (OCR) so that words are both searchable and selectable, but sometimes you'll run into documents where this didn't happen.

In those cases, the free and open source OCRmyPDF is perfect to have around. This is a command line application that converts a PDF file into a PDF/A file complete with an optical character recognition text layer, meaning you'll be able to search and select the text.

Installing the application is best done using your package manager on Linux devices and using Homebrew on Mac. Windows users can technically install the application by installing Python and a few other dependencies—look into that if you're willing to do some digging.

Once the application is set up, you can use it by typing ocrmypdf followed by the name of the document you want to add OCR to, and then the name of the document you'd like to create. So, for example, ocrmypdf before.pdf after.pdf would take "before.pdf", add character recognition, then create a new document called "after.pdf".
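If you'd rather call the tool from a script than type the command each time, here is a minimal Python sketch of the same invocation. The function names ocr_command and add_ocr are made up for illustration and are not part of OCRmyPDF; the sketch assumes only that the ocrmypdf command is installed and on your PATH.

```python
import shlex
import shutil
import subprocess


def ocr_command(source: str, destination: str) -> list[str]:
    """Build the argument list for: ocrmypdf <source> <destination>.

    Hypothetical helper for illustration; it only assembles the command.
    """
    return ["ocrmypdf", source, destination]


def add_ocr(source: str, destination: str) -> None:
    """Run the command; raises if ocrmypdf is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not installed or not on PATH")
    subprocess.run(ocr_command(source, destination), check=True)


if __name__ == "__main__":
    # Show the command that add_ocr("before.pdf", "after.pdf") would run.
    print(shlex.join(ocr_command("before.pdf", "after.pdf")))
```

Passing the arguments as a list rather than one string means file names with spaces don't need any extra quoting.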

The process will take a while, depending on the size of the document, and it might not be entirely accurate if the image quality is low. Even so, I found this did a pretty good job with the most ancient and poorly compressed PDFs I could dig up.

Credit: Justin Pot

There's more you can do here: The Cookbook in the OCRmyPDF documentation outlines a bunch of options. You can compress the images in the PDF, for example, by adding --pdfa-image-compression jpeg to your command. You can automatically re-orient any pages with sideways text by adding --rotate-pages to the command. Or maybe the PDF you're processing already has OCR that you think is poor quality. In that case, you can add --redo-ocr to the command; this will strip out existing OCR information and start over.
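To see how these options slot into the command, here is a rough Python sketch that assembles the argument list from a few on/off switches. The function name build_command and its keyword arguments are hypothetical names chosen for this example; only the three flags themselves come from the text above.

```python
import shlex


def build_command(source, destination, *, compress_jpeg=False,
                  rotate_pages=False, redo_ocr=False):
    """Assemble an ocrmypdf argument list (hypothetical helper)."""
    command = ["ocrmypdf"]
    if compress_jpeg:
        # Compress the images in the output PDF as JPEG.
        command += ["--pdfa-image-compression", "jpeg"]
    if rotate_pages:
        # Re-orient pages whose text is sideways.
        command.append("--rotate-pages")
    if redo_ocr:
        # Strip existing OCR information and start over.
        command.append("--redo-ocr")
    # Input and output file names go last.
    command += [source, destination]
    return command


if __name__ == "__main__":
    # Print the full command rather than running it.
    print(shlex.join(build_command("before.pdf", "after.pdf",
                                   rotate_pages=True, redo_ocr=True)))
```

The sketch only prints the command; to actually run it, you could hand the list to subprocess.run with check=True, assuming ocrmypdf is installed.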

You get the idea: There's a lot here. Check out the documentation for more information.
