Copy text from images

All the elements were there. Algorithms freely available; implementations ready to download, for free. It just took one clever person to FINALLY make it: a seamless way to copy text from images. At least in Chrome, for now, but since it is Open Source, I guess it is just a matter of time and we will see it as a build-in feature in Firefox and other browsers. For now, use the Project Naphta Chrome browser extension. The main project page is also an interesting read.

Words on the web exist in two forms: there’s the text of articles, emails, tweets, chats and blogs— which can be copied, searched, translated, edited and selected— and then there’s the text which is shackled to images, found in comics, document scans, photographs, posters, charts, diagrams, screenshots and memes. Interaction with this second type of text has always been a second class experience, the only way to search or copy a sentence from an image would be to do as the ancient monks did, manually transcribing regions of interest.

(from the Naphta web site)

It works really nice, assuming that you select the “English tesseract” option from the “Languages” sub-menu — the off-line javascript implementation of the OCRad is not very effective, in contrast to the cloud-based tesseract service.

Advertisements