Document OCR Software

rockymtnhigh

Hardly Normal
Original poster
Supporting Founder
Apr 14, 2006
Normal, IL
Ok guys, looking for some advice here. I have a ton of PDFs with narratives in them that I need to convert to regular text. Many of these are photographic representations of documents that were typed, and they need OCR software to convert.

I use a program called Able2Extract to extract the text from the PDFs, but it does a crap job. I have been looking at OmniPage Professional 17, which looks like a heck of a package, but it is more than $300 even with an educational license, and it only seems to offer upgrades from the last two versions of the program.

I am hesitant to drop that amount of cash without knowing how likely it is that the software is going to actually do what I need it to do. So, does anyone have experience with it, or advice?
 
Do a search for "free pdfzilla".

It's a freebie PDF-to-text converter. I converted the text in a pretty crappy PDF of an old newspaper page, and I was amazed at how well it turned out. They have a shareware version that will also convert to Word, etc.
 

Tried it, and it doesn't work at all. All I get is a blank Word document after converting the PDF. Oh well....
 
Have you tried OpenOffice.org, the free and open productivity suite? I have used it to convert Word files, and scans made through my PaperPort software, into PDF files. Best thing: it is free :D

I already have PDFs, I need to OCR them to readable text. The PDFs are basically graphics - the police department scanned the original documents into electronic records, but they used a scanner rather than a Print to PDF feature.

Can OpenOffice convert PDF to text? I was not aware it could.
 
I doubt very much that it could. I have heard of some programs that convert PDF files to text, but they seem to disappear fairly quickly. After all, the nature of PDF is that they (generally) don't want you to be able to easily convert to something editable. Of course, some allow for this.

We don't currently use OCR software. Some years ago, we used OmniPage. IIRC it was touted as 98% accurate. Amazing how far short 98% can fall.
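As a rough illustration of why 98% falls short: assuming the figure means per-character accuracy and a typed page holds about 2,000 characters (both are assumptions for illustration, not OmniPage specs), 98% still leaves around 40 misread characters per page. If each character is treated as an independent trial (another simplifying assumption), the odds of a page coming through completely clean are essentially zero. A quick sketch of the arithmetic:

```python
# Rough estimate of OCR errors per page at a given character accuracy.
# Assumptions (hypothetical, for illustration only): accuracy is measured
# per character, and a typed page holds about 2,000 characters.

CHARS_PER_PAGE = 2000  # hypothetical figure for a typed page


def expected_errors(accuracy: float, chars_per_page: int) -> float:
    """Expected number of misread characters on one page."""
    return (1.0 - accuracy) * chars_per_page


def chance_page_is_clean(accuracy: float, chars_per_page: int) -> float:
    """Probability that every character on the page is read correctly,
    treating each character as an independent trial."""
    return accuracy ** chars_per_page


if __name__ == "__main__":
    for acc in (0.98, 0.99, 0.999):
        errors = expected_errors(acc, CHARS_PER_PAGE)
        print(f"{acc:.1%} accurate: about {errors:.0f} errors per page")
```

Even at 99.9% the same assumptions give a couple of errors per page, which is why proofreading the output is still needed.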
 
I think the point that everyone is missing is that the PDFs are scanned (bitmaps) and not the result of printing to a PDF. This is the most insidious and evil kind of PDF as they cannot be readily converted or searched.

The solution is to cough up the money for an OCR package or find a cheap scanner that comes with one.
 

Yes. I have no problem extracting text from a PDF that was electronically derived, but these files are exactly what you suggest: scanned bitmaps. Thus OCR is what I need, the ability to detect and read the text that is in the images.

I suspect I will need OmniPage. I am just hoping to find someone who has a recent version and can test a conversion of one of my files, because what I do not want to do is spend $300 and have it NOT work. :)
 
BTW, I use OmniPage's little brother, PaperPort Professional 11. It's $199 and it has an OCR product in it; I just converted a PDF to Word OK. It looks like the regular PaperPort 11 ($99) also does OCR, but I can't be certain.

From the manual: "Improved OCR Accuracy: PaperPort 11 provides more accurate conversion to text on all scanned and PDF documents including lower resolution images."
 
Success! Not perfect, but far better than anything else I have tried, with the ability to easily customize what you actually convert. I will end up dropping the dough to buy OmniPage Pro 17.
 
I just read this thread...OmniPage was the best commercial product around when I was involved with a document management project during the late '90s.
 
Yeah, I am not surprised.

I just downloaded their PDF Converter program; it's about $200 less, so I am going to see how it compares with the Pro version.

Interesting, you can only get the free trials from their UK website, not from the US site.

I also found you can get the full software for a fraction of retail from Amazon. Was surprised a bit by that; I think they are OEM packages, not sure.
 
Their PDF Convert 5 program works extremely well and obviously has the same OCR engine as OmniPage. What it lacks is the ability to select parts of the page NOT to scan, as well as the ability to do OCR editing. Still, I ran the same complex document, complete with tables and such, and it did an identical job.

I was quite pleased.