Which OCR package for text scanning is the best? [Scanners]

From: isw on 27 Feb 2010 13:48

In article <ip2io5lp3h9b5upfet2426jnbmobh0qrbp(a)4ax.com>,
spam(a)kenward.eu wrote:

> On Sat, 27 Feb 2010 02:47:04 -0500, Talker <Talker(a)thegood.com> wrote:
>
> >
> >
> > I have used Omnipage Pro and Abbyy's Finereader, and neither one
> >of them are worth a damn. Maybe it's because I rarely use either one
> >of them, and I'm doing something wrong, but every time I have tried to
> >scan a document, it never scans and recognizes it properly. I have to
> >correct so many errors in the text, that I might as well have typed it
> >myself.
>
> #-----End Quoted (and cut) Message-----
>
> You have to do a few things right to get decent OCR.
>
> I manage near on 99% accuracy with OmniPage Pro by scanning at 300dpi
> or higher, depending on the material.

When I used an older version of OmniPage, conversion to B&W was
necessary, but in my experience with more recent OCR apps (ABBYY,
which came with my Microtek scanner), a good grayscale image is
recognized a lot more accurately than one I convert to B&W. I find
300 dpi works best for most things. I scan a lot of recipes, printed on
everything from clay-coated magazine stock to newsprint, and I usually
get almost 100% accuracy. The only thing it can't handle is single-glyph
fractions such as "½" or "¼" (which show up a lot in ingredient lists).

If the source is in color, it often works out better to leave the
scanner set for color (although note that this is with an integrated OCR
function that only works with scanner input; I actually don't know what
the scanner does after I poke the "OCR" button).

Basically, I leave the scanner set as I use it for photographs, slap
down the page with the recipe on it, do a preview to set the bounds, and
hit "OCR".
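To put the 300 dpi figure in perspective, here is the plain unit conversion from font size to scanned pixels. It is only a sketch: it assumes the usual 72 points per inch and a hypothetical 10 pt body font, and it says nothing about what resolution any particular OCR engine actually needs.

```python
# Unit conversion only: how many scanned pixels tall is one em of text?
# Assumes 1 point = 1/72 inch; the 10 pt font size is a hypothetical example.
POINTS_PER_INCH = 72


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Height of one em (the nominal font size) in scanned pixels."""
    return font_size_pt / POINTS_PER_INCH * dpi


for dpi in (150, 200, 300, 400, 600):
    print(f"{dpi:>3} dpi: 10 pt text is about {em_height_px(10, dpi):.1f} px per em")
```

Halving the font size halves the pixels per character, so small print only gets the same pixel count at roughly double the resolution.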

> I also usually go for genuine B&W (not grayscale) to keep file sizes
> down.

First, I suspect that contemporary OCR routines make use of the gray
edges on characters to aid in identifying them; conversion to B&W
prevents that from working.

Second, why is file size important? For OCR, all files except for the
output text file are transient, and the text file is the same size no
matter what resolution you use for scanning. You can dump the scanned
images after doing the OCR, or in the case where OCR is integrated into
the scanner, they're deleted automatically. Plus, with contemporary disk
sizes, it would take a lifetime of scans to take up enough room to
matter.

Isaac

From: nailer on 27 Feb 2010 22:17

On Sat, 27 Feb 2010 12:12:35 +0000, spam(a)kenward.eu wrote:

>On Sat, 27 Feb 2010 02:47:04 -0500, Talker <Talker(a)thegood.com> wrote:
>
>>
>>
>> I have used Omnipage Pro and Abbyy's Finereader, and neither one
>>of them are worth a damn. Maybe it's because I rarely use either one
>>of them, and I'm doing something wrong, but every time I have tried to
>>scan a document, it never scans and recognizes it properly. I have to
>>correct so many errors in the text, that I might as well have typed it
>>myself.
>
>#-----End Quoted (and cut) Message-----
>
>You have to do a few things right to get decent OCR.
>
>I manage near on 99% accuracy with OmniPage Pro by scanning at 300dpi
>or higher, depending on the material.
>
>I also usually go for genuine B&W (not grayscale) to keep file sizes
>down.
>
>Of course, if you are working on exotic things like equations, you may
>need to train the software
>
>MK
>

My preference is to scan in grayscale at 300 dpi at least, more if
possible, and then run FineReader. The end result is up to you: it can
be TXT, DOC or PDF. The size of the final product does not depend on
the scanning setup. And, yes, a grayscale scan works better than a B&W one.
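The suggestion earlier in the thread, that OCR engines may use the gray edges of characters, can't be confirmed from here, but a toy example shows what a simple fixed-threshold B&W conversion throws away. The pixel values and the threshold of 128 are made up for illustration; real scanner software may choose its threshold adaptively.

```python
# One row of 8-bit grayscale pixels crossing a dark stroke (0 = black, 255 = white).
# Hypothetical values, not taken from a real scan.
row = [250, 240, 190, 120, 30, 20, 25, 110, 200, 245]

THRESHOLD = 128  # hypothetical global threshold


def binarize(pixels, threshold=THRESHOLD):
    """Map each pixel to 0 (black) or 255 (white) using a single global threshold."""
    return [0 if p < threshold else 255 for p in pixels]


bw = binarize(row)
print("grayscale:", row)
print("B&W      :", bw)

# The in-between edge values (190, 120, 110, 200) that locate the stroke boundary
# collapse to just two levels.
print(len(set(row)), "distinct levels ->", len(set(bw)))  # 10 distinct levels -> 2

# A slightly different threshold changes the apparent stroke width.
print("black pixels at 128:", binarize(row, 128).count(0))  # 5
print("black pixels at 100:", binarize(row, 100).count(0))  # 3
```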

From: Jethro Pull on 28 Feb 2010 08:37

As to ABBYY vs. OmniPage: I have used both and prefer ABBYY. Why? ABBYY
is simpler to use and accurate enough for me. When I import the text into
MS-Word, misspelled words are underlined and I can fix them.

From: spam on 28 Feb 2010 09:15

On Sat, 27 Feb 2010 10:48:48 -0800, isw <isw(a)witzend.com> wrote:

>
>Second, why is file size important? For OCR, all files except for the
>output text file are transient, and the text file is the same size no
>matter what resolution you use for scanning. You can dump the scanned
>images after doing the OCR, or in the case where OCR is integrated into
>the scanner, they're deleted automatically. Plus, with contemporary disk
>sizes, it would take a lifetime of scans to take up enough room to
>matter.
>
>Isaac

-----End Quoted (and cut) Message-----

File size is important, albeit less than it was, because I retain
files as "searchable" PDF files: a PDF image with a text overlay.

While hard-disk space is, indeed, inexpensive, a hard disk is not the
only place where people store files. What about a CD/DVD? Or a laptop?
Sadly, storage costs more there, and you are limited in how many discs
you can carry at one time. Keep file sizes down and you can put a lot
on a USB stick.
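For a sense of scale, here is the uncompressed size of a single page image at 300 dpi. The US Letter page size is an assumption (A4 gives similar figures), and these are upper bounds: searchable PDFs normally store the page image compressed, with the codec depending on the software. The 1 : 8 : 24 ratio between B&W, grayscale and color holds at any page size, since it comes straight from the bits per pixel.

```python
# Uncompressed bitmap size of one page at a given resolution.
# Assumes US Letter (8.5 x 11 in); real PDFs compress the image, so these are upper bounds.
WIDTH_IN, HEIGHT_IN = 8.5, 11
DPI = 300


def raw_bytes(bits_per_pixel: int, dpi: int = DPI) -> int:
    """Bytes needed for an uncompressed page image at the given bit depth."""
    width_px = round(WIDTH_IN * dpi)
    height_px = round(HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


for label, bpp in (("B&W (1-bit)", 1), ("grayscale (8-bit)", 8), ("color (24-bit)", 24)):
    print(f"{label:<18} {raw_bytes(bpp) / 1_000_000:5.1f} MB")
```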

I use searchable PDF, created in PaperPort with the help of OmniPage,
because I need to see the original format. I store newspaper cuttings,
old paper press releases and reports. (I have more than 10,000
cuttings alone.) A text file alone is not good enough.

I also use B&W because that is what the source material is. It adds
nothing to scan a newspaper cutting in greyscale.

B&W works fine for me. My scanner has no problems, and it has to eat
"pink pages" from the Financial Times.

Feel free to do whatever suits you.

MK

From: Edward Kroeze on 28 Feb 2010 11:36

"CSM1" <nomail(a)nomoremail.com> wrote in message
news:Xns9D2C78F9917E5nomoremail(a)74.209.136.91...
[snip]
>
> No OCR is 100% accurate. 90% is doable.
>

Just curious: what exactly do you mean by 90%?

1) 90% of the pages are flawless (and 10% contain one or more mistakes)?
2) 90% of each line is correct (and 10%, approx. 5 to 8 characters PER
LINE, is wrong)?
3) something different?

The first option might be workable, but the second for sure is not.
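The two readings differ enormously. A common way to score OCR at the character level is the character error rate: the edit distance between the correct text and the OCR output, divided by the length of the correct text. The sketch below computes that, plus the chance of a completely flawless page under the simplifying (and unrealistic) assumption that errors strike characters independently. The sample strings and the 2,000-characters-per-page figure are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """1 minus the character error rate, measured against the reference length."""
    return 1 - levenshtein(reference, ocr_output) / len(reference)


def p_flawless_page(char_acc: float, chars_per_page: int) -> float:
    """Chance of zero errors on a page, assuming independent per-character errors."""
    return char_acc ** chars_per_page


ref = "Add 1/2 cup of sugar"
ocr = "Add l/2 cup 0f sugar"  # hypothetical misreads: '1' -> 'l', 'o' -> '0'
print(f"{char_accuracy(ref, ocr):.0%}")  # 2 edits over 20 characters -> 90%

print(f"{(1 - 0.90) * 60:.1f} expected errors per 60-character line at 90%")
print(f"{p_flawless_page(0.99, 2000):.1e} chance of a flawless 2,000-character page at 99%")
```

Under that assumption, 90% per character means about 6 wrong characters in a 60-character line, and even 99% per character makes a flawless page extremely unlikely, so the first reading is a far stronger claim than the second.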

Cheers,

Edward
