From: isw on 27 Feb 2010 13:48

In article <ip2io5lp3h9b5upfet2426jnbmobh0qrbp(a)4ax.com>, spam(a)kenward.eu wrote:

> On Sat, 27 Feb 2010 02:47:04 -0500, Talker <Talker(a)thegood.com> wrote:
>
> > I have used Omnipage Pro and Abbyy's Finereader, and neither one
> > of them are worth a damn. Maybe it's because I rarely use either one
> > of them, and I'm doing something wrong, but every time I have tried to
> > scan a document, it never scans and recognizes it properly. I have to
> > correct so many errors in the text, that I might as well have typed it
> > myself.
>
> #-----End Quoted (and cut) Message-----
>
> You have to do a few things right to get decent OCR.
>
> I manage near on 99% accuracy with OmniPage Pro by scanning at 300dpi
> or higher, depending on the material.

When I used an older version of Omnipage, conversion to B&W was necessary, but it's been my experience with more recent OCR apps (ABBYY, which came with my Microtek scanner), that a good grayscale image will scan a lot more accurately than one I convert to B&W. I find 300 dpi to work best for most things.

I scan a lot of recipes, printed on everything from clay-coated magazine stock to newsprint, and I usually get almost 100% accuracy. The only thing it can't handle is single-glyph "1/2" or "1/4" characters (which show up a lot in ingredient lists).

If the source is in color, it often works out better to leave the scanner set for color (although note that this is with an integrated OCR function that only works with scanner input; I actually don't know what the scanner does after I poke the "OCR" button). Basically, I leave the scanner set as I use it for photographs, slap down the page with the recipe on it, do a preview to set the bounds, and hit "OCR".

> I also usually go for genuine B&W (not grayscale) to keep file sizes
> down.

First, I suspect that contemporary OCR routines make use of the gray edges on characters to aid in identifying them; conversion to B&W prevents that from working.

Second, why is file size important? For OCR, all files except for the output text file are transient, and the text file is the same size no matter what resolution you use for scanning. You can dump the scanned images after doing the OCR, or in the case where OCR is integrated into the scanner, they're deleted automatically. Plus, with contemporary disk sizes, it would take a lifetime of scans to take up enough room to matter.

Isaac

From: nailer on 27 Feb 2010 22:17

On Sat, 27 Feb 2010 12:12:35 +0000, spam(a)kenward.eu wrote:

>On Sat, 27 Feb 2010 02:47:04 -0500, Talker <Talker(a)thegood.com> wrote:
>
>> I have used Omnipage Pro and Abbyy's Finereader, and neither one
>>of them are worth a damn. Maybe it's because I rarely use either one
>>of them, and I'm doing something wrong, but every time I have tried to
>>scan a document, it never scans and recognizes it properly. I have to
>>correct so many errors in the text, that I might as well have typed it
>>myself.
>
>#-----End Quoted (and cut) Message-----
>
>You have to do a few things right to get decent OCR.
>
>I manage near on 99% accuracy with OmniPage Pro by scanning at 300dpi
>or higher, depending on the material.
>
>I also usually go for genuine B&W (not grayscale) to keep file sizes
>down.
>
>Of course, if you are working on exotic things like equations, you may
>need to train the software
>
>MK

My preference is to scan in gray at 300 dpi or more if possible, then run FineReader. The end result is up to you: it can be txt, doc or pdf. The size of the final product does not depend on the scanning setup. And yes, a gray scan does better than a B&W scan.

From: Jethro Pull on 28 Feb 2010 08:37

As to ABBYY vs. OmniPage: I have used both and prefer ABBYY. Why? ABBYY is simpler to use and is accurate enough for me. When I import the result into MS Word, misspelled words are underlined and I can fix them.

From: spam on 28 Feb 2010 09:15

On Sat, 27 Feb 2010 10:48:48 -0800, isw <isw(a)witzend.com> wrote:

>Second, why is file size important? For OCR, all files except for the
>output text file are transient, and the text file is the same size no
>matter what resolution you use for scanning. You can dump the scanned
>images after doing the OCR, or in the case where OCR is integrated into
>the scanner, they're deleted automatically. Plus, with contemporary disk
>sizes, it would take a lifetime of scans to take up enough room to
>matter.
>
>Isaac

-----End Quoted (and cut) Message-----

File size is important, albeit less than it was, because I retain files as "searchable" PDF files. This is a PDF image with a text overlay.

While HD space is, indeed, inexpensive, that is not the only place where people store files. What about on a CD/DVD? Or on a laptop? Sadly, disks cost more there. You are also limited in how many you can carry at one time. Keep file sizes down and you can put a lot on a USB stick.

I use searchable PDF, created in PaperPort with the help of OmniPage, because I need to see the original format. I store newspaper cuttings, old paper press releases and reports. (I have more than 10,000 cuttings alone.) A text file alone is not good enough.

B&W is also there because that is the source material. It adds nothing to scan a newspaper cutting in greyscale. B&W works fine for me. My scanner has no problems, and it has to eat "pink pages" from the Financial Times.

Feel free to do whatever suits you.

MK

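For a rough sense of scale on the file-size point, here is a small sketch of the raw (uncompressed) image size of one page in each scan mode. The page size (US Letter, 8.5 x 11 in), the 300 dpi setting, and the function name `raw_scan_bytes` are assumptions chosen for illustration; real PDF or TIFF files are compressed, so actual sizes on disk will be smaller and will vary with the content.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes (ignores row padding and file headers)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Bits per pixel for the three scan modes discussed in the thread.
modes = {"bw": 1, "gray": 8, "color": 24}

# Assumed page: US Letter scanned at 300 dpi.
sizes = {name: raw_scan_bytes(8.5, 11, 300, bpp) for name, bpp in modes.items()}

for name, size in sizes.items():
    print(f"{name:5s} {size / 1_000_000:6.2f} MB raw")
```

Before compression, a gray scan is 8 times the size of a 1-bit B&W scan and a color scan is 24 times the size, which is the trade-off being weighed against the better OCR results reported for gray.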
From: Edward Kroeze on 28 Feb 2010 11:36

"CSM1" <nomail(a)nomoremail.com> wrote in message news:Xns9D2C78F9917E5nomoremail(a)74.209.136.91...

[snip]

> No OCR is 100% accurate. 90% is doable.

Just curious what exactly you mean by 90%?

1) 90% of the pages are flawless? (and 10% contain one or more mistakes?)
2) 90% of each line is correct? (and 10% (approx. 5 to 8 characters) PER LINE are wrong?)
3) something different?

The first option might be workable, but the second for sure is not.

Cheers,
Edward