From: John G. on

"RF" <RF(a)NoDen.con> wrote in message
news:7hfkviF2u28tpU1(a)mid.individual.net...
> Craig wrote:
>> On 09/16/2009 10:22 AM, RF wrote:
>>> Hi Experts :-)
>>>
>>> I have a PDF version of a language dictionary that has two different
>>> fonts.
>>>
>>> It would be much faster to access the information if I had it in the
>>> electronic form - just type in a word and get the info related to that
>>> word.
>>>
>>> Apart from a smart 2-font OCR, what else would I need?
>>>
>>> Suggestions greatly appreciated.
>>
>> Gosh. That could be a rat's-nest of a project... Listen, before
>> attempting OCR, how about cut 'n paste? I mean, unless the doc itself is
>> an image, the text is still accessible.
>>
>> Interesting... Please post /the/ solution when you find it.
>>
>> thx,
>
> No progress yet. The book has a few hundred pages so cutting and pasting
> is likely to be very tedious. The doc is a pdf, so it is an image but I
> have heard of PDF readers that had an OCR facility.

1. Why do you think PDF files are images?

2. Omnipage 16 will accept PDF as input for OCR conversion.
I know it is not freeware, but it should work, and maybe there is a free
equivalent.
Anyway, OCR programs accept image files.

John G.
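
On the question of whether a PDF is "just an image": a PDF can hold real
text drawn with fonts, scanned page images, or both (a scan with a hidden
OCR text layer). As a rough sketch, not a definitive test, the Python below
scans the raw bytes of a file for font and image object markers. The
function names are made up for illustration, and the check is only a
heuristic: newer PDFs may keep these objects inside compressed streams,
where a raw byte scan will not see them.

```python
import re

# Markers for font objects (real text) and image objects (scanned pages).
FONT_MARKER = re.compile(rb"/Type\s*/Font\b")
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def pdf_content_hints(data: bytes) -> dict:
    """Count font and image markers visible in the uncompressed PDF bytes."""
    return {
        "fonts": len(FONT_MARKER.findall(data)),
        "images": len(IMAGE_MARKER.findall(data)),
    }


def guess_kind(data: bytes) -> str:
    """Heuristic label: 'text', 'image-only', 'mixed', or 'unknown'."""
    hints = pdf_content_hints(data)
    if hints["fonts"] and hints["images"]:
        return "mixed"
    if hints["fonts"]:
        return "text"
    if hints["images"]:
        return "image-only"
    return "unknown"


# Tiny hand-written fragment standing in for a real file's bytes.
sample = (
    b"1 0 obj << /Type /Font /Subtype /Type1 >> endobj\n"
    b"2 0 obj << /Type /XObject /Subtype /Image >> endobj\n"
)
print(guess_kind(sample))  # prints "mixed"
```

For a real file, read it in binary mode and pass the bytes to guess_kind().
A "mixed" result would be consistent with a scan that carries an OCR text
layer; "image-only" suggests an OCR program is needed before any text can
be copied.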


From: RF on
John G. wrote:
> 1. Why do you think PDF files are images?
>
> 2. Omnipage 16 will accept PDF as input for OCR conversion.
> I know it is not freeware, but it should work, and maybe there is a free
> equivalent.
> Anyway, OCR programs accept image files.
>
> John G.

Some PDF progs are "writable" and others not. It seems to me that the
latter use images. The prog I have seems to be unwritable, so I can't
copy the individual words.


From: John G. on

"RF" <RF(a)NoDen.con> wrote in message
news:7hkl9uF2tq9sdU1(a)mid.individual.net...
> Some PDF progs are "writable" and others not. It seems to me that the
> latter use images. The prog I have seems to be unwritable, so I can't
> copy the individual words.

But anyway, Omnipage normally converts from images it gets from a scanner.
Do you have a link to the file so I could look at it?

John G.


From: RF on
John G. wrote:
> But anyway, Omnipage normally converts from images it gets from a scanner.
> Do you have a link to the file so I could look at it?
>
> John G.

I don't have time to investigate this now but, if your curiosity is
sharpened, I suggest you look for Foclóir Gaedhilge agus Béarla
(Irish-English Dictionary by Patrick S. Dinneen, published 1904). It was
scanned by Google and is in PDF format. It is available somewhere on the
'net and downloads are free. Copy and paste works for the English
words, but the Gaelic words are left behind, most likely because they are
set in a different font from the English. There may be some kind of dual
OCR around. Hope that gets you out hunting. Should keep you out of trouble
for a bit ;-) Good luck!
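
Once the Irish headwords have been captured as text (by OCR or otherwise),
the "type in a word and get the info" part could look something like the
Python sketch below. It assumes the Irish entries are set in Gaelic type,
where lenition is marked by a dot over the consonant, and that the OCR
output keeps those dotted letters as Unicode characters. The function names
and the three sample entries are illustrative only; they are not quoted
from the dictionary.

```python
import unicodedata

# Dotted (lenited) consonants of Gaelic type, mapped to the modern spelling
# with a following "h". Written as escapes to avoid encoding mix-ups.
LENITED = {
    "\u1e03": "bh",  # b with dot above
    "\u010b": "ch",  # c with dot above
    "\u1e0b": "dh",  # d with dot above
    "\u1e1f": "fh",  # f with dot above
    "\u0121": "gh",  # g with dot above
    "\u1e41": "mh",  # m with dot above
    "\u1e57": "ph",  # p with dot above
    "\u1e61": "sh",  # s with dot above
    "\u1e6b": "th",  # t with dot above
}


def search_key(word: str) -> str:
    """Fold a headword to a plain lowercase key for matching."""
    # Compose first so "b" + combining dot becomes the single dotted letter.
    text = unicodedata.normalize("NFC", word).casefold()
    text = "".join(LENITED.get(ch, ch) for ch in text)
    # Decompose and drop remaining accents (for example the acute on vowels).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(entries):
    """Map each folded headword to the list of entries that share it."""
    index = {}
    for headword, meaning in entries:
        index.setdefault(search_key(headword), []).append((headword, meaning))
    return index


def lookup(index, query: str):
    """Return all entries whose folded headword matches the folded query."""
    return index.get(search_key(query), [])


# Illustrative entries only, spelled with dotted consonants.
entries = [
    ("b\u00f3", "a cow"),
    ("a\u1e6bair", "a father"),
    ("m\u00e1\u1e6bair", "a mother"),
]
index = build_index(entries)

print(search_key("m\u00e1\u1e6bair"))  # prints "mathair"
```

With this folding, typing "athair" or "MATHAIR" on an ordinary keyboard
finds the dotted-consonant entries. The trade-off is that words differing
only by an accent share one key, so a lookup may return more than one entry.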

From: Johnw on
RF expressed precisely:

> I don't have time to investigate this now but, if your curiosity is
> sharpened, I suggest you look for Foclóir Gaedhilge agus Béarla
> (Irish-English Dictionary by Patrick S. Dinneen, published 1904). It was
> scanned by Google and is in PDF format. It is available somewhere on the
> 'net and downloads are free. Copy and paste works for the English words,
> but the Gaelic words are left behind, most likely because they are set in
> a different font from the English. There may be some kind of dual OCR
> around. Hope that gets you out hunting. Should keep you out of trouble
> for a bit ;-) Good luck!

Maybe it's your operating system. These are extracts from the page.

WGL Pan-European Character Set
http://www.ascendercorp.com/fonts/multilingual/wgl/

Using WGL fonts in Windows or Mac OS X

Windows XP, Windows Vista, Windows 7 and Mac OS X include text
services that allow you to enter and edit text in a document.

You define a default language and keyboard layout when you install
Windows, but you can add additional text services from Control
Panel/Regional and Language Options.

If you select a keyboard layout from one of the languages in a WGL
font, this will provide you with the ability to easily type in text in
the desired language. Keyboards are generally language-specific, and
some languages provide multiple keyboard layout options. Note: each
text service requires computer memory and can affect performance, so
only add the languages you need.

Please go to our Font Help section for more information on text
services and keyboard layouts in Windows XP, Windows Vista and Mac OS
X.
http://www.ascendercorp.com/support/input/