From: Nam Quang Tran on
Hello again,

In response to the various comments about how DocFetcher fails to
index certain files:

1) DocFetcher has a small built-in debugging tool called "Parser
Testbox", which you can open by pressing F11. This lets you perform
text extraction on single files, so you can see exactly what is
extracted from a particular file.

2.1) Problems with PDF files: Some PDF files have a "can't extract
text" permission flag. If this flag is set, you need the master
password for the PDF file in order to extract text. DocFetcher does
not support PDF passwords and decryption. (And you probably don't have
the master password anyway.)

2.2) We, the DocFetcher developers, are not directly involved in the
development of the various text extraction libraries that are used in
DocFetcher. For PDF indexing, a library called PDFBox is used, which
is a rock solid piece of software used by many other Java
applications. If this library fails, then it's usually because (a) the
PDF file is encrypted, or (b) the PDF consists of scanned images
without any real text.

3) The current version of DocFetcher (1.0.1) does not search in
filenames. We might include this in a future release.

4) For more information on wildcards, have a look at the "Query
syntax" section in the manual.

Best regards
q:-) <= qforce
From: Nam Quang Tran on
By the way, there's an easy way to find out if the "can't extract
text" flag in a PDF file is set: If you can't copy text from the PDF
file to the clipboard using a standard PDF reader such as Adobe or
Foxit Reader, then this flag is probably set.
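
For anyone who would rather check this programmatically, here is a rough Python sketch. It is not how DocFetcher or PDFBox does it. It only relies on the PDF specification's rule that bit 5 of the /P permissions value in the encryption dictionary controls copying and extracting text. The function names are made up for illustration, and the /Encrypt check is a crude byte search rather than a real PDF parse.

```python
# Bit 5 (counting from 1) of the /P permissions integer governs
# copying or extracting text and graphics, per the PDF specification.
EXTRACT_BIT = 1 << 4


def extraction_allowed(p_value: int) -> bool:
    """Return True if the given /P value permits text extraction.

    p_value is the signed 32-bit integer stored under /P in the
    PDF's encryption dictionary (hypothetical helper, not a library API).
    """
    return bool(p_value & EXTRACT_BIT)


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Crude heuristic: does the raw file mention an /Encrypt entry?

    A real parser would read the trailer dictionary instead; this
    byte search can in principle give false positives.
    """
    return b"/Encrypt" in pdf_bytes
```

If looks_encrypted() is true and extraction_allowed() is false for the file's /P value, the "can't extract text" flag is most likely what is blocking indexing.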

On Jan 15, 5:39 pm, Nam Quang Tran <qforce....(a)googlemail.com> wrote:
> Hello again,
>
> In response to the various comments about how DocFetcher fails to
> index certain files:
>
> 1) DocFetcher has a small built-in debugging tool called "Parser
> Testbox", which you can open by pressing F11. This lets you perform
> text extraction on single files, so you can see exactly what is
> extracted from a particular file.
>
> 2.1) Problems with PDF files: Some PDF files have a "can't extract
> text" permission flag. If this flag is set, you need the master
> password for the PDF file in order to extract text. DocFetcher does
> not support PDF passwords and decryption. (And you probably don't have
> the master password anyway.)
>
> 2.2) We, the DocFetcher developers, are not directly involved in the
> development of the various text extraction libraries that are used in
> DocFetcher. For PDF indexing, a library called PDFBox is used, which
> is a rock solid piece of software used by many other Java
> applications. If this library fails, then it's usually because (a) the
> PDF file is encrypted, or (b) the PDF consists of scanned images
> without any real text.
>
> 3) The current version of DocFetcher (1.0.1) does not search in
> filenames. We might include this in a future release.
>
> 4) For more information on wildcards, have a look at the "Query
> syntax" section in the manual.
>
> Best regards
> q:-) <= qforce

From: mike on
Nam Quang Tran wrote:
> By the way, there's an easy way to find out if the "can't extract
> text" flag in a PDF file is set: If you can't copy text from the PDF
> file to the clipboard using a standard PDF reader such as Adobe or
> Foxit Reader, then this flag is probably set.
>
> On Jan 15, 5:39 pm, Nam Quang Tran <qforce....(a)googlemail.com> wrote:
>> Hello again,
>>
>> In response to the various comments about how DocFetcher fails to
>> index certain files:
>>
>> 1) DocFetcher has a small built-in debugging tool called "Parser
>> Testbox", which you can open by pressing F11. This lets you perform
>> text extraction on single files, so you can see exactly what is
>> extracted from a particular file.
>>
>> 2.1) Problems with PDF files: Some PDF files have a "can't extract
>> text" permission flag. If this flag is set, you need the master
>> password for the PDF file in order to extract text. DocFetcher does
>> not support PDF passwords and decryption. (And you probably don't have
>> the master password anyway.)
>>
>> 2.2) We, the DocFetcher developers, are not directly involved in the
>> development of the various text extraction libraries that are used in
>> DocFetcher. For PDF indexing, a library called PDFBox is used, which
>> is a rock solid piece of software used by many other Java
>> applications. If this library fails, then it's usually because (a) the
>> PDF file is encrypted, or (b) the PDF consists of scanned images
>> without any real text.
>>
>> 3) The current version of DocFetcher (1.0.1) does not search in
>> filenames. We might include this in a future release.
>>
>> 4) For more information on wildcards, have a look at the "Query
>> syntax" section in the manual.
>>
>> Best regards
>> q:-) <= qforce
>
If you send me a direct email (my email address is valid) with a
preferred email address on your end, I can attach a 160 KB PDF file
that I have not been able to index, even though I can open it in
Foxit Reader and cut text out of it.

mike
From: Nam Quang Tran on
> If you send me a direct email (my email address is valid) with a
> preferred email address on your end, I can attach a 160 KB PDF file
> that I have not been able to index, even though I can open it in
> Foxit Reader and cut text out of it.
>
> mike

I'm not familiar with Google Groups, so could you just send it to my
official developer address?
users.sourceforge.net <- qforce@