From: Why Tea on
On Jan 15, 4:49 pm, Mike Mills <effici...(a)canada.com> wrote:
> Why Tea <ytl...(a)gmail.com> wrote in news:61cb8789-7285-4c7a-826d-
> e41801d6a...(a)21g2000yqj.googlegroups.com:
>
> >http://docfetcher.sourceforge.net/en/index.html
>
> I tested DocFetcher briefly using a part of a bird project I am
> working on.
>
> Runs on Java; starts slowly but works quickly.
> The installation I chose is the "portable" one.
> The indexes can be created and then later burned to CD.
>
> Free.
>
> Highlights found terms.
>
> Found terms are highlighted only in the text-type HTML display
> without pictures (the quick display). The other display type
> shows pictures but does not highlight found terms. [Click the button
> at the top of the bottom window to toggle.]
>
> It indexes ~222 species [>1200 files] and creates a ~1 MB index fairly
> quickly, slowing down a little while indexing .pdf files.
>
> Searches in .doc .docx .htm .html .xls .txt .info etc. files as
> selected.
> Directories to search and index are individually selected.
> Indexing a *whole* drive is not encouraged. This is the opposite of
> the Google Desktop Search approach.
>
> I intend to experiment further with this application; it seems
> worthwhile. I usually use IndexYourFiles to create an index and
> later burn the index to a DVD, where it will run. This is nearly
> always adequate for my needs. DocFetcher will apparently do more,
> which I may find useful. It is slower than IndexYourFiles because it
> indexes every word rather than just the titles, and of course its
> indexes are *much* larger than the filename indexes in IndexYourFiles.
>
> http://www.indexyourfiles.com

Mike, thanks for the info. I am just wondering how
accurate DocFetcher's searching is. In one case,
adding a wildcard yielded nothing, while many files
were found without the wildcard. But what it found
was still fewer than Swish-e found. I would like to
know if others have seen the same problem. It's
critical not to miss anything if DocFetcher is to be
used to index tens of thousands of knowledgebase
files.
From: mike on
Why Tea wrote:
> Mike, thanks for the info. I am just wondering how
> accurate DocFetcher's searching is. In one case,
> adding a wildcard yielded nothing, while many files
> were found without the wildcard. But what it found
> was still fewer than Swish-e found. I would like to
> know if others have seen the same problem. It's
> critical not to miss anything if DocFetcher is to be
> used to index tens of thousands of knowledgebase
> files.

The plot thickens.
I did a test index on about a gigabyte of files.
Got about a hundred pdf files and one MSExcel file
that failed to index.

A typical problem pdf file was a Tektronix user manual.
I was gonna post a link to the manual.
My archived version of the manual was in PDF V1.4.
I just downloaded it again. It appears that TEK
has downgraded their manuals to PDF V1.2.

The two versions appear to render identically in Foxit Reader,
but if you look at each with a text viewer, they are
significantly different... not even close to the same.

I have no direct evidence that this is the source of the
problem. It's just a data point.
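
For anyone who wants to check the same data point on their own archive, here is a minimal Python sketch that reads the version out of each PDF's header. It assumes only that a PDF starts with a header line such as "%PDF-1.4"; the function names (pdf_version, versions_in) are made up for illustration and have nothing to do with DocFetcher itself.

```python
import re
from pathlib import Path

# A PDF file begins with a header such as "%PDF-1.4".
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version(path):
    """Return the version string from the PDF header, or None if not found.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(1024)  # the header sits at or very near the start
    match = HEADER_RE.search(head)
    return match.group(1).decode("ascii") if match else None


def versions_in(folder):
    """Map every *.pdf file under folder (recursively) to its header version.

    Note: the "*.pdf" pattern is case-sensitive on some platforms.
    """
    return {p: pdf_version(p) for p in sorted(Path(folder).rglob("*.pdf"))}
```

Running versions_in on the archive directory and comparing the result against the list of files that failed to index would show whether the failures cluster on one PDF version. It would not prove the version is the cause.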

I was thinking about reinstalling docfetcher to see if
it could index the V1.2 pdf file, but decided
it doesn't matter. My archive is carved in plastic.
An indexer has to work with ALL versions to be a
candidate.

FWIW, I typically rename downloaded files using keywords
I recognize. Since docfetcher does not appear to index
the filenames, if it fails on the pdf, there's
no indication in the database that the file exists.
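
A possible stopgap, sketched in Python under the assumption that the keywords in the filenames are good enough to find a file: build a separate filename index next to the content index, so a file whose content failed to index still turns up by name. This is not how DocFetcher or IndexYourFiles works internally; the names filename_keywords, build_filename_index and lookup are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def filename_keywords(path):
    """Split a file name (without extension) into lowercase keywords."""
    stem = Path(path).stem.lower()
    return [word for word in re.split(r"[^0-9a-z]+", stem) if word]


def build_filename_index(folder):
    """Map each keyword to the set of files whose names contain it.

    Hypothetical fallback index; it only looks at names, never at content.
    """
    index = defaultdict(set)
    for p in Path(folder).rglob("*"):
        if p.is_file():
            for word in filename_keywords(p):
                index[word].add(p)
    return index


def lookup(index, word):
    """Return the files whose names contain the keyword, sorted by path."""
    return sorted(index.get(word.lower(), ()))
```

A file that shows up in lookup() but never in a content search for the same keyword would be a candidate for the "failed to index" list.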

I like docfetcher a lot. It lets me index directories
individually without having to re-index everything else.
I just can't get it to index everything in the directory.
From: Craig on
On 01/15/2010 05:44 AM, mike wrote:

I forwarded this thread to the developers' email, inviting them to
comment.

fyi,
--
-Craig
From: Nam Quang Tran on
Hello,

I'm the project admin of DocFetcher. Ask me anything :)

Best regards
q:-) <= qforce

On Jan 15, 4:49 pm, Craig <netburg...(a)REMOVEgmail.com> wrote:
> On 01/15/2010 05:44 AM, mike wrote:
>
> I forwarded this thread to the developers' email, inviting them to
> comment.
>
> fyi,
> --
> -Craig
From: Craig on
On 01/15/2010 08:15 AM, Nam Quang Tran wrote:

> On Jan 15, 4:49 pm, Craig<netburg...(a)REMOVEgmail.com> wrote:
>> On 01/15/2010 05:44 AM, mike wrote:
>>
>> I forwarded this thread to the developers' email, inviting them to
>> comment.
>>
> Hello,
>
> I'm the project admin of DocFetcher. Ask me anything :)
>

Hello & thanks for dropping by.

We had some comments earlier in this thread from a couple of people who
are evaluating DocFetcher. Essentially, it looks like DocFetcher may
not be indexing every file in the targeted directory(ies):

>> I am just wondering how
>> accurate DocFetcher's searching is. In one case,
>> adding a wildcard yielded nothing, while many files
>> were found without the wildcard. But what it found
>> was still fewer than Swish-e found. I would like to
>> know if others have seen the same problem. It's
>> critical not to miss anything if DocFetcher is to be
>> used to index tens of thousands of knowledgebase
>> files.
>
> The plot thickens.
> I did a test index on about a gigabyte of files.
> Got about a hundred pdf files and one MSExcel file
> that failed to index.
>
> A typical problem pdf file was a Tektronix user manual.
> I was gonna post a link to the manual.
> My archived version of the manual was in PDF V1.4.
> I just downloaded it again. It appears that TEK
> has downgraded their manuals to PDF V1.2.

Are you aware of any issues like this?

--
-Craig