DocFetcher - java-based desktop searcher [Freeware]

From: mike on 18 Jan 2010 11:06

Nam Quang Tran wrote:
> On Jan 18, 4:44 am, mike <spam...(a)go.com> wrote:
>> I spent the afternoon figuring out how to recursively list the contents
>> of zip files inside zip files with VB6. Was pretty easy, 'cause I just
>> copied what someone else already figured out, but I've still got all manner
>> of issues with file permissions and error recovery.
>
> Throw VB6 away and start learning Python. Just my 2 cents.

That's just silly.
Scenario 1: click the vb icon. Type in a dozen commands. Click compile.

Scenario 2: Learn Python.....
I ain't learnin' another language until the existing one
can't be coerced into solving my problem.

>
>
>> Well, I've already split it into 50 DVD's ;-)
>>
>> I got very annoyed during the testing because I had to navigate to the
>> test directory every time. Maybe the create archive process could remember
>> the last place you created...the .ini file is already there.
>> I didn't try drag/drop. Maybe that solves the problem.
>
> Good idea. As I said, you're one hell of a beta-tester.
>
> Btw, have you ever thought about moving your DVD archive to an
> external USB storage device? These days you can buy some real good
> ones with 1 TB storage capacity or more. Since DocFetcher (amongst
> others) is portable, you could put it on the USB drive as well, which
> would solve the mounting/unmounting issue altogether.

Already got that. Just leave it turned off most of the time to
save wear and tear. Also insulates the data from external malware
and internal stupidity...I seem to have an infinite supply of
internal stupidity.

This isn't about how I manage archives. It's about helping you
make the BESTEST indexing program in the world.

Beta 3
Leading wildcard in search seems to work now.

I clicked the search scope boxes to index .exe and .zip files.
Big files give the error:
Not enough memory left in the java virtual machine...
Guess that's expected...

I really don't want a text search inside exe files.
Don't want the bloat. Serves no purpose. And indexing takes forever.
I'm about 700 indexed files into it and the index directory is already
over 1.2GB. That ain't gonna do.

OOPS, discovered another issue.
When you abort an indexing operation, it claims it will delete the
index. And it empties the search scope box...but the indexes directory
is still 1.1GB.

Recursively continuing the index inside .zip and .exe files (those that
are zip archives) would be interesting.

At this point, I just want the filename added to the index so I can use
one tool for all my file indexing needs.

Adding a single checkbox, "include all filenames in index"

would go a long way toward meeting my needs.

If I could check that box and uncheck all the others, I'd get a small
searchable filename index. Lots of flexibility.

From: surprise on 18 Jan 2010 16:35

1.79 Gigabytes of documents mostly pdf and text searching for the
term
"food drive" with the quotation marks

DOC FETCHER

Found 1 document 1040.txt
Within the document the results were 24 occurrences, of which
4 were "food" without quotes
12 were "drive" without quotes
8 were "food drive" without the quotes, or 4 times "food drive" was
found together.

If "food drive" was entered without the quotes 50 matching documents
were found. None of the documents other than 1040.txt of this group
seems to have an instance of food drive.

1) ARE THE INSTANCES OF "FOOD" AND "DRIVE" THAT ARE NOT TOGETHER SOME
TYPE OF FUZZY SEARCH? IN SOME INSTANCES THIS FEATURE OF THE PROGRAM
MAY BE BENEFICIAL. HOWEVER, WHEN SEARCHING THROUGH MANY DOCUMENTS IT
WILL BE NICE TO BE ABLE TO TURN THIS FEATURE OFF.

2) THE PROGRAM SEEMS TO BE WEAK AT SEARCHING THROUGH PDF FILES. I AM
LED TO BELIEVE THIS WILL BE CORRECTED SOON.

WILMA

"food drive" with the quotation marks
Found 5 documents
4 pdf
1 1040.txt
4 times "food drive" was found together in 1040.txt.

(WILMA WAS USING INTERNAL FILTERS. IF PDFTOTEXT IS USED AS AN EXTERNAL
FILTER, THE MISSING PDF FILE COULD POSSIBLY BE SEARCHED. WILMA ALSO MAY NO
LONGER BE PORTABLE IF PDFTOTEXT IS USED.)

LUCERNE DESKTOP SEARCH

6 documents listed
5 pdf
1 1040.txt
Lucerne Desktop Search can't see inside documents

INFO RAPID SEARCH AND REPLACE

Set to match the words entered; does not need quotation marks to do this
search.
6 files found
5 pdf
1 1040.txt
5 instances of food drive found in 1040.txt
(One instance was actually "food drives"; this was not found in the
other searches.)

COPERNIC DESKTOP SEARCH

6 files found
5 are pdf files
1 1040.txt
4 instances of food drive found in 1040.txt

Search inform Free

Looking for "food drive" again; this program does not use quotes.

6 files found with "food drive"
5 files are pdf
1 1040.txt
4 "food drives" found in 1040.txt

From: mike on 18 Jan 2010 18:23

I had a little surprise of my own.

I wrote a small program to list all the filenames in my archives
including recursive searches into zip files.
52GB of archive netted me a 22MB text file of full path names.
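
For anyone who wants to try the same kind of listing, here is a minimal sketch. It is written in Python rather than VB6 and is not the program described above; the function name list_zip_entries is made up for illustration, and it assumes each nested archive is small enough to read into memory.

```python
import io
import zipfile


def list_zip_entries(source, prefix=""):
    """Yield the full path of every entry in a zip, descending into nested zips.

    `source` may be a filename or a file-like object. Entries of a nested
    archive are reported as "<outer entry>/<inner entry>".
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            path = prefix + info.filename
            yield path
            if info.is_dir() or not info.filename.lower().endswith(".zip"):
                continue
            # Read the nested archive into memory and recurse into it.
            data = archive.read(info)
            try:
                yield from list_zip_entries(io.BytesIO(data), path + "/")
            except zipfile.BadZipFile:
                # Named .zip but not a readable zip archive: list it and move on.
                continue
```

Writing one yielded path per line to a text file gives a listing that can be searched with grep, as described above. Password-protected or damaged nested archives would need extra error handling that this sketch leaves out.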

I used grep to search the file.
I had over 18,000 files with "usb" in the pathname. Almost
none of them had usb in the filename.

I'm gonna have to rethink exactly what I want from a file name search.

Searches for less common terms did produce useful results tho.

From: Nam Quang Tran on 19 Jan 2010 11:17

@ mike:

> Scenario 2: Learn Python.....
> I ain't learnin' another language until the existing one
> can't be coerced into solving my problem.

This is not just about what you can do with a language, it's also
about how much time and effort it takes to solve a given problem in
that language.

Perhaps you misunderstood me, I didn't really mean "throw VB6 away",
but "add Python to your toolbox". Takes about 1-2 weeks to learn, and
looking back on the 5-6 programming languages I know, I'd say Python
had by far the best effort-to-payoff ratio. Anyway, do what you want,
I'm not your mom :p

> Already got that. Just leave it turned off most of the time to
> save wear and tear. Also insulates the data from external malware
> and internal stupidity...I seem to have an infinite supply of
> internal stupidity.

Hm... then how about using the external hard drive as the main
repository and the DVDs as mere backups in case of outstanding
stupidity? Again, I don't wanna act like your mom, I'm just asking out
of curiosity, cause I'd like to understand my users a little better.

> This isn't about how I manage archives. It's about helping you
> make the BESTEST indexing program in the world.

Thank you, really appreciate that. Working with you as the "beta
tester from hell" (hurr hurr) was really fun. You know, without the
feedback we developers often can only guess what our users want or
don't want.

> OOPS, discovered another issue.
> When you abort an indexing operation, it claims it will delete the
> index. And it empties the search scope box...but the indexes directory
> is still 1.1GB.

Works for me. Deleting 1.1 GB takes time, so you might have to wait a
little until the index disappears.

> Adding a single checkbox, "include all filenames in index"
> would go a long way toward meeting my needs.
> If I could check that box and uncheck all the others, I'd get a small
> searchable filename index. Lots of flexibility.

If you just want to index filenames, then DocFetcher isn't the right
tool. Try 'Everything' (http://www.voidtools.com/) or something like
that. These programs specialize in filename search, so both
performance and index size should be far better than those of full-
text search apps.

> I used grep to search the file.
> I had over 18,000 files with "usb" in the pathname. Almost
> none of them had usb in the filename.

Then I'd search only in the filenames, not in the filepaths.
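
A minimal sketch of that idea in Python, assuming the listing holds one full path per entry and that either "/" or "\" may appear as the separator. The function name filename_matches is hypothetical, and the match is a plain case-insensitive substring test.

```python
import re


def filename_matches(paths, term):
    """Yield only those paths whose last component (the filename) contains term."""
    term = term.lower()
    for path in paths:
        # Strip trailing separators, then keep the text after the last / or \.
        name = re.split(r"[\\/]", path.rstrip("\\/"))[-1]
        if term in name.lower():
            yield path
```

With this filter, a path such as "archive/usb_tools.zip/readme.txt" is skipped for the term "usb", because only "readme.txt" is compared.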

From: Nam Quang Tran on 19 Jan 2010 11:25

On Jan 18, 10:35 pm, surprise <californiacarr...(a)gmail.com> wrote:
> 1.79 Gigabytes of documents mostly pdf and text searching for the
> term
> "food drive" with the quotation marks
>
> DOC FETCHER
>
> Found 1 document 1040.txt
> Within the document the results were 24 occurrences, of which
> 4 were "food" without quotes
> 12 were "drive" without quotes
> 8 were "food drive" without the quotes, or 4 times "food drive" was
> found together.
>
> If "food drive" was entered without the quotes 50 matching documents
> were found. None of the documents other than 1040.txt of this group
> seems to have an instance of food drive.
>
> 1) ARE THE INSTANCES OF "FOOD" AND "DRIVE" THAT ARE NOT TOGETHER SOME
> TYPE OF FUZZY SEARCH? IN SOME INSTANCES THIS FEATURE OF THE PROGRAM
> MAY BE BENEFICIAL. HOWEVER, WHEN SEARCHING THROUGH MANY DOCUMENTS IT
> WILL BE NICE TO BE ABLE TO TURN THIS FEATURE OFF.
>
> 2) THE PROGRAM SEEMS TO BE WEAK AT SEARCHING THROUGH PDF FILES. I AM
> LED TO BELIEVE THIS WILL BE CORRECTED SOON.

The search with quotation marks and the PDF indexing problems should
be gone in the new 1.0.2 release. See download page:
http://docfetcher.sourceforge.net/en/download.html

If DocFetcher brings up documents that seemingly don't contain the
words you searched for, then it could be that they have additional
text in the filename, title, or author field. Also, note that if HTML
pairing is on, then the stuff in the "XXXX_files" folders is
searchable, but doesn't show up in the preview.
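
To make the difference between a quoted and an unquoted query concrete, here is a rough sketch in Python. It is not DocFetcher's search code; the simple word tokenizer and the function names are assumptions made for illustration. An unquoted query is treated as "any of the words", while a quoted query only counts places where the words appear next to each other in order.

```python
import re


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_term(text, query):
    """Unquoted search: true if the text contains at least one query word."""
    words = set(tokenize(text))
    return any(term in words for term in tokenize(query))


def count_phrase(text, phrase):
    """Quoted search: count places where the phrase words appear consecutively."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return 0
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
```

Under these assumptions, a document that mentions only "hard drive" matches the unquoted query but has a phrase count of zero, and "food drives" is not counted as "food drive" because the words are compared exactly.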
