DocFetcher - Java-based desktop searcher [Freeware]

From: Nam Quang Tran on 16 Jan 2010 12:56

On Jan 16, 6:52 pm, Nam Quang Tran <qforce....(a)googlemail.com> wrote:
> Yes, the WatchFS=true is relevant, but you can just click on the
> "Watch indexed folders" checkbox on the preferences dialog. They're
> one and the same :-) In fact, all of what you see on the preferences
> dialog is stored somewhere in the user.properties file.

For folder watching, there's also a daemon, as described on the main
page of the DocFetcher manual:

"Index updates: If documents in the document repository are added,
removed or modified, the corresponding indexes have to be updated.
This is done automatically in two ways: (1) If DocFetcher is running,
it detects the changes and updates the indexes immediately. (2) If
DocFetcher isn't running, the changes are detected by a small daemon
program that runs in the background; then the affected indexes are
updated the next time DocFetcher starts.
Note: If you are using the portable version of DocFetcher and want the
daemon to be run automatically on startup, you have to manually add
the daemon executable(s) to your operating system's list of startup
programs."

From: mike on 16 Jan 2010 13:12

Nam Quang Tran wrote:
> On Jan 16, 6:21 pm, mike <spam...(a)go.com> wrote:
>
>> Is there a manpage that describes manual configurations?
>
> Yup. We have a wiki over here: http://sourceforge.net/apps/mediawiki/docfetcher/index.php?title=Main_Page
> In the 'advanced usage' section there's a list of the most useful keys
> in the user.properties file. I can add some more if requested.
>
>
>>> However, I can't give you a hit for
>>> "CFG280" if you're searching for "CFG". This has something to do with
>>> the way indexing works: After text extraction, the text is split into
>>> separate words using a so-called "tokenizer". If I make the tokenizer
>>> split "CFG280" into two halves, then you'll be able to search for
>>> "CFG" and "280", but not for "CFG280". If the tokenizer doesn't split
>>> it, then you can search for "CFG280", but not for "CFG" or "280". I
>>> had to pick one tokenizer, so I chose the second one.
>> I don't quite understand what you're telling me, but if I get a hit
>> on any word that matches *cf* or *280 or cfg* , I'm a happy camper.
>
> I was trying to explain why you can't find "CFG280" if you search for
> "CFG". It's a technical limitation of index-based search, which is
> quite different from the Ctrl+F search in your average text editor.
> (If DocFetcher's search worked like that of a text-editor, searches
> would be 1000x slower!!)

As long as the wildcards work, I'm happy.
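
To make the tokenizer trade-off quoted above concrete, here is a minimal Python sketch of a toy inverted index. It is not DocFetcher's actual code: the two tokenizers, the function names and the sample documents are all made up for illustration, and the wildcard matching is a simple scan over the indexed tokens.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def tokenize_whole(text):
    """Keep alphanumeric runs together: 'CFG280' stays one token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def tokenize_split(text):
    """Split letters from digits: 'CFG280' becomes 'cfg' and '280'."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+|[0-9]+", text)]


def build_index(docs, tokenizer):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenizer(text):
            index[token].add(name)
    return index


def search(index, query):
    """Exact token lookup, or a wildcard scan over all indexed tokens."""
    query = query.lower()
    if "*" in query or "?" in query:
        hits = set()
        for token, names in index.items():
            if fnmatchcase(token, query):
                hits |= names
        return hits
    return set(index.get(query, set()))


# Hypothetical sample documents.
docs = {
    "manual.txt": "See section CFG280 for details",
    "notes.txt": "CFG is short for config",
}
whole = build_index(docs, tokenize_whole)   # the unsplit choice
split = build_index(docs, tokenize_split)   # the letters/digits split

print(search(whole, "CFG"))     # only the document with the bare word
print(search(whole, "cfg*"))    # wildcard also reaches CFG280
print(search(split, "CFG280"))  # empty: no such token was indexed
```

With the unsplit tokenizer, a plain search for "CFG" misses the document containing "CFG280", while "cfg*" or "*280" finds it, which is why the wildcards matter here.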
>
>
>> I've discovered another issue that's a problem for me.
>>
>> I index the directory,
>> Search for a term, get a hit.
>> Delete the target file.
>> Search for the term, no hit.
>> Pull the file out of the recycle bin, get a hit.
>> So the index is still there when the file isn't.
>> It just won't let me search it.
>>
>> I have offline repositories of file archives. I leave the external
>> drives turned off most of the time, 'cause I'm clumsy and have a propensity
>> to delete stuff accidentally ;-) And most are on DVDs that are
>> not currently mounted at the time of the search.
>> I need to be able to search the index when the actual target files are not
>> online. Obviously, I can't expect the preview to work, but that's ok.
>> I just want to know if and where the file exists so I can mount the media.
>>
>> One of the BEST features of DocFetcher is the ability to manually
>> and quickly index any part of the file system without reindexing
>> the whole thing again. It doesn't try to automatically track changes.
>> Or, I didn't think it did.
>>
>> There's a WatchFS=true in the properties file. That relevant?
>>
>> Anyway, is there any way to set it up so I can search indexes for
>> currently unmounted files?
>
> Yes, the WatchFS=true is relevant, but you can just click on the
> "Watch indexed folders" checkbox on the preferences dialog. They're
> one and the same :-) In fact, all of what you see on the preferences
> dialog is stored somewhere in the user.properties file.

That fixed it.
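
For anyone who would rather flip the setting in user.properties than use the checkbox, here is a minimal Python sketch of rewriting one key in properties-style text. Only the WatchFS key comes from this thread; the helper name, the sample contents and the other key are hypothetical, and it assumes the file uses plain key=value lines.

```python
def set_property(properties_text, key, value):
    """Return properties_text with `key` set to `value`, appending it if absent."""
    lines = properties_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comment lines and anything that is not a key=value pair.
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt; a real user.properties will contain other keys.
sample = "# hypothetical excerpt\nWatchFS=true\nSomeOtherKey=1\n"
updated = set_property(sample, "WatchFS", "false")
print(updated)
```

Presumably WatchFS=false corresponds to the unchecked "Watch indexed folders" box, since the two are described as one and the same.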
>
>
>> I did a bunch more indexing. Doesn't index files inside archives.
>> I use .zip files to organize related stuff. That's more my problem
>> than yours, but would be nice to have in a future release.
>
> In the 'feature request' section of the aforementioned wiki, archive
> indexing is listed as one of the planned features. The actual
> implementation, however, isn't as easy as you might think. I'll have to
> rewrite large portions of the program for this to work. :/
>
> q:-) <= Quang

Nice job.
I'll shut up now.
mike

From: mike on 16 Jan 2010 13:17

Nam Quang Tran wrote:
> On Jan 16, 6:52 pm, Nam Quang Tran <qforce....(a)googlemail.com> wrote:
>> Yes, the WatchFS=true is relevant, but you can just click on the
>> "Watch indexed folders" checkbox on the preferences dialog. They're
>> one and the same :-) In fact, all of what you see on the preferences
>> dialog is stored somewhere in the user.properties file.
>
> For folder watching, there's also a daemon, as described on the main
> page of the DocFetcher manual:
>
> "Index updates: If documents in the document repository are added,
> removed or modified, the corresponding indexes have to be updated.

That's the point. I don't want anything modified without my express
command. That's the problem I had with Copernic. It did what I wanted
for indexing/searching but would automatically screw everything up if
I unmounted an indexed directory. Couldn't figure out how to turn it off.

> This is done automatically in two ways: (1) If DocFetcher is running,
> it detects the changes and updates the indexes immediately. (2) If
> DocFetcher isn't running, the changes are detected by a small daemon
> program that runs in the background; then the affected indexes are
> updated the next time DocFetcher starts.

Yep, I want that off.
> Note: If you are using the portable version of DocFetcher and want the
> daemon to be run automatically on startup, you have to manually add
> the daemon executable(s) to your operating system's list of startup
> programs."

I think I want to put the portable version on a flash drive along with
the indexes. I'll add to the indexes, but leave the old stuff intact.

Looks like this does what I want.
Thanks, mike

From: Nam Quang Tran on 16 Jan 2010 13:26

On Jan 16, 7:12 pm, mike <spam...(a)go.com> wrote:
> As long as the wildcards work, I'm happy.

So I better not remove the wildcards ;-)

> Nice job.
> I'll shut up now.
> mike

Thanks. I'll release DocFetcher 1.0.2 on Tuesday if I don't get any
more suggestions.

q:-) <= qforce

From: surprise on 17 Jan 2010 00:11

When I enter the search term food drive, I get food items, food
drive, food from, drive. Have I entered the search improperly? I would
like to get only (food drive).

However, if I enter food AND drive, I get many more documents in the
search than when I enter food drive. The additional documents in the
search don't seem to have the term (food drive) in them, just the two
words.

What is happening?
