From: Nam Quang Tran on 17 Jan 2010 05:04

On Jan 17, 6:11 am, surprise <californiacarr...(a)gmail.com> wrote:
> When I enter the search term food drive I get, food items, food
> drive, food from, drive. Have I entered the search improperly? I would
> like to only get (food drive).

If you enter just "food drive" (without the quotes), it is interpreted as "food OR drive". If you enter "food drive" WITH the quotes, then it's a so-called phrase search, giving you only (food drive). For more information, have a look at the 'Query syntax' section in the DocFetcher manual.

Please note that the search term highlighting in the preview panel isn't fully implemented yet. So the search results may be correct, but the highlighting won't be for most of the more complicated search strings. This is also listed in the 'bug reports' section of our wiki:
http://sourceforge.net/apps/mediawiki/docfetcher/index.php?title=Bug_Reports

> However, If I enter food AND drive, I get many more documents in the
> search than when I enter food drive. The additional documents in the
> search dont seem to have the term (food drive) in them just the two
> words.
>
> What is happening?

Common sense tells me that entering "food drive" (w/o quotes) should yield more results than "food AND drive" (w/o quotes). I tested it and it works for me. Here's a list of how many results I get for a particular search string:

test: 48
data: 64
test data: 83
test AND data: 29

This makes sense, because 48 + 64 - 29 = 83.

I'm not sure why you get different results. Could you post the actual number of results for each of your search strings, please?
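The count check in the post above is the inclusion-exclusion rule: an OR search returns the union of the two result sets, so its size is the sum of the individual counts minus the overlap that an AND search returns. A minimal Python sketch, using the counts quoted in the post plus a toy example with made-up document IDs (the function name and the ID sets are illustrative only and say nothing about how DocFetcher works internally):

```python
def or_count(count_a: int, count_b: int, count_both: int) -> int:
    """Size of the union of two result sets (inclusion-exclusion)."""
    return count_a + count_b - count_both


# Counts reported in the post: "test", "data", and "test AND data".
test_hits, data_hits, and_hits = 48, 64, 29
print(or_count(test_hits, data_hits, and_hits))  # 83, matching "test data"

# Toy illustration with hypothetical document IDs.
docs_with_food = {1, 2, 3, 4}
docs_with_drive = {3, 4, 5}
or_results = docs_with_food | docs_with_drive   # "food drive" (implicit OR)
and_results = docs_with_food & docs_with_drive  # "food AND drive"

# An AND result set can never be larger than the matching OR result set.
assert and_results <= or_results
```

If the numbers reported for "food drive" and "food AND drive" violate this relationship, that would point to a problem outside the query semantics described here.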
From: Nam Quang Tran on 17 Jan 2010 05:13

@mike: You said you wanted to index drives that are mounted and unmounted regularly. I foresee a complication here: DocFetcher currently doesn't allow having multiple indexes for the same folder. So if you mount all your drives on the same location, this could be a problem. Could you check this, please?

If this is indeed a problem, I could change DocFetcher's behavior so that the user only gets a warning message, but is allowed to proceed.
From: Mike Mills on 17 Jan 2010 10:28

mike <spamme0(a)go.com> wrote in
news:hiprgg$qbh$1(a)news.eternal-september.org:

> Why Tea wrote:
>> On Jan 15, 4:49 pm, Mike Mills <effici...(a)canada.com> wrote:
>>> Why Tea <ytl...(a)gmail.com> wrote in
>>> news:61cb8789-7285-4c7a-826d-
>>> e41801d6a...(a)21g2000yqj.googlegroups.com:
>>>
>>>> http://docfetcher.sourceforge.net/en/index.html
>>> I tested DocFetcher briefly using a part of a bird project I am
>>> working on.
>>>
>>> <!--runs on java , starts slowly but works quickly.-->
>>> This installation chosen is "portable"
>>> The indexes can be created and then later burned to cd.
>>>
>>> <!--free-->
>>>
>>> <!--highlighted found terms-->
>>>
>>> Only shows highlighted found terms when using text type html
>>> display without pictures, this is the quick display, the other
>>> display type shows pictures, but does not highlight found terms.
>>> [click on button top of bottom window to toggle].
>>>
>>> Indexes ~222 species [>1200 files] & creates ~1Mb index fairly
>>> quickly, slows down a little while indexing .pdf files.
>>>
>>> Searches in .doc .docx .htm .html .xls .txt .info etc. files as
>>> selected.
>>> directories to search and index are individually selected.
>>> Indexing a *whole* drive is not encouraged. This is the opposite
>>> of the Google Desktop Search approach.
>>>
>>> I intend to experiment further with this application, it seems
>>> worthwhile. I usually use indexyourfiles to create an index and
>>> to later burn the index to a dvd where it will run. This is
>>> nearly always adequate for my needs. This docFetcher will
>>> apparently do more which I may find useful. It is slower than
>>> IndexYourFiles because it indexes every word rather than just
>>> the titles, of course the indexes are *much* larger than the
>>> filename indexes in IndexYourFiles.
>
> The plot thickens.
> I did a test index on about a gigabyte of files.
> Got about a hundred pdf files and one MSExcel file
> that failed to index.
>

In my case I have very few .pdf files to worry about, but there was an indicated error in the database corrections because it turns out the file was password protected. I didn't mention it in my review because it seemed insignificant to me. Often, if possible, I convert pdf to text anyway. My xls files were quite large, containing indexes of bird names as reference, and I chose not to index that file type at this time because the index size would get crazy.

> FWIW, I typically rename downloaded files using keywords
> I recognize. Since docfetcher does not appear to index
> the filenames, if it fails on the pdf, there's
> no indication in the database that the file exists.
>
> I like docfetcher a lot. It lets me index directories
> individually without having to re-index everything else.
> I just can't get it to index everything in the directory.

Thanks for pointing this [feature] out. I didn't pick it up in my test. As I mentioned before, often just the filename search will be adequate for my purposes. The size consideration is important to me. If I added the DocFetcher index for 222 species I would add 1 MB to my DVD size. If I added ~10000 species to the index I could increase the database size to ~45 MB. This is more than 10 times larger than my current index of filenames made with IndexYourFiles.

http://www.indexyourfiles.com
From: Nam Quang Tran on 17 Jan 2010 10:55

On Jan 17, 4:28 pm, Mike Mills <effici...(a)canada.com> wrote:
> Thanks for pointing this [feature] out. I didn't pick it up in my
> test. As I mentioned before, often just the filename search will be
> adequate for my purposes. The size consideration is important to me.
> If I added the DocFetcher index for 222 species I would add 1 Mb to
> my dvd size, If I added ~10000 species to the index I could increase
> the database size to ~ 45 Mb. This is more than 10 times larger than
> my current index of filenames made with IndexYourFiles.
>
> http://www.indexyourfiles.com

45 MB is big?? o.0
On a 4.7 GB DVD, that would be about 1% of total capacity. Are your system constraints really that tight??
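Both figures in this exchange can be reproduced with a few lines of arithmetic. The sketch below assumes that index size grows linearly with the number of species (which the thread does not confirm) and that a 4.7 GB DVD holds 4700 MB in decimal units; the function names are made up for illustration.

```python
def extrapolate_index_mb(known_species: int, known_mb: float, target_species: int) -> float:
    """Scale a measured index size linearly to a larger collection (assumption)."""
    return known_mb * target_species / known_species


def share_of_disc_percent(size_mb: float, disc_mb: float = 4700.0) -> float:
    """Percentage of a disc taken up by size_mb (4.7 GB DVD assumed as 4700 MB)."""
    return 100.0 * size_mb / disc_mb


# ~1 MB for 222 species, scaled to ~10000 species.
estimate_mb = extrapolate_index_mb(222, 1.0, 10_000)
print(f"{estimate_mb:.0f} MB")                   # 45 MB
print(f"{share_of_disc_percent(45.0):.2f} %")    # 0.96 %
```

Under those assumptions the full-text index would take just under 1% of the disc, which agrees with the estimate in the reply.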
From: mike on 17 Jan 2010 12:28
Mike Mills wrote:
> mike <spamme0(a)go.com> wrote in
> news:hiprgg$qbh$1(a)news.eternal-september.org:
>
>> Why Tea wrote:
>>> On Jan 15, 4:49 pm, Mike Mills <effici...(a)canada.com> wrote:
>>>> Why Tea <ytl...(a)gmail.com> wrote in
>>>> news:61cb8789-7285-4c7a-826d-
>>>> e41801d6a...(a)21g2000yqj.googlegroups.com:
>>>>
>>>>> http://docfetcher.sourceforge.net/en/index.html
>>>> I tested DocFetcher briefly using a part of a bird project I am
>>>> working on.
>>>>
>>>> <!--runs on java , starts slowly but works quickly.-->
>>>> This installation chosen is "portable"
>>>> The indexes can be created and then later burned to cd.
>>>>
>>>> <!--free-->
>>>>
>>>> <!--highlighted found terms-->
>>>>
>>>> Only shows highlighted found terms when using text type html
>>>> display without pictures, this is the quick display, the other
>>>> display type shows pictures, but does not highlight found terms.
>>>> [click on button top of bottom window to toggle].
>>>>
>>>> Indexes ~222 species [>1200 files] & creates ~1Mb index fairly
>>>> quickly, slows down a little while indexing .pdf files.
>>>>
>>>> Searches in .doc .docx .htm .html .xls .txt .info etc. files as
>>>> selected.
>>>> directories to search and index are individually selected.
>>>> Indexing a *whole* drive is not encouraged. This is the opposite
>>>> of the Google Desktop Search approach.
>>>>
>>>> I intend to experiment further with this application, it seems
>>>> worthwhile. I usually use indexyourfiles to create an index and
>>>> to later burn the index to a dvd where it will run. This is
>>>> nearly always adequate for my needs. This docFetcher will
>>>> apparently do more which I may find useful. It is slower than
>>>> IndexYourFiles because it indexes every word rather than just
>>>> the titles, of course the indexes are *much* larger than the
>>>> filename indexes in IndexYourFiles.
>
>> The plot thickens.
>> I did a test index on about a gigabyte of files.
>> Got about a hundred pdf files and one MSExcel file
>> that failed to index.
>>
> In my case I have very few .pdf files to worry about, but there was
> an indicated error in the database corrections because it turns out
> the file was password protected. I didn't mention it in my review
> because it seemed insignificant to me. Often if possible I convert
> pdf to text anyway . My xls files were quite large, containing
> indexes of bird names as reference, and I chose not to index that
> file type at this time because the index size would get crazy.
>
>
>> FWIW, I typically rename downloaded files using keywords
>> I recognize. Since docfetcher does not appear to index
>> the filenames, if it fails on the pdf, there's
>> no indication in the database that the file exists.
>>
>> I like docfetcher a lot. It lets me index directories
>> individually without having to re-index everything else.
>> I just can't get it to index everything in the directory.
>
> Thanks for pointing this [feature] out. I didn't pick it up in my
> test. As I mentioned before, often just the filename search will be
> adequate for my purposes. The size consideration is important to me.
> If I added the DocFetcher index for 222 species I would add 1 Mb to
> my dvd size, If I added ~10000 species to the index I could increase
> the database size to ~ 45 Mb. This is more than 10 times larger than
> my current index of filenames made with IndexYourFiles.
>
> http://www.indexyourfiles.com

I attempted three times to index a whole dvd worth of data. Once, it locked up completely, "program not responding". The other two got a repeatable non-fatal error with some kind of error dump.

Out of 12000 files, I only had 600 indexable files. I also found that I had a lot more stuff inside zip archives than I thought. Not having the filenames in the index killed the project. Finding the other 95% of the files is way more important to me than indexing 5%.
Now, I'm back to looking for a way to index file names inside zip archives. I haven't found anything yet that will let me search filenames from an index when the actual files are not mounted. For now, it looks like I'm stuck with mounting the archive media and using Total Commander to search it interactively. It's a better fit for my current needs.
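For what it's worth, the requirement described here (a searchable list of filenames, including names inside zip archives, that still works when the media is unmounted) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions, not a feature of DocFetcher or of any tool named in the thread: the function names, the `archive.zip!/member` notation, and the JSON index file are all made up for the example.

```python
import json
import os
import zipfile


def build_filename_index(root: str) -> list[str]:
    """Collect paths of all files under root, plus member names inside zip archives."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append(path)
            if zipfile.is_zipfile(path):
                try:
                    with zipfile.ZipFile(path) as archive:
                        # "archive.zip!/member" is a made-up notation for this sketch.
                        entries.extend(f"{path}!/{member}" for member in archive.namelist())
                except (zipfile.BadZipFile, OSError):
                    pass  # unreadable archive: only its own filename is indexed
    return entries


def save_index(entries: list[str], index_file: str) -> None:
    """Store the list on local disk so it can be searched with the media unmounted."""
    with open(index_file, "w", encoding="utf-8") as fh:
        json.dump(entries, fh)


def load_index(index_file: str) -> list[str]:
    with open(index_file, encoding="utf-8") as fh:
        return json.load(fh)


def search(entries: list[str], keyword: str) -> list[str]:
    """Case-insensitive substring match against the stored paths."""
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry.lower()]
```

Typical use would be to run `build_filename_index` once while the disc is mounted, call `save_index` to write the result to the local drive, and later run `search(load_index(...), "keyword")` without the disc present. Nested archives (a zip inside a zip) and other archive formats are not handled.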