From: mike on 18 Jan 2010 11:06
Nam Quang Tran wrote:
> On Jan 18, 4:44 am, mike <spam...(a)go.com> wrote:
>> I spent the afternoon figuring out how to recursively list the contents
>> of zip files inside zip files with VB6. Was pretty easy, 'cause I just
>> copied what someone else already figured out, but I've still got all manner
>> of issues with file permissions and error recovery.
>
> Throw VB6 away and start learning Python. Just my 2 cents.

That's just silly.
Scenario 1: click the vb icon. Type in a dozen commands. Click compile.
Scenario 2: Learn Python.....
I ain't learnin' another language until the existing one
can't be coerced into solving my problem.

>> Well, I've already split it into 50 DVD's ;-)
>>
>> I got very annoyed during the testing because I had to navigate to the
>> test directory every time. Maybe the create archive process could remember
>> the last place you created...the .ini file is already there.
>> I didn't try drag/drop. Maybe that solves the problem.
>
> Good idea. As I said, you're one hell of a beta-tester.
>
> Btw, have you ever thought about moving your DVD archive to an
> external USB storage device? These days you can buy some real good
> ones with 1 TB storage capacity or more. Since DocFetcher (amongst
> others) is portable, you could put it on the USB drive as well, which
> would solve the mounting/unmounting issue altogether.

Already got that. Just leave it turned off most of the time to
save wear and tear. Also insulates the data from external malware
and internal stupidity...I seem to have an infinite supply of
internal stupidity.

This isn't about how I manage archives. It's about helping you
make the BESTEST indexing program in the world.

Beta 3
Leading wildcard in search seems to work now.
I clicked the search scope boxes to index .exe and .zip files.
Big files give the error:
Not enough memory left in the java virtual machine...
Guess that's expected...
I really don't want a text search inside exe files.
Don't want the bloat. Serves no purpose. And indexing takes forever.
I'm about 700 indexed files into it and the index directory is
already over 1.2GB. That ain't gonna do.

OOPS, discovered another issue.
When you abort an indexing operation, it claims it will delete the
index. And it empties the search scope box...but the indexes directory
is still 1.1GB.

Recursively continuing the index inside .zip and .exe (that are zip
archives) would be interesting.
At this point, I just want the filename added to the index so I can use
one tool for all my file indexing needs.

Adding a single checkbox, "include all filenames in index"
would go a long way toward meeting my needs.
If I could check that box and uncheck all the others, I'd get a small
searchable filename index. Lots of flexibility.
From: surprise on 18 Jan 2010 16:35
1.79 Gigabytes of documents, mostly pdf and text, searching for the
term "food drive" with the quotation marks

DOC FETCHER

Found 1 document: 1040.txt
Within the document the results were 24 occurrences, of which
4 were "food" without quotes
12 were "drive" without quotes
8 were "food drive" without the quotes, or 4 times "food drive" was
found together.

If "food drive" was entered without the quotes 50 matching documents
were found. None of the documents other than 1040.txt of this group
seems to have an instance of food drive.

1) ARE THE INSTANCES OF "FOOD" AND "DRIVE" THAT ARE NOT TOGETHER SOME
TYPE OF FUZZY SEARCH? IN SOME INSTANCES THIS FEATURE OF THE PROGRAM
MAY BE BENEFICIAL. HOWEVER, WHEN SEARCHING THROUGH MANY DOCUMENTS IT
WILL BE NICE TO BE ABLE TO TURN THIS FEATURE OFF.

2) THE PROGRAM SEEMS TO BE WEAK AT SEARCHING THROUGH PDF FILES. I AM
LED TO BELIEVE THIS WILL BE CORRECTED SOON.

WILMA

"food drive" with the quotation marks
Found 5 documents
4 pdf
1 1040.txt
4 times "food drive" was found together in 1040.txt.
(WILMA WAS USING INTERNAL FILTERS. IF PDFTOTEXT IS USED AS AN EXTERNAL
FILTER, THE MISSING PDF FILE COULD POSSIBLY BE SEARCHED. WILMA ALSO MAY
NO LONGER BE PORTABLE IF PDFTOTEXT IS USED.)

LUCERNE DESKTOP SEARCH

6 documents listed
5 pdf
1 1040.txt
Lucerne Desktop Search can't see inside documents

INFO RAPID SEARCH AND REPLACE

Set to match words entered. Does not need quotation marks to do this
search.
6 files found
5 pdf
1 1040.txt
5 instances of food drive found in 1040.txt
(One instance was actually "food drives"; this was not found in the
other searches.)

COPERNIC DESKTOP SEARCH

6 files found
5 are pdf files
1 1040.txt
4 instances of food drive found in 1040.txt

Search inform Free

Looking for "food drive"; again this program does not use quotes.
6 files found with "food drive"
5 files are pdf
1 1040.txt
4 "food drives" found in 1040.txt
From: mike on 18 Jan 2010 18:23
I had a little surprise of my own.
I wrote a small program to list all the filenames in my archives,
including recursive searches into zip files.
52GB of archive netted me a 22MB text file of full path names.
I used grep to search the file.
I had over 18,000 files with "usb" in the pathname. Almost
none of them had usb in the filename. I'm gonna have to rethink
exactly what I want from a file name search.
Searches for less common terms did produce useful results tho.
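
For anyone following along, here is a rough sketch of the kind of recursive zip listing described above. It is in Python and is not mike's VB6 program; the function name `list_zip_recursive`, the `NESTED_EXTENSIONS` tuple, and the `!/` separator for nested paths are all made up for illustration. It assumes nested archives are small enough to read into memory, which may not hold for big files.

```python
import io
import zipfile

# Assumption: only members with these extensions are worth probing as
# nested archives (self-extracting .exe files are often zip archives).
NESTED_EXTENSIONS = (".zip", ".exe")


def list_zip_recursive(source, prefix=""):
    """Yield the path of every entry in a zip, descending into nested zips.

    `source` is a filename or a seekable binary file object. Nested
    entries are reported as "outer.zip!/inner/path" (a made-up convention).
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            path = prefix + info.filename
            yield path
            if info.is_dir() or not info.filename.lower().endswith(NESTED_EXTENSIONS):
                continue
            # Read the whole member into memory so it can be reopened as a zip.
            inner = io.BytesIO(archive.read(info))
            if not zipfile.is_zipfile(inner):
                continue  # e.g. a plain .exe that is not a zip archive
            inner.seek(0)
            try:
                yield from list_zip_recursive(inner, prefix=path + "!/")
            except zipfile.BadZipFile:
                # Damaged nested archive: keep what was listed and move on.
                continue
```

Writing one yielded path per line to a text file gives the sort of flat list that can then be searched with grep.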
From: Nam Quang Tran on 19 Jan 2010 11:17
@ mike:

> Scenario 2: Learn Python.....
> I ain't learnin' another language until the existing one
> can't be coerced into solving my problem.

This is not just about what you can do with a language, it's also
about how much time and effort it takes to solve a given problem in
that language. Perhaps you misunderstood me, I didn't really mean
"throw VB6 away", but "add Python to your toolbox". Takes about 1-2
weeks to learn, and looking back on the 5-6 programming languages I
know, I'd say Python had by far the best effort-to-payoff ratio.
Anyway, do what you want, I'm not your mom :p

> Already got that. Just leave it turned off most of the time to
> save wear and tear. Also insulates the data from external malware
> and internal stupidity...I seem to have an infinite supply of
> internal stupidity.

Hm... then how about using the external hard drive as the main
repository and the DVDs as mere backups in case of outstanding
stupidity? Again, I don't wanna act like your mom, I'm just asking out
of curiosity, cause I'd like to understand my users a little better.

> This isn't about how I manage archives. It's about helping you
> make the BESTEST indexing program in the world.

Thank you, really appreciate that. Working with you as the "beta
tester from hell" (hurr hurr) was really fun. You know, without the
feedback we developers often can only guess what our users want or
don't want.

> OOPS, discovered another issue.
> When you abort an indexing operation, it claims it will delete the
> index. And it empties the search scope box...but the indexes directory
> is still 1.1GB.

Works for me. Deleting 1.1 GB takes time, so you might have to wait a
little until the index disappears.

> Adding a single checkbox, "include all filenames in index"
> would go a long way toward meeting my needs.
> If I could check that box and uncheck all the others, I'd get a small
> searchable filename index. Lots of flexibility.

If you just want to index filenames, then DocFetcher isn't the right
tool. Try 'Everything' (http://www.voidtools.com/) or something like
that. These programs specialize in filename search, so both
performance and index size should be far better than those of
full-text search apps.

> I used grep to search the file.
> I had over 18,000 files with "usb" in the pathname. Almost
> none of them had usb in the filename.

Then I'd search only in the filenames, not in the filepaths.
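
A minimal sketch of that last suggestion, assuming the path list is already in memory as strings: match the search term against the last path component only, so a folder named "usb" no longer drags in everything beneath it. The function name `match_filename_only` and the sample paths are invented for illustration. `ntpath.basename` is used because it splits on both backslashes and forward slashes on any platform.

```python
import ntpath


def match_filename_only(paths, term):
    """Return the paths whose final component contains `term` (case-insensitive)."""
    term = term.lower()
    return [p for p in paths if term in ntpath.basename(p).lower()]


# Hypothetical sample paths, not taken from the thread.
sample = [
    r"D:\archive\usb\readme.txt",       # "usb" only in a folder name
    r"D:\archive\misc\usb_driver.zip",  # "usb" in the filename itself
    "dvd01/usb/tools/setup.exe",        # forward slashes work too
]
hits = match_filename_only(sample, "usb")
```

Here only the second sample path ends up in `hits`, since the other two mention "usb" in a directory name rather than in the filename.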
From: Nam Quang Tran on 19 Jan 2010 11:25
On Jan 18, 10:35 pm, surprise <californiacarr...(a)gmail.com> wrote:
> 1.79 Gigabytes of documents mostly pdf and text searching for the
> term
> "food drive" with the quotation marks
>
> DOC FETCHER
>
> Found 1 document 1040.txt
> Within the document the results were 24 occurrences of which
> 4 were "food" without quotes
> 12 were "drive" without quotes
> 8 were "food drive" without the quotes or 4 times "food drive" was
> found together.
>
> If "food drive" was entered without the quotes 50 matching documents
> were found. None of the documents other than 1040.txt of this group
> seems to have an instance of food drive.
>
> 1) ARE THE INSTANCES OF "FOOD" AND "DRIVE" THAT ARE NOT TOGETHER SOME
> TYPE OF FUZZY SEARCH? IN SOME INSTANCES THIS FEATURE OF THE PROGRAM
> MAY BE BENEFICIAL. HOWEVER, WHEN SEARCHING THROUGH MANY DOCUMENTS IT
> WILL BE NICE TO BE ABLE TO TURN THIS FEATURE OFF.
>
> 2) THE PROGRAM SEEMS TO BE WEAK AT SEARCHING THROUGH PDF FILES. I AM
> LED TO BELIEVE THIS WILL BE CORRECTED SOON.

The search with quotation marks and the PDF indexing problems should
be gone in the new 1.0.2 release. See download page:
http://docfetcher.sourceforge.net/en/download.html

If DocFetcher brings up documents that seemingly don't contain the
words you searched for, then it could be that they have additional
text in the filename, title, or author field. Also, note that if HTML
pairing is on, then the stuff in the "XXXX_files" folders is
searchable, but doesn't show up in the preview.