From: Randal L. Schwartz on 8 Feb 2010 14:27

>>>>> "laredotornado" == laredotornado <laredotornado(a)zipmail.com> writes:

laredotornado> How would I search for files that have the same name, but potentially
laredotornado> different case, living in the same directory? For example, I would
laredotornado> want to find files like

laredotornado> /dir1/image1.gif
laredotornado> /dir1/IMAGE1.gif

Untested, but I think this'll do:

    #!/usr/bin/perl
    use File::Find;

    my %names;
    find sub {
        # group every path under the lowercased form of its full name
        push @{$names{lc $File::Find::name}}, $File::Find::name;
    }, "/";
    for (values %names) {
        next unless @$_ > 1;    # skip names that occur only once
        print "@$_\n";
    }

This'll list all the files with identical name mappings on the same line,
separated by space.

print "Just another Perl hacker,"; # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn(a)stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

From: Maxwell Lol on 12 Feb 2010 06:57

laredotornado <laredotornado(a)zipmail.com> writes:

> Hi,
>
> How would I search for files that have the same name, but potentially
> different case, living in the same directory? For example, I would
> want to find files like
>
> /dir1/image1.gif
> /dir1/IMAGE1.gif

you could also use the less elegant approach

    find . | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr

and look for files that have 2 or more entries
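
A minimal sketch of the same pipeline that prints only the repeated names instead of a count for every name. It swaps `uniq -c | sort -nr` for `uniq -d` and uses POSIX character classes in `tr`; the function name `lower_dups` is made up for illustration. Like the original, it lowercases the whole path (so directories differing only in case also collide) and assumes no file name contains a newline.

```shell
# Read path names on stdin and print each lowercased name that occurs
# more than once. Assumption: one name per line, no embedded newlines.
lower_dups() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Typical use (hypothetical starting directory):
#   find . | lower_dups
```

The output is the lowercased form only; to see the original spellings you still have to look them up, for example with `find . -iname`.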

From: David Combs on 21 Feb 2010 20:59

In article <slrnhmuiu8.2v8.usenet-nospam(a)guild.seebs.net>,
Seebs <usenet-nospam(a)seebs.net> wrote:
>On 2010-02-07, laredotornado <laredotornado(a)zipmail.com> wrote:
>> How would I search for files that have the same name, but potentially
>> different case, living in the same directory? For example, I would
>> want to find files like
>
>Within a directory:
>
> for file in *
> do
>   if ls | grep -v "^$file\$" | grep -qi "^$file\$"
>   then echo "found similar matches for '$file'."
>   fi
> done
>

Good Lord, that's O(n^2), right off the bat, an ls inside an
"effective ls", assuming those greps somehow do what you want.

Why not simply sort them all, using a sort that allows you to pass
an expression that decides on whether a or b is bigger, in which
you do a tr or toLower or whatever, but it's the UNmodified values
that end up coming out in the sort-order that didn't consider case,
then one final run-through of those "sorted" results, and within
every "run" of again tr'd or toLowered, if they differ in their
original form, you've found at least one of what you're looking for.

What's that so far, n log n + n?

Or, if you want more info, then for each of those toLower'd runs,
you sort THAT, "straight", then basically run a "uniq" on that.

(If you're running a million strings through it, I suppose you'd
code the uniq by hand, "in line", because of all the start-up
time for the process start-ups. Or maybe depending on the length
of the run you choose which to do. Maybe ditto for that inner sort.)

Question: does ANY of that make sense? As soon as I post this,
I'll probably realize it's all wet!

David
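
The sort-then-scan idea above can be sketched roughly as follows. This is one possible reading of it, not anything posted in the thread: `sort -f` does the case-folding comparison while keeping the unmodified names, and a single awk pass reports every run of names that are equal once lowercased. The function name `case_runs` is invented, `LC_ALL=C` is an assumption that restricts folding to ASCII, and names containing newlines are not handled.

```shell
# Read names on stdin; print each group of names that are equal when case
# is ignored, one group per line, names separated by spaces.
case_runs() {
    LC_ALL=C sort -f | awk '
        {
            key = tolower($0)
            if (NR > 1 && key == prev) {   # still inside the current run
                run = run " " $0
                n++
            } else {
                if (n > 1) print run       # flush the run that just ended
                run = $0
                n = 1
            }
            prev = key
        }
        END { if (n > 1) print run }       # flush the last run
    '
}

# Typical use within one directory:
#   ls | case_runs
```

That is one sort plus one linear pass, which matches the "n log n + n" estimate, and only two processes are started regardless of how many names go through.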

From: David Combs on 21 Feb 2010 21:15

In article <hkp3cb$3bh$1(a)news.eternal-september.org>,
Ed Morton <mortonspam(a)gmail.com> wrote:
>On 2/7/2010 7:14 PM, Janis Papanagnou wrote:
>> Janis Papanagnou wrote:
>>> laredotornado wrote:
>>>> Hi,
>>>>
>>>> How would I search for files that have the same name, but potentially
>>>> different case, living in the same directory? For example, I would
>>>> want to find files like
>>>>
>>>> /dir1/image1.gif
>>>> /dir1/IMAGE1.gif
>>>>
>>>> but I don't care about
>>>>
>>>> /dir1/image1.gif
>>>> /dir1/dir2/image1.gif
>
>How do you feel about:
>
> /dir1/image1.gif
> /DIR1/image1.gif
>
>or:
>
> /dir1/dir2
> /dir1/DIR2
>
>where dir2 and DIR2 are directories?
>
>>>> ? Hope this question makes sense, - Dave
>>>
>>> This awk program stores the converted filenames that it reads from stdin
>>> in lowercase and prints any new filename that matches case-insensitive...
>>>
>>> awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>
>>> You can feed files from a current directory
>>>
>>> awk 'tolower($0) in f ; { f[tolower($0)] }' *
>>
>> Didn't know what I was thinking with the previous line; should have been
>>
>> ls | awk '...' or ls */* | awk '...' or somesuch.
>>
>>>
>>> or (if case insensitive directories are not a problem) from a directory
>>> tree
>>>
>>> find . | awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>
>>> Just one way to approach the task.
>>
>> And since I am posting anyway I can point to the terse "golf version" as
>> well...
>>
>> find . | awk 'f[tolower($1)]++'
>
>ITYM:
>
> find . | awk 'f[tolower($0)]++'
>
>or if the OP really only cares about files with matching names but does care
>about differentiating directories with different case:
>
> find . -type f |
> awk -F'/' '{file=tolower($NF); sub(/[/][^/]+$/,"",$0)} f[$0 "/" file]++'
>

GOLF is nice, but I'm no Tiger Woods, so maybe you could explain
that a bit.

(To understand how little I understand awk, I'm searching for the
curly brackets I (mistakenly, obviously) thought were required!)

>Usual caveat about file names that contain newlines.

That is a joke, yes? Although I wouldn't
put it past Windows to allow it!

Thanks,

David
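
For the "same directory, different case" variant quoted above, here is a spelled-out sketch rather than a golfed one. It is an illustration, not Ed's code: the function name `same_dir_dups` and the variable names are invented. It keeps the directory part exactly as typed and lowercases only the last path component. It also edits a copy of the line instead of `$0`, because a `sub()` applied to `$0` changes what the default print action shows.

```shell
# Read path names on stdin; print every path whose directory (as typed) and
# lowercased file name have been seen before. Assumes one path per line.
same_dir_dups() {
    awk -F'/' '
        {
            dir = $0
            sub("/[^/]*$", "", dir)        # drop the last component, keep the directory
            key = dir "/" tolower($NF)     # only the file name is case-folded
        }
        seen[key]++                        # no action given, so awk prints $0
    '
}

# Typical use:
#   find . -type f | same_dir_dups
```

With the four paths discussed in the thread (`./dir1/image1.gif`, `./dir1/IMAGE1.gif`, `./DIR1/image1.gif`, `./dir1/dir2/image1.gif`) only `./dir1/IMAGE1.gif` is reported.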

From: Janis Papanagnou on 21 Feb 2010 22:02

David Combs wrote:
> In article <hkp3cb$3bh$1(a)news.eternal-september.org>,
> Ed Morton <mortonspam(a)gmail.com> wrote:
>> On 2/7/2010 7:14 PM, Janis Papanagnou wrote:
>>> Janis Papanagnou wrote:
>>>> laredotornado wrote:
>>>>> Hi,
>>>>>
>>>>> How would I search for files that have the same name, but potentially
>>>>> different case, living in the same directory? For example, I would
>>>>> want to find files like
>>>>>
>>>>> /dir1/image1.gif
>>>>> /dir1/IMAGE1.gif
>>>>>
>>>>> but I don't care about
>>>>>
>>>>> /dir1/image1.gif
>>>>> /dir1/dir2/image1.gif
>> How do you feel about:
>>
>> /dir1/image1.gif
>> /DIR1/image1.gif
>>
>> or:
>>
>> /dir1/dir2
>> /dir1/DIR2
>>
>> where dir2 and DIR2 are directories?
>>
>>>>> ? Hope this question makes sense, - Dave
>>>> This awk program stores the converted filenames that it reads from stdin
>>>> in lowercase and prints any new filename that matches case-insensitive...
>>>>
>>>> awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>>
>>>> You can feed files from a current directory
>>>>
>>>> awk 'tolower($0) in f ; { f[tolower($0)] }' *
>>> Didn't know what I was thinking with the previous line; should have been
>>>
>>> ls | awk '...' or ls */* | awk '...' or somesuch.
>>>
>>>> or (if case insensitive directories are not a problem) from a directory
>>>> tree
>>>>
>>>> find . | awk 'tolower($0) in f ; { f[tolower($0)] }'
>>>>
>>>> Just one way to approach the task.
>>> And since I am posting anyway I can point to the terse "golf version" as
>>> well...
>>>
>>> find . | awk 'f[tolower($1)]++'
>> ITYM:
>>
>> find . | awk 'f[tolower($0)]++'
>>
>> or if the OP really only cares about files with matching names but does care
>> about differentiating directories with different case:
>>
>> find . -type f |
>> awk -F'/' '{file=tolower($NF); sub(/[/][^/]+$/,"",$0)} f[$0 "/" file]++'
>>
> GOLF is nice, but I'm no Tiger Woods, so maybe you could explain
> that a bit.

I'll take the minimalist version...

    find . -type f | awk 'f[tolower($0)]++'

find(1) lists the ordinary (regular) files and passes them to awk.
Awk reads the filenames from stdin; each line from find is stored
in $0. The filename in $0 is then lowercased: tolower($0).

f[abc] is an associative array access with key abc; if the array
element does not exist it will be created. Now if the lowercased
filename isn't yet in the array, the element will be created and
incremented; for a file, say xyz.txt, the array element f["xyz.txt"]
(which is initially 0) will be set to 1 (by the ++ operator). The
increment actually happens after the value of the expression has
been taken, so on the first insert it evaluates to 0, and the
element is incremented to 1; each subsequent access to the same
array element (i.e. with an equivalent filename) will evaluate to
a value greater than 0, and the element is incremented further.

Now considering that awk programs are generally of the form

    condition { action }

which means that if the condition is true (non-zero) the respective
action will be executed, and that the above awk program is a short
form for

    f[tolower($0)]++ { print $0 }

you see that the filename in $0 will be printed for every occurrence
after the first. And if we recall that the filename that is used as
key is always used in a normalized lowercase form, then an input
sequence of

    ABC.xyz
    abc.xyz
    abc.XYZ
    ABC.XYZ

will increment the array element f["abc.xyz"] four times, and the
elements that come as #2, #3, #4 will be printed as duplicates.

Janis

>
> (To understand how little I understand awk, I'm searching for the
> curly brackets I (mistakenly, obviously) thought were required!)
>
>
>
>> Usual caveat about file names that contain newlines.
> That is a joke, yes? Although I wouldn't
> put it past Windows to allow it!
>
>
> Thanks,
>
> David
>
>
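
The walkthrough above can be checked without touching the file system by feeding the four example names straight into the awk program. This is only a small demonstration; the wrapper function `later_dups` is a made-up name.

```shell
# Print every input line whose lowercased form has already been seen.
later_dups() {
    awk 'f[tolower($0)]++'
}

# The four names from the explanation; the first one is not printed,
# because f["abc.xyz"] is still 0 when the condition is evaluated for it.
printf '%s\n' ABC.xyz abc.xyz abc.XYZ ABC.XYZ | later_dups
# prints:
#   abc.xyz
#   abc.XYZ
#   ABC.XYZ
```

Note that the first spelling of each name is never shown, so this form tells you that a clash exists and which later names are involved, but not what they clash with.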