From: Rahul on 6 Jun 2010 17:49

I'm trying to join two unsorted files and print the lines common to both, based on a "key" field. The first field is the key.

cat file1:
A;Ablah
B;Bblah

cat file2:
A;Ablahblah
B;Bblahblah

A join seems to work:

join -t ';' -j 1 file1 file2

A;Ablah;Ablahblah
B;Bblah;Bblahblah

But the moment there is a non-matching line, the join fails, e.g.:

cat file1:
C:Cblah
A;Ablah
B;Bblah

Is there any way around this, so that it still outputs the lines where field 1 matches? If not, can awk etc. handle this situation? I've only used awk on single files before, so I'm not sure.

--
Rahul
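A likely cause, not spelled out in the thread: join expects both inputs to be sorted on the join field (in the collating order of the current locale), and file1 above is not, since C:Cblah comes before A;Ablah. With unsorted input, join can silently miss matches; GNU join may also report that an input is not in sorted order. The sketch below sorts each file on field 1 before joining. It assumes bash (for the <(...) process substitution) and uses the sample data from the question; the helper name join_sorted is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sort both inputs on the key field, then join them.
# Assumes bash for <(...) process substitution; join_sorted is a
# hypothetical helper name, the data is the question's sample.
set -euo pipefail

# Use one locale for both sort and join so they agree on what "sorted" means.
export LC_ALL=C

# Work in a scratch directory with the question's sample files.
cd "$(mktemp -d)"
printf '%s\n' 'C:Cblah' 'A;Ablah' 'B;Bblah' > file1
printf '%s\n' 'A;Ablahblah' 'B;Bblahblah' > file2

# Sort each file on field 1 only, then join on field 1 as in the question.
join_sorted() {
  join -t ';' -j 1 <(sort -t ';' -k1,1 "$1") <(sort -t ';' -k1,1 "$2")
}

join_sorted file1 file2
```

Setting LC_ALL=C is a choice made here so that sort and join use the same ordering; any locale should work as long as both commands run under the same one. The unmatched C:Cblah line is simply dropped, which is join's default behaviour.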
From: The Natural Philosopher on 6 Jun 2010 18:57

Rahul wrote:
> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> A join seems to work:
> join -t ';' -j 1 file1 file2
>
> A;Ablah;Ablahblah
> B;Bblah;Bblahblah
>
> But the moment there is a non matching line the join fails:
>
> e.g.
>
> cat file1:
> C:Cblah
> A;Ablah
> B;Bblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....

I vaguely remember doing this with awk years ago...
From: Anton Treuenfels on 7 Jun 2010 01:25

"Rahul" <nospam(a)nospam.invalid> wrote in message
news:Xns9D8FAB1CD93556650A1FC0D7811DDBC81(a)188.40.43.230...
> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....

I'm going to assume any particular key field can appear any number of times in either file in any line and that the rest of each line can vary and that you only want one copy of each line from either file.

One way is to read the first file twice and the second file once:

BEGIN { FS = ";"; ARGV[ARGC++] = ARGV[1] }   # ';' separates fields; queue file1 again
FILENAME == "file1" { file1keys[$1] = ".T."; if ($1 in file2keys) print }
FILENAME == "file2" { file2keys[$1] = ".T."; if ($1 in file1keys) print }

Of course, this prints all the matching lines in file2 before any in file1. You can also put the file names on the command line in whatever order you want.

- Anton Treuenfels
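Anton's script tells the inputs apart by comparing FILENAME with the literal names file1 and file2, so it has to be edited for other file names. A possible variant (not Anton's code) counts passes instead: FNR == 1 marks the start of each input, so pass 1 is the first file, pass 2 the second, and pass 3 the first file again. This is a sketch, assuming ';'-separated fields and that neither file is empty (an empty file never reaches FNR == 1 and would throw the counter off). The function name common_lines and the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "read the first file twice" idea without hard-coded names.
# Assumptions: ';'-separated fields, neither input file is empty.
# common_lines and the sample data are hypothetical.
set -euo pipefail

common_lines() {
  awk '
    BEGIN { FS = ";"; ARGV[ARGC++] = ARGV[1] }       # queue the first file again
    FNR == 1 { pass++ }                              # a new input has started
    pass == 1 { k1[$1]; next }                       # remember keys of file 1
    pass == 2 { k2[$1]; if ($1 in k1) print; next }  # file 2 lines with a file 1 key
    pass == 3 { if ($1 in k2) print }                # file 1 lines with a file 2 key
  ' "$1" "$2"
}

cd "$(mktemp -d)"
printf '%s\n' 'A;a1' 'C;c1' 'A;a2' 'B;b1' > file1   # key A repeats
printf '%s\n' 'B;x1' 'A;y1' 'D;z1' 'B;x2' > file2   # key B repeats

common_lines file1 file2
```

As in Anton's version, the matching lines of the second file come out first, followed by the matching lines of the first file, and repeated keys are handled because every line is checked against the full key set of the other file.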
From: pk on 7 Jun 2010 04:12

Rahul wrote:
> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> A join seems to work:
> join -t ';' -j 1 file1 file2
>
> A;Ablah;Ablahblah
> B;Bblah;Bblahblah
>
> But the moment there is a non matching line the join fails:
>
> e.g.
>
> cat file1:
> C:Cblah
> A;Ablah
> B;Bblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....

Assuming no repeated keys, try

awk -F \; -v OFS=\; 'NR==FNR{a[$1]=$2;next} $1 in a{print $1, a[$1], $2}' file1 file2
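pk's one-liner relies on the common NR==FNR idiom: NR counts lines across all input files while FNR restarts at 1 for each file, so NR==FNR holds only while the first file is being read. It stores only $2 of each file1 line, which is enough for the two-field sample. The variant below (a sketch, not pk's code) stores the whole file1 line instead, so extra fields survive. Like the original, it assumes no repeated keys and a non-empty file1 (if file1 is empty, NR==FNR is also true for the second file); it further assumes each matching file2 line has at least one field after the key. The name join_awk and the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pk's NR==FNR approach, keeping every field of the file1 line.
# Assumptions: no repeated keys, file1 not empty, each matching file2 line
# has a field after the key. join_awk and the sample data are hypothetical.
set -euo pipefail

join_awk() {
  awk -F ';' -v OFS=';' '
    NR == FNR { line[$1] = $0; next }               # first file: whole line by key
    ($1 in line) {                                  # second file: key seen before?
      print line[$1], substr($0, length($1) + 2)    # drop the key and its ";"
    }
  ' "$1" "$2"
}

cd "$(mktemp -d)"
printf '%s\n' 'C:Cblah' 'A;Ablah;extra' 'B;Bblah' > file1
printf '%s\n' 'B;Bblahblah' 'A;Ablahblah' > file2

join_awk file1 file2
```

Output follows the line order of file2, as with pk's original. On this data the original would print A;Ablah;Ablahblah for the A key, dropping the extra field.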
From: Rahul on 7 Jun 2010 19:12

"Anton Treuenfels" <teamtempest(a)yahoo.com> wrote in news:buqdnWDccvgqH5HRnZ2dnUVZ_i2dnZ2d(a)earthlink.com:

> I'm going to assume any particular key field can appear any number of
> times in either file in any line and that the rest of each line can
> vary and that you only want one copy of each line from either file.

Thanks, Anton, for this general solution. My problem is simpler, since the keys are not repeated. My bad; I should have mentioned that.

--
Rahul