From: ezhil on 24 Feb 2010 07:47

Hi,

My program creates 100 output files, each having 9 columns and 150 rows. The first 5 (out of 9) columns are the same in all 100 files. I am trying to parse all the files so that my final file will have the first 5 columns plus 2 columns (7 and 9) from each file. Is there an elegant way of doing this?

I have tried a simple approach: redirecting all 100 files into a single file (using >>) and using the NR counter (when it reaches 150) to select the 2 columns in the next 150 rows. It works fine now, but what will happen if each file has a different number of rows?

I am also trying to parse each file as it is created (on the fly) and then delete the file (instead of creating 100 output files). I am writing a shell script and trying to use awk inside it, but it is not working.

Thanks in advance.

Kind regards,
Ezhil
From: pk on 24 Feb 2010 07:57

ezhil wrote:

> My program creates 100 output files each having 9 columns and 150
> rows. The first 5 (out of 9) columns is the same in all 100 files. I
> am trying to parse all the files so that my final file will have the
> first 5 columns and 2 columns (7,9) from each file. Is there an
> elegant way of doing this?

So your output should be

c1 c2 c3 c4 c5 c7f1 c9f1 c7f2 c9f2 ... c7f100 c9f100
^^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^     ^^^^^^^^^^^^^
    common       file1     file2   ...    file100

If that is what you want, try this:

awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9}
END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100

Set OFS to a different value (e.g., awk -v OFS=',' etc. for a comma) if you need a different output separator.
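To see what the command above produces, here is a small sketch that runs it on three made-up files of two rows each, standing in for file1 ... file100. The temporary directory, the file names, the data, and the `merged` variable are all invented for illustration.

```shell
# Three tiny stand-ins for the real output files (made-up data, 9 columns each)
tmp=$(mktemp -d)
printf 'a b c d e x 11 y 12\nf g h i j x 13 y 14\n' > "$tmp/1.txt"
printf 'a b c d e x 21 y 22\nf g h i j x 23 y 24\n' > "$tmp/2.txt"
printf 'a b c d e x 31 y 32\nf g h i j x 33 y 34\n' > "$tmp/3.txt"

# First file only: start each row with the 5 common columns.
# Every file (including the first): append its columns 7 and 9 to the row.
merged=$(awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9}
END {for(i=1;i<=FNR;i++) print out[i]}' "$tmp/1.txt" "$tmp/2.txt" "$tmp/3.txt")

printf '%s\n' "$merged"
rm -r "$tmp"
```

With this sample data the two output lines are "a b c d e 11 12 21 22 31 32" and "f g h i j 13 14 23 24 33 34": the common columns once, then one 7/9 pair per input file.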
From: Ed Morton on 24 Feb 2010 08:19

On 2/24/2010 6:57 AM, pk wrote:
> ezhil wrote:
>
>> My program creates 100 output files each having 9 columns and 150
>> rows. The first 5 (out of 9) columns is the same in all 100 files. I
>> am trying to parse all the files so that my final file will have the
>> first 5 columns and 2 columns (7,9) from each file. Is there an
>> elegant way of doing this?
>
> So your output should be
>
> c1 c2 c3 c4 c5 c7f1 c9f1 c7f2 c9f2 ... c7f100 c9f100
> ^^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^     ^^^^^^^^^^^^^
>     common       file1     file2   ...    file100
>
> If that is what you want, try this:
>
> awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
> {out[FNR]=out[FNR] OFS $7 OFS $9}
> END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100
>
> set OFS to a different value (eg, awk -v OFS=',' etc. for a comma) if you
> need a different output separator.
>

The OP had this concern:

> It works fine now but what will happen if
> each file has different rows?

so apparently you can't assume he'll have the same number of lines in every file, and in particular you can't assume the number of lines in the last file read will be the max number of lines. He doesn't say what to do in that case, but we could do this:

awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9; maxFnr=(FNR > maxFnr ? FNR : maxFnr)}
END {for(i=1;i<=maxFnr;i++) print out[i]}' file1 file2 file3 ... file100

Note the change from FNR to maxFnr in the END loop. There's probably more that needs to be done so the "columns" don't get left-shifted if there are fewer lines in some files, but until the OP tells us what he wants (e.g. populating some "NULL" value in columns if missing lines) there's not much point guessing any further....

Ed.
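One possible reading of the "NULL" idea mentioned above, offered only as a sketch since the OP never says what he wants: remember each file's 7/9 pair separately and fill in a placeholder wherever a file has no such row. The `pad` variable, the `nfiles`, `common` and `val` names, and the sample data are assumptions made for illustration. It also assumes no input file is empty, because an empty file never triggers the FNR==1 rule and so would not get a column of its own.

```shell
tmp=$(mktemp -d)
printf 'a b c d e x 11 y 12\nf g h i j x 13 y 14\n' > "$tmp/1.txt"
printf 'a b c d e x 21 y 22\n' > "$tmp/2.txt"    # one row short on purpose
printf 'a b c d e x 31 y 32\nf g h i j x 33 y 34\n' > "$tmp/3.txt"

padded=$(awk -v pad='NULL' '
FNR==1 { nfiles++ }                      # a new (non-empty) input file starts
!(FNR in common) {                       # first file that has this row supplies
  common[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5   # the 5 shared columns
}
{ val[FNR,nfiles]=$7 OFS $9              # keep each pair under (row, file)
  if (FNR > maxFnr) maxFnr=FNR }
END {
  for (i=1; i<=maxFnr; i++) {
    line=common[i]
    for (f=1; f<=nfiles; f++)            # placeholder pair where a file has no row i
      line=line OFS (((i,f) in val) ? val[i,f] : pad OFS pad)
    print line
  }
}' "$tmp/1.txt" "$tmp/2.txt" "$tmp/3.txt")

printf '%s\n' "$padded"
rm -r "$tmp"
```

Here the second output line reads "f g h i j 13 14 NULL NULL 33 34", so the pair from the third file stays in its own position instead of shifting left.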
From: ezhil on 24 Feb 2010 08:28

On Feb 24, 12:57 pm, pk <p...(a)pk.invalid> wrote:
> ezhil wrote:
> > My program creates 100 output files each having 9 columns and 150
> > rows. The first 5 (out of 9) columns is the same in all 100 files. I
> > am trying to parse all the files so that my final file will have the
> > first 5 columns and 2 columns (7,9) from each file. Is there an
> > elegant way of doing this?
>
> So your output should be
>
> c1 c2 c3 c4 c5 c7f1 c9f1 c7f2 c9f2 ... c7f100 c9f100
> ^^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^     ^^^^^^^^^^^^^
>     common       file1     file2   ...    file100
>
> If that is what you want, try this:
>
> awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
> {out[FNR]=out[FNR] OFS $7 OFS $9}
> END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100
>
> set OFS to a different value (eg, awk -v OFS=',' etc. for a comma) if you
> need a different output separator.

Hi PK,

When I tried the above cmd, I got a syntax error at $9}. I have just tried with 3 files to check the final output.

awk 'NR==FNR{ out[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5} {out[FNR = out[FNR] OFS $7 OFS $9} END {for(i=1;i<=FNR;i++) print out[i]}' 1.txt 2.txt 3.txt

Thanks again,
Ezhil
From: Ed Morton on 24 Feb 2010 08:50

On 2/24/2010 7:28 AM, ezhil wrote:
> On Feb 24, 12:57 pm, pk<p...(a)pk.invalid> wrote:
>> ezhil wrote:
>>> My program creates 100 output files each having 9 columns and 150
>>> rows. The first 5 (out of 9) columns is the same in all 100 files. I
>>> am trying to parse all the files so that my final file will have the
>>> first 5 columns and 2 columns (7,9) from each file. Is there an
>>> elegant way of doing this?
>>
>> So your output should be
>>
>> c1 c2 c3 c4 c5 c7f1 c9f1 c7f2 c9f2 ... c7f100 c9f100
>> ^^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^     ^^^^^^^^^^^^^
>>     common       file1     file2   ...    file100
>>
>> If that is what you want, try this:
>>
>> awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
>> {out[FNR]=out[FNR] OFS $7 OFS $9}
>> END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100
>>
>> set OFS to a different value (eg, awk -v OFS=',' etc. for a comma) if you
>> need a different output separator.
>
> Hi PK,
>
> When I tried the above cmd, I got the syntax error at $9}. I have
> just tried with 3 files to check the final output.
>
> awk 'NR==FNR{ out[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5} {out[FNR =
> out[FNR] OFS $7 OFS $9} END {for(i=1;i<=FNR;i++) print out[i]}' 1.txt
> 2.txt 3.txt

Instead of copy/pasting the script from your newsreader, you tried to re-type it and missed a closing "]" at "out[FNR =...".

Ed.
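
The original question also asked about parsing each file as it is created and then deleting it, which the replies above do not take up. A rough sketch of one way that might work, using awk together with the standard paste utility: keep a running merged file and fold each finished output file into it. The `fold_in` helper, the `merged.txt` and `out.txt` names, and the loop that fakes the program's output are all hypothetical. It assumes every file has the same number of rows; with unequal row counts paste would leave fields empty and the columns would shift.

```shell
tmp=$(mktemp -d)
merged="$tmp/merged.txt"

# Hypothetical helper: fold one finished output file into $merged,
# and delete the input only if the merge step succeeded.
fold_in() {
  if [ -s "$merged" ]; then
    # Later files: glue columns 7 and 9 onto the end of each existing row
    awk '{print $7, $9}' "$1" | paste -d ' ' "$merged" - > "$merged.new" &&
      mv "$merged.new" "$merged"
  else
    # First file: keep the 5 common columns as well
    awk '{print $1, $2, $3, $4, $5, $7, $9}' "$1" > "$merged"
  fi && rm -- "$1"
}

# Stand-in for the real program: write one output file per run, then fold it in
for n in 1 2 3; do
  printf 'a b c d e x %s1 y %s2\nf g h i j x %s3 y %s4\n' "$n" "$n" "$n" "$n" > "$tmp/out.txt"
  fold_in "$tmp/out.txt"
done

result=$(cat "$merged")
leftover=$(ls "$tmp")
printf '%s\n' "$result"
rm -r "$tmp"
```

Only merged.txt is left in the directory after the loop, and for this sample data its contents match what the single awk command over all three files would print.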