From: Pankaj on 13 Feb 2010 09:00

Greetings,

I have two files with the following contents:

File1.txt

<abc>CONTENT1-GOES-HERE</abc>
<abc>CONTENT2-GOES-HERE</abc>

File2.txt

<abc>CONTENT1-GOES-HERE</abc>
<abc>CONTENT2-GOES-HERE</abc>
<abc>CONTENT3-GOES-HERE</abc>

To explain: one record starts with <abc> and ends with </abc>, and after each </abc> the next record starts on a new line.

I want to compare the two files above (not just the content but the whole record, from <abc> to </abc>). I want all data present in File2.txt but not in File1.txt.

So, for the sample data above, the output (say, in a third file) is:

Final.output

<abc>CONTENT3-GOES-HERE</abc>

I have tried:

cat File1.txt | sort > File11.txt
cat File2.txt | sort > File22.txt
comm -23 File22.txt File11.txt > Final.output

The Final.output file does not show the correct records (it was showing more records than expected). I am not sure whether the above is the correct way of going about it.

Any help would be appreciated.

We are using Solaris 5.10

TIA

From: pk on 13 Feb 2010 08:58

Pankaj wrote:
> File1.txt
>
> <abc>CONTENT1-GOES-HERE</abc>
> <abc>CONTENT2-GOES-HERE</abc>
>
> File2.txt
>
> <abc>CONTENT1-GOES-HERE</abc>
> <abc>CONTENT2-GOES-HERE</abc>
> <abc>CONTENT3-GOES-HERE</abc>
>
> TO explain, the one record is identified by the starting of <abc> and
> ending with </abc>. and after each </abc>, the next record is starting
> with a new line.
>
> I want to compare the above two files (not just content but the whole
> record starting with <abc> and ending with </abc>). I want all data
> present in file2.txt but not in File1.txt,
>
> So, in above sample data the output (say in a 3rd file is),

This should do that using awk:

awk 'NR==FNR{a[$0];next} !($0 in a)' file1.xml file2.xml

> We are using Solaris 5.10

Then use /usr/xpg4/bin/awk.
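
For anyone who wants to try the one-liner before pointing it at real data, here is a minimal sketch that recreates the sample records from the question and runs the same awk program on them. The file names File1.txt, File2.txt and Final.output are taken from the original post; writing the samples with printf is just an assumption made for the demonstration.

```shell
# Recreate the two sample files from the question.
printf '%s\n' \
    '<abc>CONTENT1-GOES-HERE</abc>' \
    '<abc>CONTENT2-GOES-HERE</abc>' > File1.txt
printf '%s\n' \
    '<abc>CONTENT1-GOES-HERE</abc>' \
    '<abc>CONTENT2-GOES-HERE</abc>' \
    '<abc>CONTENT3-GOES-HERE</abc>' > File2.txt

# Print every line of File2.txt that does not occur in File1.txt.
# On Solaris, call /usr/xpg4/bin/awk instead of awk, as noted above.
awk 'NR==FNR{a[$0];next} !($0 in a)' File1.txt File2.txt > Final.output

cat Final.output
# prints: <abc>CONTENT3-GOES-HERE</abc>
```

Unlike the comm approach, this does not need the inputs to be sorted, and the output keeps the line order of the second file.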

From: Pankaj on 13 Feb 2010 09:36

On Feb 13, 7:58 am, pk <p...(a)pk.invalid> wrote:
> Pankaj wrote:
> > File1.txt
> >
> > <abc>CONTENT1-GOES-HERE</abc>
> > <abc>CONTENT2-GOES-HERE</abc>
> >
> > File2.txt
> >
> > <abc>CONTENT1-GOES-HERE</abc>
> > <abc>CONTENT2-GOES-HERE</abc>
> > <abc>CONTENT3-GOES-HERE</abc>
> >
> > TO explain, the one record is identified by the starting of <abc> and
> > ending with </abc>. and after each </abc>, the next record is starting
> > with a new line.
> >
> > I want to compare the above two files (not just content but the whole
> > record starting with <abc> and ending with </abc>). I want all data
> > present in file2.txt but not in File1.txt,
> >
> > So, in above sample data the output (say in a 3rd file is),
>
> This should do that using awk:
>
> awk 'NR==FNR{a[$0];next} !($0 in a)' file1.xml file2.xml
>
> > We are using Solaris 5.10
>
> Then use /usr/xpg4/bin/awk.

That works like a charm Pk. Can you please explain the code-flow?

From: pk on 13 Feb 2010 10:12

Pankaj wrote:
>> This should do that using awk:
>>
>> awk 'NR==FNR{a[$0];next} !($0 in a)' file1.xml file2.xml
>>
>> > We are using Solaris 5.10
>>
>> Then use /usr/xpg4/bin/awk.
>
> That works like a charm Pk. Can you please explain the code-flow?

NR==FNR{a[$0];next}

This stores every line of the first file as an index of the associative array (or hash) "a". NR==FNR means "while we're reading the first file". $0 represents the input line, so a[$0] creates the element subscripted by $0 in the hash. The "next" statement then moves on to the next input line, so the second pattern is never applied to lines of the first file.

----------------

!($0 in a)

This is evaluated when the second file is being read, and essentially tells awk "if the line we're reading ($0) is NOT present as an index of the hash a (this is indicated by !($0 in a)), then print it".

It's probably clearer written as follows:

!($0 in a) {print $0}

but the two forms are equivalent, since when awk finds a true condition with no action attached, by default it prints the record (a line in this case).
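
The explanation above can also be read off a spelled-out version of the same program. This is only a sketch for study purposes: the array name seen, the sample words, and the file names first.txt, second.txt and only_in_second.txt are all made up for illustration and do not come from the thread.

```shell
# Hypothetical sample input, for illustration only.
printf '%s\n' 'alpha' 'beta' > first.txt
printf '%s\n' 'alpha' 'beta' 'gamma' > second.txt

awk '
    # NR counts lines across all input files; FNR restarts at 1 for each file.
    # The two are equal only while the first file is being read.
    NR == FNR {
        seen[$0]    # merely referencing the element creates the index
        next        # skip the remaining rules for this line
    }

    # Reached only for lines of the second file.
    !($0 in seen) {
        print $0
    }
' first.txt second.txt > only_in_second.txt

cat only_in_second.txt
# prints: gamma
```

Two caveats worth keeping in mind. If the first file is empty, NR==FNR is also true while the second file is read, so nothing is printed at all. And the comparison is an exact string match on the whole line, so trailing blanks or stray carriage returns make otherwise identical records count as different; that may or may not be what inflated the comm output in the original attempt.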

From: Pankaj on 13 Feb 2010 13:06

On Feb 13, 9:12 am, pk <p...(a)pk.invalid> wrote:
> Pankaj wrote:
> >> This should do that using awk:
> >>
> >> awk 'NR==FNR{a[$0];next} !($0 in a)' file1.xml file2.xml
> >>
> >> > We are using Solaris 5.10
> >>
> >> Then use /usr/xpg4/bin/awk.
> >
> > That works like a charm Pk. Can you please explain the code-flow?
>
> NR==FNR{a[$0];next}
>
> This reads all the first file's lines as indexes of the "a" associative
> array (or hash). NR==FNR means "while we're reading the first file". $0
> represents the input line, so a[$0] creates the element subscripted by $0 in
> the hash.
> ----------------
> !($0 in a)
>
> This is evaluated when the second file is being read, and essentially tells
> awk "if the line we're reading ($0) is NOT present as an index of the hash a
> (this is indicated by !($0 in a)), then print it".
> It's probably clearer written as follows:
>
> !($0 in a) {print $0}
>
> but the two forms are equivalent, since when awk finds a true condition, by
> default it prints the record (line in this case).

Thanks again Pk. It seems I really need to learn AWK programming. Appreciate your time.