From: marty.mcgowan on 3 Apr 2010 20:28

On Apr 3, 1:11 pm, Eric <e...(a)deptj.eu> wrote:
> On 2010-04-03, Ed Morton <mortons...(a)gmail.com> wrote:
>
> > On 4/2/2010 9:25 AM, Thomas 'PointedEars' Lahn wrote:
> >> Hongyi Zhao wrote:
> >> I use the following code to obtain the lines existing file2 but not in file1,
> >
> >>> Ed Morton wrote:
> >
> >>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
> > <snip>
> >
> >> I would use diff | grep anyway. RTFM.
> >
> > I'm curious - what would that solution look like given the input files below?
> >
> > $ cat file1
> > a
> > c
> > $ cat file2
> > c
> > a
> > b
> > $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
> > b
> >
> > Regards,
> >
> > Ed.
>
> I've just reread the original question, and I guess it depends on why
> you are doing it, but I would definitely consider
>
> sort file1 > file1s
> sort file2 > file2s
> comm -13 file1s file2s
>
> comm needs the files to be sorted, but maybe they are anyway.
>
> Eric

Sometimes we can afford to "overwrite" the initial file; see below for
the source.

overwrite file1 sort file1
overwrite file2 sort file2

-----------
This is from Kernighan and Pike's "Unix Programming Environment" '84
=============================
# overwrite: copy standard input to output after EOF
opath=$PATH
PATH=/bin:/usr/bin

case $# in
0|1) echo 'Usage: overwrite file cmd [args]' 1>&2; exit 2
esac

file=$1; shift
new=/tmp/overwr1.$$; old=/tmp/overwr2.$$
trap 'rm -f $new $old; exit 1' 1 2 15 # clean up

if PATH=$opath "$@" >$new
then
    cp $file $old # save original
    trap '' 1 2 15 # we are committed
    cp $new $file
else
    echo "overwrite: $1 failed, $file unchanged" 1>&2
    exit 1
fi
rm -f $new $old
=============================

enjoy
-=+-- Marty
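For readers who want the sort/comm idea and the "write back into file2" step in one place, here is a minimal sketch. The function name `remove_known_lines` is hypothetical, and it assumes `mktemp -d` is available (common, but not required by POSIX). Note that the result comes out sorted, so this only fits when the original line order of file2 does not matter.

```shell
# remove_known_lines FILE1 FILE2  (hypothetical helper, not from the thread)
# Rewrites FILE2 so that it keeps only the lines that do not occur in FILE1.
# The body runs in a subshell, so its variables do not leak into the caller.
remove_known_lines() (
    tmpdir=$(mktemp -d) || exit 1

    # sort and comm must agree on collation, so pin the locale for both.
    LC_ALL=C sort "$1" > "$tmpdir/a" &&
    LC_ALL=C sort "$2" > "$tmpdir/b" &&
    LC_ALL=C comm -13 "$tmpdir/a" "$tmpdir/b" > "$tmpdir/out" &&
    cat "$tmpdir/out" > "$2"    # copy back only after everything succeeded
    status=$?

    rm -rf "$tmpdir"
    exit "$status"
)
```

Called as `remove_known_lines file1 file2`. Like the K&P script, it copies the result into the existing file rather than renaming a temp file over it, so the permissions and link count of file2 stay as they were.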
From: Ed Morton on 4 Apr 2010 08:40

On 4/3/2010 12:11 PM, Eric wrote:
> On 2010-04-03, Ed Morton <mortonspam(a)gmail.com> wrote:
>> On 4/2/2010 9:25 AM, Thomas 'PointedEars' Lahn wrote:
>>> Hongyi Zhao wrote:
>>> I use the following code to obtain the lines existing file2 but not in file1,
>>>
>>>> Ed Morton wrote:
>>>>>
>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>> <snip>
>>>
>>> I would use diff | grep anyway. RTFM.
>>>
>>
>> I'm curious - what would that solution look like given the input files below?
>>
>> $ cat file1
>> a
>> c
>> $ cat file2
>> c
>> a
>> b
>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>> b
>>
>> Regards,
>>
>> Ed.
>
> I've just reread the original question, and I guess it depends on why
> you are doing it, but I would definitely consider
>
> sort file1 > file1s
> sort file2 > file2s
> comm -13 file1s file2s
>
> comm needs the files to be sorted, but maybe they are anyway.

I'd consider that too. I'd still like to see what the diff | grep solution
looks like, though, as the way I THINK you'd have to implement that would be
fairly unpleasant, but I might be overlooking a clean approach.

Ed.
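Since the awk one-liner keeps getting quoted without explanation, here is a spelled-out sketch of the same technique. The function name `lines_only_in_second` and the array name `seen` are hypothetical; the test uses `$0 in seen` instead of `!a[$0]` so that looking up a line does not create an empty array entry. One known caveat of the `NR == FNR` idiom: if the first file is empty, the condition also holds for the second file and nothing is printed.

```shell
# lines_only_in_second FILE1 FILE2  (hypothetical helper, not from the thread)
# Prints the lines of FILE2 that do not occur anywhere in FILE1,
# keeping FILE2's original order. Neither file needs to be sorted.
lines_only_in_second() {
    awk '
        # NR counts lines across all files, FNR restarts for each file,
        # so the two are equal only while the first file is being read.
        NR == FNR { seen[$0] = 1; next }

        # Second file: print every line that was never stored.
        !($0 in seen)
    ' "$1" "$2"
}
```

With the file1 and file2 shown in the thread, `lines_only_in_second file1 file2` prints `b`.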
From: Thomas 'PointedEars' Lahn on 4 Apr 2010 19:39

Ed Morton wrote:

[Quotation fixed]

> Thomas 'PointedEars' Lahn wrote:
>>> Ed Morton wrote:
>>>>> Hongyi Zhao wrote:
>>>>> I use the following code to obtain the lines existing file2 but
>>>>> not in file1,
>>>> [...]
>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>> [...]
>> I would use diff | grep anyway. RTFM.
>
> I'm curious - what would that solution look like given the input files
> below?
>
> $ cat file1
> a
> c
> $ cat file2
> c
> a
> b
> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
> b

I did not understand the OP that they meant any line (without regard to
order) and did not run the awk-based proposal, so I was thinking of

  diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d; s/^+//'

where grep(1) would be superfluous, indeed:

  diff -u --suppress-common-lines file1 file2 |
    sed -n '2d; /^+/ s/^+// p'

However, if order does not matter, this can be modified in bash(1) to

  diff -u --suppress-common-lines <(sort file1) <(sort file2) |
    sed -n '2d; /^+/ s/^+// p'

(Remove --suppress-common-lines for non-GNU diffs, respectively.)

It is quite possible that I have overlooked a solution that uses only
non-POSIX diff(1).


PointedEars
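As a point of comparison, the same diff-based idea can be sketched without `-u`, GNU-only options, or bash process substitution, by relying on diff's default output format, where lines that exist only in the second file are prefixed with `> `. This is not PointedEars' command; the function name `added_lines` is hypothetical and `mktemp -d` is assumed to be available. Both inputs are sorted first, so the output is sorted too, and a line that occurs more often in file2 than in file1 is reported as added.

```shell
# added_lines FILE1 FILE2  (hypothetical helper, not from the thread)
# Prints the lines that diff reports as present only in FILE2, after
# sorting both inputs so that the original line order does not matter.
added_lines() (
    tmpdir=$(mktemp -d) || exit 1

    LC_ALL=C sort "$1" > "$tmpdir/a" &&
    LC_ALL=C sort "$2" > "$tmpdir/b" &&
    # Default diff output marks lines from the second file with "> ";
    # sed strips that prefix and prints only those lines.
    diff "$tmpdir/a" "$tmpdir/b" | sed -n 's/^> //p'
    status=$?

    rm -rf "$tmpdir"
    exit "$status"
)
```

Because diff exits non-zero whenever the files differ, the function reports sed's status (the last command in the pipeline) rather than diff's.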
From: Ed Morton on 6 Apr 2010 14:42

On 4/4/2010 6:39 PM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:
>
> [Quotation fixed]
>
>> Thomas 'PointedEars' Lahn wrote:
>>>> Ed Morton wrote:
>>>>>> Hongyi Zhao wrote:
>>>>>> I use the following code to obtain the lines existing file2 but
>>>>>> not in file1,
>>>>> [...]
>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>>> [...]
>>> I would use diff | grep anyway. RTFM.
>>
>> I'm curious - what would that solution look like given the input files
>> below?
>>
>> $ cat file1
>> a
>> c
>> $ cat file2
>> c
>> a
>> b
>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>> b
>
> I did not understand the OP that they meant any line (without regard to
> order) and did not run the awk-based proposal, so I was thinking of
>
> diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d; s/^+//'
>
> where grep(1) would be superfluous, indeed:
>
> diff -u --suppress-common-lines file1 file2 |
>   sed -n '2d; /^+/ s/^+// p'
>
> However, if order does not matter, this can be modified in bash(1) to
>
> diff -u --suppress-common-lines <(sort file1) <(sort file2) |
>   sed -n '2d; /^+/ s/^+// p'
>
> (Remove --suppress-common-lines for non-GNU diffs, respectively.)
>
> It is quite possible that I have overlooked a solution that uses only
> non-POSIX diff(1).

--suppress-common-lines apparently already relies on using non-POSIX diff
(see the POSIX diff man page at
http://www.opengroup.org/onlinepubs/009695399/utilities/diff.html). Maybe
it's GNU diff?

In any case, I've never heard of that option before so thanks for the tip.

Ed.
From: Thomas 'PointedEars' Lahn on 6 Apr 2010 15:29

Ed Morton wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Ed Morton wrote:
>>>>>>> Hongyi Zhao wrote:
>>>>>>> I use the following code to obtain the lines existing file2 but
>>>>>>> not in file1,
>>>>>> [...]
>>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>>>> [...]
>>>> I would use diff | grep anyway. RTFM.
>>>
>>> I'm curious - what would that solution look like given the input files
>>> below?
>>>
>>> $ cat file1
>>> a
>>> c
>>> $ cat file2
>>> c
>>> a
>>> b
>>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>>> b
>>
>> I did not understand the OP that they meant any line (without regard to
>> order) and did not run the awk-based proposal, so I was thinking of
>>
>> diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d;
>> s/^+//'
>>
>> where grep(1) would be superfluous, indeed:
>>
>> diff -u --suppress-common-lines file1 file2 |
>>   sed -n '2d; /^+/ s/^+// p'
>>
>> However, if order does not matter, this can be modified in bash(1) to
>>
>> diff -u --suppress-common-lines <(sort file1) <(sort file2) |
>>   sed -n '2d; /^+/ s/^+// p'
>>
>> (Remove --suppress-common-lines for non-GNU diffs, respectively.)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> It is quite possible that I have overlooked a solution that uses only
>> non-POSIX diff(1).
>
> --suppress-common-lines apparently already relies on using non-POSIX diff

It is not a requirement to use that option for the solution to work, but it
makes things easier for sed(1).

> (see the POSIX diff man page at
> http://www.opengroup.org/onlinepubs/009695399/utilities/diff.html). Maybe
> it's GNU diff?

See above.

> In any case, I've never heard of that option before so thanks for the
> tip.

You're welcome.


PointedEars