Duplicates [SAS]

From: Ilan Benamara on 11 Mar 2010 17:25

Hello to all,

This is my first time posting.

My dataset has multiple lines per observation, and once I have deleted
duplicate lines (for ALL variables) I would like to add a column to each
observation saying how many variables were non-equal and hence not
excluded by the NODUP option. Here's an example of this scenario:

id  var1  var2  var3  var4  NumberDiff
1   same  Diff  same        1   ---> Only var2 had a different value for observation 1 on its two lines
1___________________________
2               Diff  Diff  2   ---> Only var3 and var4 had different values for observation 2 on its two lines
2

I hope it is clear; let me know if it is not.

Thanks to all in advance

From: Sierra Information Services on 11 Mar 2010 18:04

Off the top of my (rapidly greying) head, I don't think you'll get what
you want using PROC SORT.
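
For reference, the de-duplication step mentioned in the question can be sketched roughly as below. This is only an illustration: the data set names HAVE and DEDUP and the sample values are hypothetical, and sorting BY _ALL_ is assumed so that fully identical lines end up next to each other, which NODUPRECS (the long form of NODUP) needs in order to drop them.

```sas
/* Hypothetical sample data; names follow the example in the question */
data have;
  input id var1 $ var2 $ var3 $ var4 $;
  datalines;
1 a b c d
1 a b c d
1 a x c d
2 a b c d
2 a b y z
;
run;

/* Sort on every variable so identical lines are adjacent,    */
/* then let NODUPRECS drop lines that match the previous one. */
proc sort data=have out=dedup noduprecs;
  by _all_;
run;
```

This removes lines that agree on ALL variables, but it does not report which variables differ on the lines that remain.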

If you use PROC FREQ with an OUT= option, you can create a data set
that has one line per unique observation value and a count of how many
observations have that value.

Here's an example:

/* start */
data mydata;
  /* build 20 observations, each assigned to one of five categories */
  do obs = 1 to 20;
    if 1 <= obs <= 3 then category = 'A';
    else if obs = 4 then category = 'B';
    else if 5 <= obs <= 12 then category = 'C';
    else if 13 <= obs <= 15 then category = 'D';
    else category = 'E';
    output;
  end;
run;

/* OUT= writes one row per distinct value of category, with its COUNT */
proc freq data=mydata;
  tables category / noprint out=new(drop=percent);
  title 'count number of times each value of category occurs in data set mydata';
run;

options nodate nonumber nocenter;
proc print data=new;
  title 'count number of times each value of category occurs in data set mydata';
run;

/* end */

The temporary data set NEW may have what you want. The variable COUNT,
automatically created by PROC FREQ, shows how many times a particular
value of CATEGORY is found in data set MYDATA.

I hope this helps.

Andrew Karp
Sierra Information Services
http://www.sierrainformation.com

From: Sierra Information Services on 11 Mar 2010 18:06

Well, I just re-read your post and am not sure if what I am offering
is what you really need. Have you looked at PROC COMPARE, perhaps?
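
If PROC COMPARE turns out to be more than is needed, a plain DATA step can also count the differences. The following is a minimal sketch rather than a tested solution. It assumes exactly two lines per id remain after de-duplication and four character variables var1-var4 of length 8 or less; the data set names HAVE and WANT and the sample values are hypothetical.

```sas
/* Hypothetical sample data: two lines per id after de-duplication */
data have;
  input id var1 $ var2 $ var3 $ var4 $;
  datalines;
1 a b c d
1 a x c d
2 a b c d
2 a b y z
;
run;

proc sort data=have;
  by id;
run;

data want;
  set have;
  by id;
  array cur{4} $ var1-var4;
  /* temporary array elements are retained across observations */
  array prev{4} $ 8 _temporary_;
  if first.id then do;
    /* remember the first line of this id */
    do i = 1 to 4;
      prev{i} = cur{i};
    end;
  end;
  else do;
    /* count the variables that differ from the first line */
    NumberDiff = 0;
    do i = 1 to 4;
      if cur{i} ne prev{i} then NumberDiff + 1;
    end;
  end;
  /* keep one line per id, carrying the count */
  if last.id and not first.id then output;
  drop i;
run;
```

With the sample values above, WANT should hold one line per id, with NumberDiff equal to 1 for id 1 and 2 for id 2. If the count is wanted on every line of an id, WANT could be merged back onto HAVE by id.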

Sorry!

Andrew

