modify data set for categorical analysis [SAS]

From: Lipty on 24 Apr 2010 19:27

Dear all,

I am sorry to bother you. To do a categorical data analysis, I have
to rearrange my raw data set. I have tried many methods but am stuck.
My raw data set has the following variables:

Name Good P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

Good and P1 through P10 are binary variables. If the product has the
feature, that dummy variable takes the value 1, otherwise 0. I would
like to change the raw data into the following data set:

Good P Yes Count
1 P1 1 200
1 P1 0 600
0 P1 1 200
0 P1 0 100

1 P10 1 200
1 P10 0 100
0 P10 1 60
0 P10 0 70

Good has the same meaning as in my raw data. P is one of P1 through
P10. The variable Yes is related to P: e.g. if a product has the P1
feature, Yes is 1 on the P1 row; if it also has the P3 feature, Yes is
1 on the P3 row. Count indicates the number of observations that fall
in this category.

Any suggestion will be highly appreciated.

Thanks,

From: Patrick on 24 Apr 2010 20:58

Hi

Providing some sample data would have been helpful. I've created some
myself (data have).

I believe PROC TRANSPOSE does all you need.

I'm not sure you need this "count" variable at all, as most SAS procs
work well with detail data.

/* Build a small sample data set: one row per product, flags p1-p10 */
data have(drop=i);
  array p {10} 8;
  do good=1 to 5;
    do i=1 to dim(p);
      p{i}=floor(0.5+ranuni(1));   /* 0 or 1 */
    end;
    id+1;
    output;
    if ranuni(1)>0.5 then
      do;
        id+1;
        output;
      end;
  end;
run;

/* Reshape wide to long: one row per id and feature */
proc transpose data=have prefix=yes
               out=want(rename=(yes1=yes _name_=p));
  by id good;
  var p1-p10;
run;

/* Count the rows in each good * p * yes cell */
proc sql;
  /* create table AnalysisData as */
  select good, p label='p', yes, count(*) as count
  from want
  group by good, p, yes;
quit;
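
As an aside on the remark that most procs work with detail data: a minimal sketch, assuming the long data set `want` produced by the PROC TRANSPOSE step above, of getting the same cell counts from PROC FREQ instead of PROC SQL. The output name `AnalysisData` is only borrowed from the commented-out line in the SQL step; rows where any of the three variables is missing are left out by default.

```sas
/* Cross-tabulate the detail rows and write the cell counts to a data set. */
/* OUT= holds one row per good * p * yes combination, with the frequency   */
/* in COUNT; the PERCENT column is dropped because it is not needed here.  */
proc freq data=want noprint;
  tables good*p*yes / out=AnalysisData(drop=percent);
run;
```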

HTH
Patrick

From: Lipty on 24 Apr 2010 23:43

Thank you very much. Since I am using a logistic model, the count is
needed. I will try your method first.

I really appreciate your help
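
A minimal sketch of how such a count variable is typically fed to a logistic model, assuming a summarized table with the variables good, p, yes and count. The table name `AnalysisData` and the choice to fit one model per feature are assumptions for illustration, not something stated in the thread.

```sas
/* BY-group processing needs the data ordered by the BY variable. */
proc sort data=AnalysisData;
  by p;
run;

/* One model per feature: good (1 = event) regressed on yes. */
/* The FREQ statement makes each row stand for COUNT observations. */
proc logistic data=AnalysisData;
  by p;
  freq count;
  model good(event='1') = yes;
run;
```

The same model can also be fitted on the detail rows without a count; the FREQ statement only matters once the rows have been collapsed.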

From: Lipty on 25 Apr 2010 16:48

It doesn't work. My data set is as follows:

data test;

input Good P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16;
datalines;

1 1 1 1 . 1 0 1 1 1 1
1 1 0 0 0 1
1 1 1 0 . 1 1 1 1 1 0
1 0 0 1 1 1
1 1 1 1 0 1 1 0 0 1 0
1 1 0 0 1 1
1 1 1 1 1 1 1 0 1 1 0
1 1 1 1 1 1
0 1 1 1 0 0 1 0 0 0 0
0 1 0 0 1 0
0 . . . . .
0 . . . . . . . . . .
1 1 1 1 1 1 1 0 1 0 0
1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 0
1 1 0 1 0 1
0 1 1 1 1 1 1 0 0 0 0
1 1 0 0 1 1
;

I would like to change the raw data into the following data set

Good P(i) Yes Count
1 P1 1 200
1 P1 0 600
0 P1 1 200
0 P1 0 100

1 P10 1 200
1 P10 0 100
0 P10 1 60
0 P10 0 70

1 P16 1 300
1 P16 0 150
0 P16 1 50
0 P16 0 75

Good has the same meaning as in my raw data. P(i) means one of P1
through P16.
The variable Yes is related to P(i). Count indicates the number
of observations that fall in this category.

For example,
Good P(i) Yes Count
1 P1 1 200
means there are 200 products that are good and have the P1 feature.

Good P(i) Yes Count
0 P12 1 300
means there are 300 products that are not good but have the P12 feature.

etc.

Thank you very much for any suggestion.

From: Barry Schwarz on 25 Apr 2010 17:46

What doesn't work? You didn't quote any of the message you are
responding to, so we don't know what you tried.

"Doesn't work" isn't much of a problem description. How does it not
work? No output? Insufficient output? Excessive output? Wrong
output?

Have you tried to debug whatever code you are running? Have you added
PUT statements at strategic places to determine whether IF statements
are evaluating the way you expect?

Since your sample output is only casually related to your sample
input, you need to provide a complete description of the "rules." For
example, do you want missing data treated as 0 or 1 or ignored?
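
To make the question about the rules concrete, here is a minimal sketch of one possible rule set, assuming the data set `test` from the post above holds one row per product and that missing values are simply ignored. The names `test_id`, `long`, `counts` and the row identifier `id` are hypothetical; the earlier PROC TRANSPOSE step relied on an id variable that `test` does not contain, so one is added here.

```sas
/* Add a row identifier so each product forms its own BY group. */
data test_id;
  set test;
  id = _n_;
run;

/* Wide to long: one row per product and feature.             */
/* Without PREFIX=, the transposed value lands in COL1.       */
proc transpose data=test_id
               out=long(rename=(col1=yes _name_=p));
  by id good;
  var p1-p16;
run;

/* Count rows per good * p * yes cell, skipping missing flags. */
proc sql;
  create table counts as
  select good, p, yes, count(*) as count
  from long
  where yes is not missing
  group by good, p, yes;
quit;
```

Treating missing values as 0 or 1 instead would mean recoding yes before the counting step rather than filtering it out.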


--
Remove del for email
