From: Lipty on 24 Apr 2010 19:27

Dear all,

I am sorry to bother you. To do a categorical data analysis, I have to rearrange my raw data set. I have tried many methods, but I am stuck. My raw data set has the following variables:

Name Good P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

Good, P1, P2, P3, P4, P5, P6, P7, P8, P9 and P10 are binary variables. If the product has the feature, that dummy variable takes the value 1, otherwise 0.

I would like to change the raw data into the following data set:

Good  P    Yes  Count
1     P1   1    200
1     P1   0    600
0     P1   1    200
0     P1   0    100

1     P10  1    200
1     P10  0    100
0     P10  1    60
0     P10  0    70

Good has the same meaning as in my raw data. P stands for P1 through P10. The variable Yes is related to P. For example, if a product has the P1 feature, Yes is 1 for P = P1; if it also has the P3 feature, Yes is 1 for P = P3. Count indicates the number of observations in that category.

Any suggestion will be highly appreciated.

Thanks,
From: Patrick on 24 Apr 2010 20:58

Hi

Providing some sample data would have been helpful. I've created some myself (data have). I believe PROC TRANSPOSE does all you need. I'm not sure you need this "count" variable at all, as most SAS procs work fine with detail data.

/* Sample data: random 0/1 flags p1-p10 plus a running id */
data have(drop=i);
  array p {10} 8;
  do good=1 to 5;
    do i=1 to dim(p);
      p{i}=floor(0.5+ranuni(1));
    end;
    id+1;
    output;
    /* roughly half of the time, write the same row again under a new id */
    if ranuni(1)>0.5 then do;
      id+1;
      output;
    end;
  end;
run;

/* One row per id and feature: p holds the variable name, yes its value */
proc transpose data=have prefix=yes out=want(rename=(yes1=yes _name_=p));
  by id good;
  var p1-p10;
run;

/* Count the rows in each good/p/yes category */
proc sql;
  /* create table AnalysisData as*/
  select good,p label='p',yes,count(*) as count
  from want
  group by good,p,yes;
quit;

HTH
Patrick
From: Lipty on 24 Apr 2010 23:43

Thank you very much. Since I am using a logistic model, the count is needed. I will try your method first. I really appreciate your help.
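
For reference, aggregated counts can be passed to a logistic model through a FREQ statement. The following is only a minimal sketch under assumptions the thread does not confirm: that the aggregated table has been saved as AnalysisData (the name in the commented-out CREATE TABLE line of Patrick's PROC SQL step) with the variables good, p, yes and count, that good is coded 0/1, and that one model per feature is wanted.

```sas
/* Sort so that BY-group processing on p is valid. */
proc sort data=AnalysisData;
  by p;
run;

/* One logistic regression per feature.                      */
/* FREQ makes each row stand for COUNT identical observations. */
proc logistic data=AnalysisData;
  by p;
  freq count;
  model good(event='1') = yes;
run;
```

With the FREQ statement the fit should match what the detail data would give, which is why the count variable is optional rather than required.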
From: Lipty on 25 Apr 2010 16:48

It doesn't work. My data set is as follows:

data test;
input Good P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16;
datalines;
1 1 1 1 . 1 0 1 1 1 1 1 1 0 0 0 1
1 1 1 0 . 1 1 1 1 1 0 1 0 0 1 1 1
1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 1 1
1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1
0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0
0 . . . . . 0 . . . . . . . . . .
1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1
0 1 1 1 1 1 1 0 0 0 0 1 1 0 0 1 1
;

I would like to change the raw data into the following data set:

Good  P(i)  Yes  Count
1     P1    1    200
1     P1    0    600
0     P1    1    200
0     P1    0    100

1     P10   1    200
1     P10   0    100
0     P10   1    60
0     P10   0    70

1     P16   1    300
1     P16   0    150
0     P16   1    50
0     P16   0    75

Good has the same meaning as in my raw data. P(i) means one of P1 through P16. The variable Yes is related to P(i). Count indicates the number of observations in that category.

For example,

Good  P(i)  Yes  Count
1     P1    1    200

means that 200 products are good and have the P1 feature.

Good  P(i)  Yes  Count
0     P12   1    300

means that 300 products are not good but have the P12 feature, etc.

Thank you very much for any suggestion.
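
The post does not say what failed, but Patrick's steps refer to an id variable and to p1-p10, while the test data set has no id and runs to P16. A possible adaptation is sketched below. The data set names test_id and long, the use of _N_ as a row identifier, and the decision to leave missing values out of the counts are all assumptions made for illustration, not something stated in the thread.

```sas
/* Add a row identifier so each observation forms its own BY group. */
data test_id;
  set test;
  id = _n_;
run;

/* One row per (id, feature): _NAME_ holds P1..P16, COL1 holds the value. */
proc transpose data=test_id out=long(rename=(col1=yes _name_=p));
  by id good;
  var p1-p16;
run;

/* Count observations per category.                              */
/* Assumption: rows where the feature value is missing are dropped. */
proc sql;
  create table AnalysisData as
  select good, p, yes, count(*) as count
  from long
  where yes is not missing
  group by good, p, yes;
quit;
```

Because p is a character variable, the groups sort alphabetically (P1, P10, P11, ..., P2), which is only a display matter.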
From: Barry Schwarz on 25 Apr 2010 17:46
What doesn't work? You didn't quote any of the message you are responding to, so we don't know what you tried. "Doesn't work" isn't much of a problem description. How does it not work? No output? Insufficient output? Excessive output? Wrong output?

Have you tried to debug whatever code you are running? Have you added PUT statements at strategic places to determine whether the IF statements are evaluating the way you expect?

Since your sample output is only casually related to your sample input, you need to provide a complete description of the "rules." For example, do you want missing data treated as 0, treated as 1, or ignored?

On Sun, 25 Apr 2010 13:48:50 -0700 (PDT), Lipty <lipty.li(a)gmail.com> wrote:

>It doesn't work. My data set is as follows:
>
>data test;
>
>input Good P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16;
>datalines;
>
>1 1 1 1 . 1 0 1 1 1 1
>1 1 0 0 0 1
>1 1 1 0 . 1 1 1 1 1 0
>1 0 0 1 1 1
>1 1 1 1 0 1 1 0 0 1 0
>1 1 0 0 1 1
>1 1 1 1 1 1 1 0 1 1 0
>1 1 1 1 1 1
>0 1 1 1 0 0 1 0 0 0 0
>0 1 0 0 1 0
>0 . . . . .
>0 . . . . . . . . . .
>1 1 1 1 1 1 1 0 1 0 0
>1 1 1 1 1 1
>0 0 1 1 1 1 1 1 1 1 0
>1 1 0 1 0 1
>0 1 1 1 1 1 1 0 0 0 0
>1 1 0 0 1 1
>;
>
>
>I would like to change the raw data into the following data set
>
>
>Good P(i) Yes Count
>1 P1 1 200
>1 P1 0 600
>0 P1 1 200
>0 P1 0 100
>
>
>...
>...
>1 P10 1 200
>1 P10 0 100
>0 P10 1 60
>0 P10 0 70
>...
>...
>1 P16 1 300
>1 P16 0 150
>0 P16 1 50
>0 P16 0 75
>
>Good has the same meaning as in my raw data. P(i) means one of P1 through
>P16.
>The variable Yes is related to P(i). Count indicates the number
>of observations in that category.
>
>For example,
>Good P(i) Yes Count
>1 P1 1 200
>means that 200 products are good and have the P1 feature.
>
>Good P(i) Yes Count
>0 P12 1 300
>means that 300 products are not good but have the P12 feature.
>
>etc.
>
>Thank you very much for any suggestion.

-- 
Remove del for email
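
As an illustration of the PUT-statement debugging suggested above, the following sketch reshapes the test data set with an array and writes a line to the log for every missing cell, so the missing-data rule can be decided from evidence rather than guessed. The names long_check and feat, and the choice to ignore missing cells, are assumptions made for this example only.

```sas
/* Reshape with an array and trace missing values in the log. */
data long_check;
  set test;
  array feat {16} P1-P16;
  length p $8;
  do i = 1 to dim(feat);
    p   = vname(feat{i});   /* name of the current feature variable */
    yes = feat{i};
    /* Diagnostic: report each missing cell with its row number. */
    if missing(yes) then put 'Missing value at ' _n_= p= good=;
    else output;            /* assumption: missing cells are ignored */
  end;
  keep good p yes;
run;
```

To treat missing values as 0 or 1 instead, the ELSE branch could be replaced by an assignment to yes followed by an unconditional OUTPUT.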