Prev: Clear All
Next: MACROS IN SAS.
From: Sri on 21 Apr 2010 15:39 Hello All, I'm trying to create single observation from multiple observations and i'm lost, any help on this is highly appreciated. This is my sample data: data test; input case $ claim $; cards; 1000 0238755656 1000 5464313565 1000 8536139709 1001 1111222233 ; For a "case", i have multiple claims. I want to create a single record for each case by creating separate columns for claim (based upon number of claims i have, here 3). My output should look some thing like this: case claim1 claim2 claim3 1000 0238755656 5464313565 8536139709 1001 1111222233 Thank you. Sri
From: rogernomics on 21 Apr 2010 17:28 On Apr 22, 7:39 am, Sri <subhadra...(a)gmail.com> wrote: > Hello All, > I'm trying to create single observation from multiple observations and > i'm lost, any help on this is highly appreciated. > > This is my sample data: > > data test; > input case $ claim $; > cards; > 1000 0238755656 > 1000 5464313565 > 1000 8536139709 > 1001 1111222233 > ; > > For a "case", i have multiple claims. I want to create a single > record for each case by creating separate columns for claim (based > upon number of claims i have, here 3). My output should look some > thing like this: > > case claim1 claim2 claim3 > 1000 0238755656 5464313565 8536139709 > 1001 1111222233 > > Thank you. > Sri Hi Sri, I've tried some code but it needs some trial and error by you to get it right. Here's my code; proc sort data=test; by case claim; run; proc transpose data=test out=test1; by case; id claim; var claim; drop _name_; run; You'll get output that's close to what you wan but not quite right, so experiment with it. All the best dudley
From: Sri on 24 Apr 2010 09:40 On Apr 23, 8:52 am, "data _null_;" <datan...(a)gmail.com> wrote: > On Apr 22, 4:21 pm, rogernomics <newpu...(a)gmail.com> wrote: > > > > > > > On Apr 23, 8:42 am, Sri <subhadra...(a)gmail.com> wrote: > > > > On Apr 22, 7:23 am, Sri <subhadra...(a)gmail.com> wrote: > > > > > On Apr 22, 5:03 am, Patrick <patrick.mat...(a)gmx.ch> wrote: > > > > > > I believe this is what you're after. > > > > > > data test; > > > > > input case $ claim $; > > > > > cards; > > > > > 1000 0238755656 > > > > > 1000 5464313565 > > > > > 1000 8536139709 > > > > > 1001 1111222233 > > > > > ; > > > > > run; > > > > > > /* data must be sorted by case */ > > > > > proc transpose data=test out=test1(drop=_name_) PREFIX=claim; > > > > > by case; > > > > > var claim; > > > > > run; > > > > > > HTH > > > > > Patrick > > > > > Thanks a lot everyone, proc transpose worked for me.- Hide quoted text - > > > > > - Show quoted text - > > > > Is there a way to do the same with "retain" statement in datastep? I > > > have over 1 million records and proc transpose is not efficient as it > > > is taking up all the available memory and not doing the job > > > completely. I appreciate any reponses. Thank you.- Hide quoted text - > > > > - Show quoted text - > > > Hi Sri, > > To be honest, I had to do a similar thing to you but with 19 million > > obs! And proc transpose was the only solution we could come up with. I > > had to get a new computer etc with more memory and dual core chip etc > > just to run it. It was not a perfect solution (as you've found out). > > In the end, you have to use a "cribbing" type method, i.e. extract > > parts that you need bit by bit (maybe write a macro for it, since it > > will repeat) and then merge your extracts back together to get what > > you want. This is what I ended up doing. > > > Bye for now > > dudley- Hide quoted text - > > > - Show quoted text - > > Looking at your other post I think your problem was using the ID > statement. Surely you don't want a separate variable for each CLAIM > number. Aren't the claim numbers unique. The OP indicated he wanted > a simple enumerated variable list, using ID does not provide that. > > Here is a simulation using 1M IDs with up to 30 claims. You can see > that while PROC TRANPOSE is slower than the data step it has a > distinct advantage. You don't need to know the maximum number of > claims per ID. > > 3721 data test; > 3722 do case = 1 to 1000000; > 3723 do _n_ = 1 to ceil(ranuni(12345)*30); > 3724 length claim $10; > 3725 claim = put(ranuni(0),best10.); > 3726 output; > 3727 end; > 3728 end; > 3729 run; > > NOTE: The data set WORK.TEST has 15497660 observations and 2 > variables. > NOTE: DATA statement used (Total process time): > real time 21.54 seconds > user cpu time 16.74 seconds > system cpu time 0.84 seconds > Memory 152k > > 3730 > 3731 data test2(drop=tclaim); > 3732 do _n_ = 1 by 1 until(last.case); > 3733 set test(rename=(claim=tclaim)); > 3734 by case; > 3735 array claim[30] $10; > 3736 claim[_n_]=tclaim ; > 3737 end; > 3738 run; > > NOTE: There were 15497660 observations read from the data set > WORK.TEST. > NOTE: The data set WORK.TEST2 has 1000000 observations and 31 > variables. > NOTE: DATA statement used (Total process time): > real time 17.70 seconds > user cpu time 4.83 seconds > system cpu time 0.92 seconds > Memory 222k > > 3739 > 3740 > 3741 proc transpose data=test out=test3 prefix=claim; > 3742 by case; > 3743 var claim; > 3744 run; > > NOTE: There were 15497660 observations read from the data set > WORK.TEST. > NOTE: The data set WORK.TEST3 has 1000000 observations and 32 > variables. > NOTE: PROCEDURE TRANSPOSE used (Total process time): > real time 22.05 seconds > user cpu time 7.97 seconds > system cpu time 1.20 seconds > Memory 2155k Many thanks to all who responded.
|
Pages: 1 Prev: Clear All Next: MACROS IN SAS. |