Help on Creating single observation from multiple observations [SAS]

From: Sri on 21 Apr 2010 15:39

Hello All,
I'm trying to create single observation from multiple observations and
i'm lost, any help on this is highly appreciated.

This is my sample data:

data test;
input case $ claim $;
cards;
1000 0238755656
1000 5464313565
1000 8536139709
1001 1111222233
;

For a "case", i have multiple claims. I want to create a single
record for each case by creating separate columns for claim (based
upon number of claims i have, here 3). My output should look some
thing like this:

case claim1 claim2 claim3
1000 0238755656 5464313565 8536139709
1001 1111222233

Thank you.
Sri

From: rogernomics on 21 Apr 2010 17:28

On Apr 22, 7:39 am, Sri <subhadra...(a)gmail.com> wrote:
> Hello All,
> I'm trying to create single observation from multiple observations and
> i'm lost, any help on this is highly appreciated.
>
> This is my sample data:
>
> data test;
> input case $ claim $;
> cards;
> 1000 0238755656
> 1000 5464313565
> 1000 8536139709
> 1001 1111222233
> ;
>
> For a "case", i have multiple claims. I want to create a single
> record for each case by creating separate columns for claim (based
> upon number of claims i have, here 3). My output should look some
> thing like this:
>
> case claim1 claim2 claim3
> 1000 0238755656 5464313565 8536139709
> 1001 1111222233
>
> Thank you.
> Sri

Hi Sri,
I've tried some code but it needs some trial and error by you to get
it right. Here's my code;

proc sort data=test;
by case claim;
run;

proc transpose data=test out=test1;
by case;
id claim;
var claim;
drop _name_;
run;

You'll get output that's close to what you wan but not quite right, so
experiment with it.

All the best
dudley

From: Sri on 24 Apr 2010 09:40

On Apr 23, 8:52 am, "data _null_;" <datan...(a)gmail.com> wrote:
> On Apr 22, 4:21 pm, rogernomics <newpu...(a)gmail.com> wrote:
>
>
>
>
>
> > On Apr 23, 8:42 am, Sri <subhadra...(a)gmail.com> wrote:
>
> > > On Apr 22, 7:23 am, Sri <subhadra...(a)gmail.com> wrote:
>
> > > > On Apr 22, 5:03 am, Patrick <patrick.mat...(a)gmx.ch> wrote:
>
> > > > > I believe this is what you're after.
>
> > > > > data test;
> > > > > input case $ claim $;
> > > > > cards;
> > > > > 1000 0238755656
> > > > > 1000 5464313565
> > > > > 1000 8536139709
> > > > > 1001 1111222233
> > > > > ;
> > > > > run;
>
> > > > > /* data must be sorted by case */
> > > > > proc transpose data=test out=test1(drop=_name_) PREFIX=claim;
> > > > > by case;
> > > > > var claim;
> > > > > run;
>
> > > > > HTH
> > > > > Patrick
>
> > > > Thanks a lot everyone, proc transpose worked for me.- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Is there a way to do the same with "retain" statement in datastep? I
> > > have over 1 million records and proc transpose is not efficient as it
> > > is taking up all the available memory and not doing the job
> > > completely. I appreciate any reponses. Thank you.- Hide quoted text -
>
> > > - Show quoted text -
>
> > Hi Sri,
> > To be honest, I had to do a similar thing to you but with 19 million
> > obs! And proc transpose was the only solution we could come up with. I
> > had to get a new computer etc with more memory and dual core chip etc
> > just to run it. It was not a perfect solution (as you've found out).
> > In the end, you have to use a "cribbing" type method, i.e. extract
> > parts that you need bit by bit (maybe write a macro for it, since it
> > will repeat) and then merge your extracts back together to get what
> > you want. This is what I ended up doing.
>
> > Bye for now
> > dudley- Hide quoted text -
>
> > - Show quoted text -
>
> Looking at your other post I think your problem was using the ID
> statement. Surely you don't want a separate variable for each CLAIM
> number. Aren't the claim numbers unique. The OP indicated he wanted
> a simple enumerated variable list, using ID does not provide that.
>
> Here is a simulation using 1M IDs with up to 30 claims. You can see
> that while PROC TRANPOSE is slower than the data step it has a
> distinct advantage. You don't need to know the maximum number of
> claims per ID.
>
> 3721 data test;
> 3722 do case = 1 to 1000000;
> 3723 do _n_ = 1 to ceil(ranuni(12345)*30);
> 3724 length claim $10;
> 3725 claim = put(ranuni(0),best10.);
> 3726 output;
> 3727 end;
> 3728 end;
> 3729 run;
>
> NOTE: The data set WORK.TEST has 15497660 observations and 2
> variables.
> NOTE: DATA statement used (Total process time):
> real time 21.54 seconds
> user cpu time 16.74 seconds
> system cpu time 0.84 seconds
> Memory 152k
>
> 3730
> 3731 data test2(drop=tclaim);
> 3732 do _n_ = 1 by 1 until(last.case);
> 3733 set test(rename=(claim=tclaim));
> 3734 by case;
> 3735 array claim[30] $10;
> 3736 claim[_n_]=tclaim ;
> 3737 end;
> 3738 run;
>
> NOTE: There were 15497660 observations read from the data set
> WORK.TEST.
> NOTE: The data set WORK.TEST2 has 1000000 observations and 31
> variables.
> NOTE: DATA statement used (Total process time):
> real time 17.70 seconds
> user cpu time 4.83 seconds
> system cpu time 0.92 seconds
> Memory 222k
>
> 3739
> 3740
> 3741 proc transpose data=test out=test3 prefix=claim;
> 3742 by case;
> 3743 var claim;
> 3744 run;
>
> NOTE: There were 15497660 observations read from the data set
> WORK.TEST.
> NOTE: The data set WORK.TEST3 has 1000000 observations and 32
> variables.
> NOTE: PROCEDURE TRANSPOSE used (Total process time):
> real time 22.05 seconds
> user cpu time 7.97 seconds
> system cpu time 1.20 seconds
> Memory 2155k

Many thanks to all who responded.

|
Pages: 1
Prev: Clear All
Next: MACROS IN SAS.