From: Sri on
Hello All,
I'm trying to create single observation from multiple observations and
i'm lost, any help on this is highly appreciated.

This is my sample data:

data test;
input case $ claim $;
cards;
1000 0238755656
1000 5464313565
1000 8536139709
1001 1111222233
;

For a "case", i have multiple claims. I want to create a single
record for each case by creating separate columns for claim (based
upon number of claims i have, here 3). My output should look some
thing like this:

case claim1 claim2 claim3
1000 0238755656 5464313565 8536139709
1001 1111222233

Thank you.
Sri
From: rogernomics on
On Apr 22, 7:39 am, Sri <subhadra...(a)gmail.com> wrote:
> Hello All,
> I'm trying to create single observation from multiple observations and
> i'm lost, any help on this is highly appreciated.
>
> This is my sample data:
>
> data test;
>  input case $ claim $;
>  cards;
> 1000 0238755656
> 1000 5464313565
> 1000 8536139709
> 1001 1111222233
> ;
>
> For a "case",  i have multiple claims. I want to create a single
> record for each case by creating separate columns for claim (based
> upon number of claims i have, here 3). My output should look some
> thing like this:
>
> case    claim1           claim2         claim3
> 1000 0238755656 5464313565 8536139709
> 1001 1111222233
>
> Thank you.
> Sri


Hi Sri,
I've tried some code but it needs some trial and error by you to get
it right. Here's my code;

proc sort data=test;
by case claim;
run;

proc transpose data=test out=test1;
by case;
id claim;
var claim;
drop _name_;
run;

You'll get output that's close to what you wan but not quite right, so
experiment with it.

All the best
dudley




From: Sri on
On Apr 23, 8:52 am, "data _null_;" <datan...(a)gmail.com> wrote:
> On Apr 22, 4:21 pm, rogernomics <newpu...(a)gmail.com> wrote:
>
>
>
>
>
> > On Apr 23, 8:42 am, Sri <subhadra...(a)gmail.com> wrote:
>
> > > On Apr 22, 7:23 am, Sri <subhadra...(a)gmail.com> wrote:
>
> > > > On Apr 22, 5:03 am, Patrick <patrick.mat...(a)gmx.ch> wrote:
>
> > > > > I believe this is what you're after.
>
> > > > > data test;
> > > > >  input case $ claim $;
> > > > >  cards;
> > > > > 1000 0238755656
> > > > > 1000 5464313565
> > > > > 1000 8536139709
> > > > > 1001 1111222233
> > > > > ;
> > > > > run;
>
> > > > > /* data must be sorted by case */
> > > > > proc transpose data=test out=test1(drop=_name_) PREFIX=claim;
> > > > > by case;
> > > > > var claim;
> > > > > run;
>
> > > > > HTH
> > > > > Patrick
>
> > > > Thanks a lot everyone, proc transpose worked for me.- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Is there a way to do the same with "retain" statement in datastep? I
> > > have over 1 million records and proc transpose is not efficient as it
> > > is taking up all the available memory and not doing the job
> > > completely. I appreciate any reponses. Thank you.- Hide quoted text -
>
> > > - Show quoted text -
>
> > Hi Sri,
> > To be honest, I had to do a similar thing to you but with 19 million
> > obs! And proc transpose was the only solution we could come up with. I
> > had to get a new computer etc with more memory and dual core chip etc
> > just to run it. It was not a perfect solution (as you've found out).
> > In the end, you have to use a "cribbing" type method, i.e. extract
> > parts that you need bit by bit (maybe write a macro for it, since it
> > will repeat) and then merge your extracts back together to get what
> > you want. This is what I ended up doing.
>
> > Bye for now
> > dudley- Hide quoted text -
>
> > - Show quoted text -
>
> Looking at your other post I think your problem was using the ID
> statement.  Surely you don't want a separate variable for each CLAIM
> number.  Aren't the claim numbers unique.  The OP indicated he wanted
> a simple enumerated variable list, using ID does not provide that.
>
> Here is a simulation using 1M IDs with up to 30 claims.  You can see
> that while PROC TRANPOSE is slower than the data step it has a
> distinct advantage.  You don't need to know the maximum number of
> claims per ID.
>
> 3721   data test;
> 3722      do case = 1 to 1000000;
> 3723         do _n_ = 1 to ceil(ranuni(12345)*30);
> 3724            length claim $10;
> 3725            claim = put(ranuni(0),best10.);
> 3726            output;
> 3727            end;
> 3728         end;
> 3729      run;
>
> NOTE: The data set WORK.TEST has 15497660 observations and 2
> variables.
> NOTE: DATA statement used (Total process time):
>       real time           21.54 seconds
>       user cpu time       16.74 seconds
>       system cpu time     0.84 seconds
>       Memory                            152k
>
> 3730
> 3731   data test2(drop=tclaim);
> 3732      do _n_ = 1 by 1 until(last.case);
> 3733         set test(rename=(claim=tclaim));
> 3734         by case;
> 3735         array claim[30] $10;
> 3736         claim[_n_]=tclaim ;
> 3737         end;
> 3738     run;
>
> NOTE: There were 15497660 observations read from the data set
> WORK.TEST.
> NOTE: The data set WORK.TEST2 has 1000000 observations and 31
> variables.
> NOTE: DATA statement used (Total process time):
>       real time           17.70 seconds
>       user cpu time       4.83 seconds
>       system cpu time     0.92 seconds
>       Memory                            222k
>
> 3739
> 3740
> 3741   proc transpose data=test out=test3 prefix=claim;
> 3742      by case;
> 3743      var claim;
> 3744      run;
>
> NOTE: There were 15497660 observations read from the data set
> WORK.TEST.
> NOTE: The data set WORK.TEST3 has 1000000 observations and 32
> variables.
> NOTE: PROCEDURE TRANSPOSE used (Total process time):
>       real time           22.05 seconds
>       user cpu time       7.97 seconds
>       system cpu time     1.20 seconds
>       Memory                            2155k

Many thanks to all who responded.
 | 
Pages: 1
Prev: Clear All
Next: MACROS IN SAS.