From: NordlDJ on
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> ChrisG
> Sent: Monday, February 22, 2010 12:42 PM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: tricky sum
>
> On 22 f�v, 20:47, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
> wrote:
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > > ChrisG
> > > Sent: Monday, February 22, 2010 8:07 AM
> > > To: SA...(a)LISTSERV.UGA.EDU
> > > Subject: tricky sum
> >
> > > Hi everyone
> >
> > > Here is my tricky sum problem
> > > I have such a dataset
> >
> > > id_bb d0 d1 d2 d3
> > > 1 10 0 0 0
> > > 1 8 5 0 0
> > > 1 6 10 6 0
> > > 1 4 7 4 3
> > > 2 8 0 0 0
> > > 2 4 9 0 0
> > > 2 7 8 5 0
> > > 2 10 3 10 6
> >
> > > id_bb = identifier, then some numeric variables d0-d3
> > > in reality i have more than 50000 lines, more than 3000 unique
> > > identifiers and it goes from d0 until d125 !
> >
> > > now my dream is to have this:
> >
> > > id_bb d0 d1 d2 d3 x
> > > 1 10 0 0 0 28
> > > 1 8 5 0 0 22
> > > 1 6 10 6 0 10
> > > 1 4 7 4 3 3
> > > 2 8 0 0 0 29
> > > 2 4 9 0 0 20
> > > 2 7 8 5 0 15
> > > 2 10 3 10 6 6
> >
> > > this x variable is the trick sum i need
> > > and it goes like this:
> > > for the first id_bb:
> > > x in the first line = sum(d0), i.e. 10+8+6+4
> > > x in the second line = sum(d1), i.e. 0+5+10+7
> > > so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> > > then again for the second id_bb:
> > > x in the first line = sum(d0), ie. x = 8+4+7+10
> > > and so on until the last variable, ie. d3
> >
> > > and at the end, in reality, this should happen for every id_bb (more
> > > than 3000 unique id_bb) and for the d0 until d125...
> >
> > > i have no clue how to do it in SAS...
> > > in Excel this will surely be a pain as well and i don't know how to
> > > use VBA...
> >
> > > So if someone could help me out with this, i would be very greatful !
> > > thank you in advance
> > > Best
> > > CG
> >
> > Chris,
> >
> > I am not sure if I understand your wanted solution. You say in your real data you
> have variables d0-d125 that you want to sum up and then append to consecutive
> observations by id_bb. That will only work if you exactly 126 observations for each
> ID. If that is what you have then the following code will work. Change the indices of
> 3 to 125 for your data.
> >
> > data have;
> > infile cards;
> > input id_bb d0 d1 d2 d3;
> > cards;
> > 1 10 0 0 0
> > 1 8 5 0 0
> > 1 6 10 6 0
> > 1 4 7 4 3
> > 2 8 0 0 0
> > 2 4 9 0 0
> > 2 7 8 5 0
> > 2 10 3 10 6
> > ;
> > run;
> > data want;
> > do until(last.id_bb);
> > set have;
> > by id_bb;
> > array d_a[0:3] d0-d3;
> > array sum_a[0:3] s0-s3;
> >
> > do i=0 to 3;
> > sum_a[i] = sum(0,sum_a[i],d_a[i]);
> > end;
> > end;
> > do i=0 to 3;
> > set have;
> > x = sum_a[i];
> > output;
> > end;
> > run;
> > proc print;
> > run;
> >
> > I you don't have 126 observations per ID then you will need to write back to SAS-
> L and give us more detail on exactly what you want. Hope this is helpful,
> >
> > Dan
> >
> > Daniel J. Nordlund
> > Washington State Department of Social and Health Services
> > Planning, Performance, and Accountability
> > Research and Data Analysis Division
> > Olympia, WA 98504-5204
>
> Hello everybody
>
> thanks a lot for your help !
> unfortunately, the real data may have less than 126 obs. per id...
> but the code by Mike Zdeb works perfectly, so i found what i was
> looking for
> thanks to all of you
> and thank you to Mike for his code
>
> Best
> CG


Chris,

Maybe I am missing something, but if you don't have the same number of observations as you have variables (d0 to d_whatever), then I don't see how Mikes solution works. Just add one more variable to your list and see what happens.

data x;
input id_bb d0-d4;
datalines;
1 10 0 0 0 0
1 8 5 0 0 0
1 6 10 6 0 0
1 4 7 4 3 1
2 8 0 0 0 0
2 4 9 0 0 0
2 7 8 5 0 0
2 10 3 10 6 2
;
run;

proc summary data=x nway;
var d0-d4;
class id_bb;
output out=stats (drop=_:) sum=;
run;

proc transpose data=stats out=tstats (keep=col1 rename=(col1=x));
by id_bb;
run;

options mergenoby=warn;

data x;
merge x tstats;
run;

options mergenoby=error;

proc print data=x;
run;

You can't possibly want that final file, can you?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
From: ChrisG on
On 22 fév, 23:11, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
wrote:
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > ChrisG
> > Sent: Monday, February 22, 2010 12:42 PM
> > To: SA...(a)LISTSERV.UGA.EDU
> > Subject: Re: tricky sum
>
> > On 22 fév, 20:47, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
> > wrote:
> > > > -----Original Message-----
> > > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > > > ChrisG
> > > > Sent: Monday, February 22, 2010 8:07 AM
> > > > To: SA...(a)LISTSERV.UGA.EDU
> > > > Subject: tricky sum
>
> > > > Hi everyone
>
> > > > Here is my tricky sum problem
> > > > I have such a dataset
>
> > > > id_bb   d0      d1      d2      d3
> > > > 1       10      0       0       0
> > > > 1       8       5       0       0
> > > > 1       6       10      6       0
> > > > 1       4       7       4       3
> > > > 2       8       0       0       0
> > > > 2       4       9       0       0
> > > > 2       7       8       5       0
> > > > 2       10      3       10      6
>
> > > > id_bb = identifier, then some numeric variables d0-d3
> > > > in reality i have more than 50000 lines, more than 3000 unique
> > > > identifiers and it goes from d0 until d125 !
>
> > > > now my dream is to have this:
>
> > > > id_bb   d0      d1      d2      d3      x
> > > > 1       10      0       0       0       28
> > > > 1       8       5       0       0       22
> > > > 1       6       10      6       0       10
> > > > 1       4       7       4       3       3
> > > > 2       8       0       0       0       29
> > > > 2       4       9       0       0       20
> > > > 2       7       8       5       0       15
> > > > 2       10      3       10      6       6
>
> > > > this x variable is the trick sum i need
> > > > and it goes like this:
> > > > for the first id_bb:
> > > > x in the first line = sum(d0), i.e. 10+8+6+4
> > > > x in the second line = sum(d1), i.e. 0+5+10+7
> > > > so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> > > > then again for the second id_bb:
> > > > x in the first line = sum(d0), ie. x = 8+4+7+10
> > > > and so on until the last variable, ie. d3
>
> > > > and at the end, in reality, this should happen for every id_bb (more
> > > > than 3000 unique id_bb) and for the d0 until d125...
>
> > > > i have no clue how to do it in SAS...
> > > > in Excel this will surely be a pain as well and i don't know how to
> > > > use VBA...
>
> > > > So if someone could help me out with this, i would be very greatful !
> > > > thank you in advance
> > > > Best
> > > > CG
>
> > > Chris,
>
> > > I am not sure if I understand your wanted solution.  You say in your real data you
> > have variables d0-d125 that you want to sum up and then append to consecutive
> > observations by id_bb.  That will only work if you exactly 126 observations for each
> > ID.  If that is what you have then the following code will work.  Change the indices of
> > 3 to 125 for your data.
>
> > > data have;
> > > infile cards;
> > > input id_bb   d0      d1      d2      d3;
> > > cards;
> > > 1       10      0       0       0
> > > 1       8       5       0       0
> > > 1       6       10      6       0
> > > 1       4       7       4       3
> > > 2       8       0       0       0
> > > 2       4       9       0       0
> > > 2       7       8       5       0
> > > 2       10      3       10      6
> > > ;
> > > run;
> > > data want;
> > >   do until(last.id_bb);
> > >     set have;
> > >     by id_bb;
> > >     array d_a[0:3] d0-d3;
> > >     array sum_a[0:3] s0-s3;
>
> > >     do i=0 to 3;
> > >       sum_a[i] = sum(0,sum_a[i],d_a[i]);
> > >     end;
> > >   end;
> > >   do i=0 to 3;
> > >     set have;
> > >         x = sum_a[i];
> > >         output;
> > >   end;
> > > run;
> > > proc print;
> > > run;
>
> > > I you don't have 126 observations per ID then you will need to write back to SAS-
> > L and give us more detail on exactly what you want.  Hope this is helpful,
>
> > > Dan
>
> > > Daniel J. Nordlund
> > > Washington State Department of Social and Health Services
> > > Planning, Performance, and Accountability
> > > Research and Data Analysis Division
> > > Olympia, WA  98504-5204
>
> > Hello everybody
>
> > thanks a lot for your help !
> > unfortunately, the real data may have less than 126 obs. per id...
> > but the code by Mike Zdeb works perfectly, so i found what i was
> > looking for
> > thanks to all of you
> > and thank you to Mike for his code
>
> > Best
> > CG
>
> Chris,
>
> Maybe I am missing something, but if you don't have the same number of observations as you have variables (d0 to d_whatever), then I don't see how Mikes solution works. Just add one more variable to your list and see what happens.
>
> data x;
> input id_bb  d0-d4;
> datalines;
> 1       10      0       0       0     0
> 1       8       5       0       0     0
> 1       6       10      6       0     0
> 1       4       7       4       3     1
> 2       8       0       0       0     0
> 2       4       9       0       0     0
> 2       7       8       5       0     0
> 2       10      3       10      6     2
> ;
> run;
>
> proc summary data=x nway;
> var d0-d4;
> class id_bb;
> output out=stats (drop=_:) sum=;
> run;
>
> proc transpose data=stats out=tstats (keep=col1 rename=(col1=x));
> by id_bb;
> run;
>
> options mergenoby=warn;
>
> data x;
> merge x tstats;
> run;
>
> options mergenoby=error;
>
> proc print data=x;
> run;
>
> You can't possibly want that final file, can you?
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA  98504-5204

Hi Daniel

ok i will check again the final output on the full sample
it is true that your example gives a "bad" solution for me
BRB

Best
CG
From: ChrisG on
On 22 fév, 23:11, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
wrote:
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > ChrisG
> > Sent: Monday, February 22, 2010 12:42 PM
> > To: SA...(a)LISTSERV.UGA.EDU
> > Subject: Re: tricky sum
>
> > On 22 fév, 20:47, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
> > wrote:
> > > > -----Original Message-----
> > > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > > > ChrisG
> > > > Sent: Monday, February 22, 2010 8:07 AM
> > > > To: SA...(a)LISTSERV.UGA.EDU
> > > > Subject: tricky sum
>
> > > > Hi everyone
>
> > > > Here is my tricky sum problem
> > > > I have such a dataset
>
> > > > id_bb   d0      d1      d2      d3
> > > > 1       10      0       0       0
> > > > 1       8       5       0       0
> > > > 1       6       10      6       0
> > > > 1       4       7       4       3
> > > > 2       8       0       0       0
> > > > 2       4       9       0       0
> > > > 2       7       8       5       0
> > > > 2       10      3       10      6
>
> > > > id_bb = identifier, then some numeric variables d0-d3
> > > > in reality i have more than 50000 lines, more than 3000 unique
> > > > identifiers and it goes from d0 until d125 !
>
> > > > now my dream is to have this:
>
> > > > id_bb   d0      d1      d2      d3      x
> > > > 1       10      0       0       0       28
> > > > 1       8       5       0       0       22
> > > > 1       6       10      6       0       10
> > > > 1       4       7       4       3       3
> > > > 2       8       0       0       0       29
> > > > 2       4       9       0       0       20
> > > > 2       7       8       5       0       15
> > > > 2       10      3       10      6       6
>
> > > > this x variable is the trick sum i need
> > > > and it goes like this:
> > > > for the first id_bb:
> > > > x in the first line = sum(d0), i.e. 10+8+6+4
> > > > x in the second line = sum(d1), i.e. 0+5+10+7
> > > > so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> > > > then again for the second id_bb:
> > > > x in the first line = sum(d0), ie. x = 8+4+7+10
> > > > and so on until the last variable, ie. d3
>
> > > > and at the end, in reality, this should happen for every id_bb (more
> > > > than 3000 unique id_bb) and for the d0 until d125...
>
> > > > i have no clue how to do it in SAS...
> > > > in Excel this will surely be a pain as well and i don't know how to
> > > > use VBA...
>
> > > > So if someone could help me out with this, i would be very greatful !
> > > > thank you in advance
> > > > Best
> > > > CG
>
> > > Chris,
>
> > > I am not sure if I understand your wanted solution.  You say in your real data you
> > have variables d0-d125 that you want to sum up and then append to consecutive
> > observations by id_bb.  That will only work if you exactly 126 observations for each
> > ID.  If that is what you have then the following code will work.  Change the indices of
> > 3 to 125 for your data.
>
> > > data have;
> > > infile cards;
> > > input id_bb   d0      d1      d2      d3;
> > > cards;
> > > 1       10      0       0       0
> > > 1       8       5       0       0
> > > 1       6       10      6       0
> > > 1       4       7       4       3
> > > 2       8       0       0       0
> > > 2       4       9       0       0
> > > 2       7       8       5       0
> > > 2       10      3       10      6
> > > ;
> > > run;
> > > data want;
> > >   do until(last.id_bb);
> > >     set have;
> > >     by id_bb;
> > >     array d_a[0:3] d0-d3;
> > >     array sum_a[0:3] s0-s3;
>
> > >     do i=0 to 3;
> > >       sum_a[i] = sum(0,sum_a[i],d_a[i]);
> > >     end;
> > >   end;
> > >   do i=0 to 3;
> > >     set have;
> > >         x = sum_a[i];
> > >         output;
> > >   end;
> > > run;
> > > proc print;
> > > run;
>
> > > I you don't have 126 observations per ID then you will need to write back to SAS-
> > L and give us more detail on exactly what you want.  Hope this is helpful,
>
> > > Dan
>
> > > Daniel J. Nordlund
> > > Washington State Department of Social and Health Services
> > > Planning, Performance, and Accountability
> > > Research and Data Analysis Division
> > > Olympia, WA  98504-5204
>
> > Hello everybody
>
> > thanks a lot for your help !
> > unfortunately, the real data may have less than 126 obs. per id...
> > but the code by Mike Zdeb works perfectly, so i found what i was
> > looking for
> > thanks to all of you
> > and thank you to Mike for his code
>
> > Best
> > CG
>
> Chris,
>
> Maybe I am missing something, but if you don't have the same number of observations as you have variables (d0 to d_whatever), then I don't see how Mikes solution works. Just add one more variable to your list and see what happens.
>
> data x;
> input id_bb  d0-d4;
> datalines;
> 1       10      0       0       0     0
> 1       8       5       0       0     0
> 1       6       10      6       0     0
> 1       4       7       4       3     1
> 2       8       0       0       0     0
> 2       4       9       0       0     0
> 2       7       8       5       0     0
> 2       10      3       10      6     2
> ;
> run;
>
> proc summary data=x nway;
> var d0-d4;
> class id_bb;
> output out=stats (drop=_:) sum=;
> run;
>
> proc transpose data=stats out=tstats (keep=col1 rename=(col1=x));
> by id_bb;
> run;
>
> options mergenoby=warn;
>
> data x;
> merge x tstats;
> run;
>
> options mergenoby=error;
>
> proc print data=x;
> run;
>
> You can't possibly want that final file, can you?
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA  98504-5204

Effectively Daniel you are right
i got excited to fast by looking at the (first lines) of final dataset
now i have checked the intermediate steps and something weird is
happening into stats and tstats as well...
so my problem is still not fixed !
gosh...
does anyone has a solution ?

thank you Daniel for spoting the problem !

Best
CG
From: ChrisG on
On 23 fév, 09:32, ChrisG <chris.godlew...(a)gmail.com> wrote:
> On 22 fév, 23:11, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
> wrote:
>
>
>
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > > ChrisG
> > > Sent: Monday, February 22, 2010 12:42 PM
> > > To: SA...(a)LISTSERV.UGA.EDU
> > > Subject: Re: tricky sum
>
> > > On 22 fév, 20:47, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
> > > wrote:
> > > > > -----Original Message-----
> > > > > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > > > > ChrisG
> > > > > Sent: Monday, February 22, 2010 8:07 AM
> > > > > To: SA...(a)LISTSERV.UGA.EDU
> > > > > Subject: tricky sum
>
> > > > > Hi everyone
>
> > > > > Here is my tricky sum problem
> > > > > I have such a dataset
>
> > > > > id_bb   d0      d1      d2      d3
> > > > > 1       10      0       0       0
> > > > > 1       8       5       0       0
> > > > > 1       6       10      6       0
> > > > > 1       4       7       4       3
> > > > > 2       8       0       0       0
> > > > > 2       4       9       0       0
> > > > > 2       7       8       5       0
> > > > > 2       10      3       10      6
>
> > > > > id_bb = identifier, then some numeric variables d0-d3
> > > > > in reality i have more than 50000 lines, more than 3000 unique
> > > > > identifiers and it goes from d0 until d125 !
>
> > > > > now my dream is to have this:
>
> > > > > id_bb   d0      d1      d2      d3      x
> > > > > 1       10      0       0       0       28
> > > > > 1       8       5       0       0       22
> > > > > 1       6       10      6       0       10
> > > > > 1       4       7       4       3       3
> > > > > 2       8       0       0       0       29
> > > > > 2       4       9       0       0       20
> > > > > 2       7       8       5       0       15
> > > > > 2       10      3       10      6       6
>
> > > > > this x variable is the trick sum i need
> > > > > and it goes like this:
> > > > > for the first id_bb:
> > > > > x in the first line = sum(d0), i.e. 10+8+6+4
> > > > > x in the second line = sum(d1), i.e. 0+5+10+7
> > > > > so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> > > > > then again for the second id_bb:
> > > > > x in the first line = sum(d0), ie. x = 8+4+7+10
> > > > > and so on until the last variable, ie. d3
>
> > > > > and at the end, in reality, this should happen for every id_bb (more
> > > > > than 3000 unique id_bb) and for the d0 until d125...
>
> > > > > i have no clue how to do it in SAS...
> > > > > in Excel this will surely be a pain as well and i don't know how to
> > > > > use VBA...
>
> > > > > So if someone could help me out with this, i would be very greatful !
> > > > > thank you in advance
> > > > > Best
> > > > > CG
>
> > > > Chris,
>
> > > > I am not sure if I understand your wanted solution.  You say in your real data you
> > > have variables d0-d125 that you want to sum up and then append to consecutive
> > > observations by id_bb.  That will only work if you exactly 126 observations for each
> > > ID.  If that is what you have then the following code will work.  Change the indices of
> > > 3 to 125 for your data.
>
> > > > data have;
> > > > infile cards;
> > > > input id_bb   d0      d1      d2      d3;
> > > > cards;
> > > > 1       10      0       0       0
> > > > 1       8       5       0       0
> > > > 1       6       10      6       0
> > > > 1       4       7       4       3
> > > > 2       8       0       0       0
> > > > 2       4       9       0       0
> > > > 2       7       8       5       0
> > > > 2       10      3       10      6
> > > > ;
> > > > run;
> > > > data want;
> > > >   do until(last.id_bb);
> > > >     set have;
> > > >     by id_bb;
> > > >     array d_a[0:3] d0-d3;
> > > >     array sum_a[0:3] s0-s3;
>
> > > >     do i=0 to 3;
> > > >       sum_a[i] = sum(0,sum_a[i],d_a[i]);
> > > >     end;
> > > >   end;
> > > >   do i=0 to 3;
> > > >     set have;
> > > >         x = sum_a[i];
> > > >         output;
> > > >   end;
> > > > run;
> > > > proc print;
> > > > run;
>
> > > > I you don't have 126 observations per ID then you will need to write back to SAS-
> > > L and give us more detail on exactly what you want.  Hope this is helpful,
>
> > > > Dan
>
> > > > Daniel J. Nordlund
> > > > Washington State Department of Social and Health Services
> > > > Planning, Performance, and Accountability
> > > > Research and Data Analysis Division
> > > > Olympia, WA  98504-5204
>
> > > Hello everybody
>
> > > thanks a lot for your help !
> > > unfortunately, the real data may have less than 126 obs. per id...
> > > but the code by Mike Zdeb works perfectly, so i found what i was
> > > looking for
> > > thanks to all of you
> > > and thank you to Mike for his code
>
> > > Best
> > > CG
>
> > Chris,
>
> > Maybe I am missing something, but if you don't have the same number of observations as you have variables (d0 to d_whatever), then I don't see how Mikes solution works. Just add one more variable to your list and see what happens.
>
> > data x;
> > input id_bb  d0-d4;
> > datalines;
> > 1       10      0       0       0     0
> > 1       8       5       0       0     0
> > 1       6       10      6       0     0
> > 1       4       7       4       3     1
> > 2       8       0       0       0     0
> > 2       4       9       0       0     0
> > 2       7       8       5       0     0
> > 2       10      3       10      6     2
> > ;
> > run;
>
> > proc summary data=x nway;
> > var d0-d4;
> > class id_bb;
> > output out=stats (drop=_:) sum=;
> > run;
>
> > proc transpose data=stats out=tstats (keep=col1 rename=(col1=x));
> > by id_bb;
> > run;
>
> > options mergenoby=warn;
>
> > data x;
> > merge x tstats;
> > run;
>
> > options mergenoby=error;
>
> > proc print data=x;
> > run;
>
> > You can't possibly want that final file, can you?
>
> > Dan
>
> > Daniel J. Nordlund
> > Washington State Department of Social and Health Services
> > Planning, Performance, and Accountability
> > Research and Data Analysis Division
> > Olympia, WA  98504-5204
>
> Effectively Daniel you are right
> i got excited to fast by looking at the (first lines) of final dataset
> now i have checked the intermediate steps and something weird is
> happening into stats and tstats as well...
> so my problem is still not fixed !
> gosh...
> does anyone has a solution ?
>
> thank you Daniel for spoting the problem !
>
> Best
> CG

Daniel
I did something like this on the example :


proc transpose data=stats out=tstats ;
by id_bb;
run;

data tstats; set tstats;
d=_name_;
x=col1;
drop _name_ col1;
run;

data x;
merge x tstats; by id_bb;
run;

and the final sample looks fine...
what do you think ?

best
CG
From: Jack Clark on
Chris,



I understand that your data may have fewer than 126 observations per ID.
From your description below, it sounds like the number of observations
for a particular ID will be the number of numeric variables (D0-D125)
which need to be summed. Is this correct?



Will the numeric variables always be populated starting with D0 and
going up toward D125?



For example, when an ID has 8 observations does this mean that D0 - D07
will always be the variables which need to be summed? Or could the
variables needing to be summed be spread out across the D0 - D125 range?



If the range of variables to be summed always starts with D0, D1, D2,
etc. and contains a number of numeric variables to be summed equivalent
to the number of observations for that ID, a slight modification to my
solution from yesterday may work. The main logic modification is in the
final merge, where a subsetting IF statement is added to only keep
observations from the original data. I have added an additional ID
(which does not contain 4 observations) to the sample data for testing.









*(using your sample data);

data have;

infile cards missover;

input id_bb d0 d1 d2 d3;

cards;

1 10 0 0 0

1 8 5 0 0

1 6 10 6 0

1 4 7 4 3

2 8 0 0 0

2 4 9 0 0

2 7 8 5 0

2 10 3 10 6

3 4 1 0 0

3 2 8 0 0

;

run;



proc print data = have;

run;





* create SUM_VAR containing variable name of summed variable ;

data next (drop=arr_cnt);

set have;

by id_bb;

array dvars(*) d0-d3;

if first.id_bb then arr_cnt = 1;

sum_var = vname(dvars(arr_cnt));

arr_cnt + 1;

run;



proc print data = next;

run;



* sum numeric fields by ID_BB ;

proc summary data = have nway missing;

class id_bb;

var d0-d3;

output out = have_sum (drop=_type_ _freq_) sum=; run;



proc print data = have_sum;

run;



* transpose summed data - rename fields ;

proc transpose data = have_sum out = have_sumt (rename=(_name_=sum_var
col1=x));

by id_bb;

run;



proc print data = have_sumt;

run;





* merge summed data to original data ;

data need;

merge next (in=a) have_sumt;

by id_bb sum_var;

if a;

run;



proc print data = need;

run;







Jack









Jack Clark
Senior Research Analyst
phone: 410-455-6256
fax: 410-455-6850
jclark(a)hilltop.umbc.edu ________________________________

From: Christophe Godlewski [mailto:chris.godlewski(a)gmail.com]
Sent: Monday, February 22, 2010 2:46 PM
To: Jack Clark
Cc: SAS-L(a)listserv.uga.edu
Subject: Re: tricky sum



Hello Jack

Thank you very much for your help
it works fine with the data for the example
unfortunately, in the full dataset, each id_bb can have less than 126
observations...
it can be 1, 3, 89 or 126...
126 is the maximum actually
more precisely, the "matrix" has 126 terms for each id_bb
but for instance if an id_bb has only 89 observations, then for d89 to
d125, there will be 0 in the cells...
now i am thinking of a trick to make your code applicable to my data...
but if you have a smart idea im taking it !

thank you again for your help
best
CG

On Mon, Feb 22, 2010 at 8:18 PM, Jack Clark <jclark(a)hilltop.umbc.edu>
wrote:

Chris,

If I understand the question correctly, this means that each ID_BB value
in your data has exactly 126 observations? This is because you will
need 126 observations per ID to hold the sums of D0-D125. Is that
correct?

This solution includes the creation of a new variable called SUM_VAR
which has the value of the variable name being summed as the value of X.
You can drop it at the end if you don't want it. The data are
summarized by ID_BB to get the sum of each numeric variable. That
summarized data is transposed and then merged to the original data by
ID_BB and SUM_VAR.

Let me know if this meets your needs.


*(using your sample data);
data have;
infile cards missover;
input id_bb d0 d1 d2 d3;
cards;

1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6

;
run;


* create SUM_VAR containing variable name of summed variable ;
data next (drop=arr_cnt);
set have;
by id_bb;
array dvars(*) d0-d3;
if first.id_bb then arr_cnt = 1;
sum_var = vname(dvars(arr_cnt));
arr_cnt + 1;
run;

proc print data = next;
run;

* sum numeric fields by ID_BB ;
proc summary data = have nway missing;
class id_bb;
var d0-d3;
output out = have_sum (drop=_type_ _freq_) sum=;
run;

proc print data = have_sum;
run;

* transpose summed data - rename fields ;
proc transpose data = have_sum out = have_sumt (rename=(_name_=sum_var
col1=x));
by id_bb;
run;

proc print data = have_sumt;
run;


* merge summed data to original data ;
data need;
merge next have_sumt;
by id_bb sum_var;
run;

proc print data = need;
run;




Jack






Jack Clark
Senior Research Analyst
phone: 410-455-6256
fax: 410-455-6850
jclark(a)hilltop.umbc.edu

University of Maryland, Baltimore County
Sondheim Hall, 3rd Floor
1000 Hilltop Circle
Baltimore, MD 21250




Confidentiality Notice: This e-mail may contain information that is
legally privileged and that is intended only for the use of the
addressee(s) named above. If you are not the intended recipient, you are
hereby notified that any disclosure, copying of this e-mail,
distribution, or action taken in reliance on the contents of this e-mail
and/or documents attributed to this e-mail is strictly prohibited. If
you have received this information in error, please notify the sender
immediately by phone and delete this entire e-mail. Thank
you.-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
ChrisG
Sent: Monday, February 22, 2010 11:07 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: tricky sum


Hi everyone

Here is my tricky sum problem
I have such a dataset

id_bb d0 d1 d2 d3
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6

id_bb = identifier, then some numeric variables d0-d3
in reality i have more than 50000 lines, more than 3000 unique
identifiers and it goes from d0 until d125 !

now my dream is to have this:

id_bb d0 d1 d2 d3 x
1 10 0 0 0 28
1 8 5 0 0 22
1 6 10 6 0 10
1 4 7 4 3 3
2 8 0 0 0 29
2 4 9 0 0 20
2 7 8 5 0 15
2 10 3 10 6 6

this x variable is the trick sum i need
and it goes like this:
for the first id_bb:
x in the first line = sum(d0), i.e. 10+8+6+4
x in the second line = sum(d1), i.e. 0+5+10+7
so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
then again for the second id_bb:
x in the first line = sum(d0), ie. x = 8+4+7+10
and so on until the last variable, ie. d3

and at the end, in reality, this should happen for every id_bb (more
than 3000 unique id_bb) and for the d0 until d125...

i have no clue how to do it in SAS...
in Excel this will surely be a pain as well and i don't know how to
use VBA...

So if someone could help me out with this, i would be very greatful !
thank you in advance
Best
CG




--
Christophe J. GODLEWSKI
Strasbourg, France