From: ChrisG on
Hi everyone

Here is my tricky sum problem
I have such a dataset

id_bb d0 d1 d2 d3
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6

id_bb = identifier, then some numeric variables d0-d3
in reality i have more than 50000 lines, more than 3000 unique
identifiers and it goes from d0 until d125 !

now my dream is to have this:

id_bb d0 d1 d2 d3 x
1 10 0 0 0 28
1 8 5 0 0 22
1 6 10 6 0 10
1 4 7 4 3 3
2 8 0 0 0 29
2 4 9 0 0 20
2 7 8 5 0 15
2 10 3 10 6 6

this x variable is the trick sum i need
and it goes like this:
for the first id_bb:
x in the first line = sum(d0), i.e. 10+8+6+4
x in the second line = sum(d1), i.e. 0+5+10+7
so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
then again for the second id_bb:
x in the first line = sum(d0), ie. x = 8+4+7+10
and so on until the last variable, ie. d3

and at the end, in reality, this should happen for every id_bb (more
than 3000 unique id_bb) and for the d0 until d125...

i have no clue how to do it in SAS...
in Excel this will surely be a pain as well and i don't know how to
use VBA...

So if someone could help me out with this, i would be very greatful !
thank you in advance
Best
CG
From: Jack Clark on
Chris,

If I understand the question correctly, this means that each ID_BB value
in your data has exactly 126 observations? This is because you will
need 126 observations per ID to hold the sums of D0-D125. Is that
correct?

This solution includes the creation of a new variable called SUM_VAR
which has the value of the variable name being summed as the value of X.
You can drop it at the end if you don't want it. The data are
summarized by ID_BB to get the sum of each numeric variable. That
summarized data is transposed and then merged to the original data by
ID_BB and SUM_VAR.

Let me know if this meets your needs.


*(using your sample data);
data have;
infile cards missover;
input id_bb d0 d1 d2 d3;
cards;
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6
;
run;


* create SUM_VAR containing variable name of summed variable ;
data next (drop=arr_cnt);
set have;
by id_bb;
array dvars(*) d0-d3;
if first.id_bb then arr_cnt = 1;
sum_var = vname(dvars(arr_cnt));
arr_cnt + 1;
run;

proc print data = next;
run;

* sum numeric fields by ID_BB ;
proc summary data = have nway missing;
class id_bb;
var d0-d3;
output out = have_sum (drop=_type_ _freq_) sum=;
run;

proc print data = have_sum;
run;

* transpose summed data - rename fields ;
proc transpose data = have_sum out = have_sumt (rename=(_name_=sum_var
col1=x));
by id_bb;
run;

proc print data = have_sumt;
run;


* merge summed data to original data ;
data need;
merge next have_sumt;
by id_bb sum_var;
run;

proc print data = need;
run;




Jack






Jack Clark
Senior Research Analyst
phone: 410-455-6256
fax: 410-455-6850
jclark(a)hilltop.umbc.edu

University of Maryland, Baltimore County
Sondheim Hall, 3rd Floor
1000 Hilltop Circle
Baltimore, MD 21250




Confidentiality Notice: This e-mail may contain information that is legally privileged and that is intended only for the use of the addressee(s) named above. If you are not the intended recipient, you are hereby notified that any disclosure, copying of this e-mail, distribution, or action taken in reliance on the contents of this e-mail and/or documents attributed to this e-mail is strictly prohibited. If you have received this information in error, please notify the sender immediately by phone and delete this entire e-mail. Thank you.-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
ChrisG
Sent: Monday, February 22, 2010 11:07 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: tricky sum

Hi everyone

Here is my tricky sum problem
I have such a dataset

id_bb d0 d1 d2 d3
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6

id_bb = identifier, then some numeric variables d0-d3
in reality i have more than 50000 lines, more than 3000 unique
identifiers and it goes from d0 until d125 !

now my dream is to have this:

id_bb d0 d1 d2 d3 x
1 10 0 0 0 28
1 8 5 0 0 22
1 6 10 6 0 10
1 4 7 4 3 3
2 8 0 0 0 29
2 4 9 0 0 20
2 7 8 5 0 15
2 10 3 10 6 6

this x variable is the trick sum i need
and it goes like this:
for the first id_bb:
x in the first line = sum(d0), i.e. 10+8+6+4
x in the second line = sum(d1), i.e. 0+5+10+7
so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
then again for the second id_bb:
x in the first line = sum(d0), ie. x = 8+4+7+10
and so on until the last variable, ie. d3

and at the end, in reality, this should happen for every id_bb (more
than 3000 unique id_bb) and for the d0 until d125...

i have no clue how to do it in SAS...
in Excel this will surely be a pain as well and i don't know how to
use VBA...

So if someone could help me out with this, i would be very greatful !
thank you in advance
Best
CG
From: NordlDJ on
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> ChrisG
> Sent: Monday, February 22, 2010 8:07 AM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: tricky sum
>
> Hi everyone
>
> Here is my tricky sum problem
> I have such a dataset
>
> id_bb d0 d1 d2 d3
> 1 10 0 0 0
> 1 8 5 0 0
> 1 6 10 6 0
> 1 4 7 4 3
> 2 8 0 0 0
> 2 4 9 0 0
> 2 7 8 5 0
> 2 10 3 10 6
>
> id_bb = identifier, then some numeric variables d0-d3
> in reality i have more than 50000 lines, more than 3000 unique
> identifiers and it goes from d0 until d125 !
>
> now my dream is to have this:
>
> id_bb d0 d1 d2 d3 x
> 1 10 0 0 0 28
> 1 8 5 0 0 22
> 1 6 10 6 0 10
> 1 4 7 4 3 3
> 2 8 0 0 0 29
> 2 4 9 0 0 20
> 2 7 8 5 0 15
> 2 10 3 10 6 6
>
> this x variable is the trick sum i need
> and it goes like this:
> for the first id_bb:
> x in the first line = sum(d0), i.e. 10+8+6+4
> x in the second line = sum(d1), i.e. 0+5+10+7
> so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> then again for the second id_bb:
> x in the first line = sum(d0), ie. x = 8+4+7+10
> and so on until the last variable, ie. d3
>
> and at the end, in reality, this should happen for every id_bb (more
> than 3000 unique id_bb) and for the d0 until d125...
>
> i have no clue how to do it in SAS...
> in Excel this will surely be a pain as well and i don't know how to
> use VBA...
>
> So if someone could help me out with this, i would be very greatful !
> thank you in advance
> Best
> CG

Chris,

I am not sure if I understand your wanted solution. You say in your real data you have variables d0-d125 that you want to sum up and then append to consecutive observations by id_bb. That will only work if you exactly 126 observations for each ID. If that is what you have then the following code will work. Change the indices of 3 to 125 for your data.

data have;
infile cards;
input id_bb d0 d1 d2 d3;
cards;
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6
;
run;
data want;
do until(last.id_bb);
set have;
by id_bb;
array d_a[0:3] d0-d3;
array sum_a[0:3] s0-s3;

do i=0 to 3;
sum_a[i] = sum(0,sum_a[i],d_a[i]);
end;
end;
do i=0 to 3;
set have;
x = sum_a[i];
output;
end;
run;
proc print;
run;

I you don't have 126 observations per ID then you will need to write back to SAS-L and give us more detail on exactly what you want. Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
From: Mike Zdeb on
hi ... if you can live with a MERGE with no BY variable, this should work
since the data set TSTATS should (will) be in the correct order
after PROC TRANSPOSE

you can scale it up for your data (VAR D0-D125 in PROC SUMMARy) ...

data x;
input id_bb d0-d3;
datalines;
1 10 0 0 0
1 8 5 0 0
1 6 10 6 0
1 4 7 4 3
2 8 0 0 0
2 4 9 0 0
2 7 8 5 0
2 10 3 10 6
;
run;

proc summary data=x nway;
var d0-d3;
class id_bb;
output out=stats (drop=_:) sum=;
run;

proc transpose data=stats out=tstats (keep=col1 rename=(col1=x));
by id_bb;
run;

options mergenoby=warn;

data x;
merge x tstats;
run;

options mergenoby=error;

proc print data=x;
run;

output ...

Obs ID_BB D0 D1 D2 D3 X
1 1 10 0 0 0 28
2 1 8 5 0 0 22
3 1 6 10 6 0 10
4 1 4 7 4 3 3
5 2 8 0 0 0 29
6 2 4 9 0 0 20
7 2 7 8 5 0 15
8 2 10 3 10 6 6


--
Mike Zdeb
U(a)Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475

> Hi everyone
>
> Here is my tricky sum problem
> I have such a dataset
>
> id_bb d0 d1 d2 d3
> 1 10 0 0 0
> 1 8 5 0 0
> 1 6 10 6 0
> 1 4 7 4 3
> 2 8 0 0 0
> 2 4 9 0 0
> 2 7 8 5 0
> 2 10 3 10 6
>
> id_bb = identifier, then some numeric variables d0-d3
> in reality i have more than 50000 lines, more than 3000 unique
> identifiers and it goes from d0 until d125 !
>
> now my dream is to have this:
>
> id_bb d0 d1 d2 d3 x
> 1 10 0 0 0 28
> 1 8 5 0 0 22
> 1 6 10 6 0 10
> 1 4 7 4 3 3
> 2 8 0 0 0 29
> 2 4 9 0 0 20
> 2 7 8 5 0 15
> 2 10 3 10 6 6
>
> this x variable is the trick sum i need
> and it goes like this:
> for the first id_bb:
> x in the first line = sum(d0), i.e. 10+8+6+4
> x in the second line = sum(d1), i.e. 0+5+10+7
> so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> then again for the second id_bb:
> x in the first line = sum(d0), ie. x = 8+4+7+10
> and so on until the last variable, ie. d3
>
> and at the end, in reality, this should happen for every id_bb (more
> than 3000 unique id_bb) and for the d0 until d125...
>
> i have no clue how to do it in SAS...
> in Excel this will surely be a pain as well and i don't know how to
> use VBA...
>
> So if someone could help me out with this, i would be very greatful !
> thank you in advance
> Best
> CG
>
From: ChrisG on
On 22 fév, 20:47, Nord...(a)DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)")
wrote:
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of
> > ChrisG
> > Sent: Monday, February 22, 2010 8:07 AM
> > To: SA...(a)LISTSERV.UGA.EDU
> > Subject: tricky sum
>
> > Hi everyone
>
> > Here is my tricky sum problem
> > I have such a dataset
>
> > id_bb   d0      d1      d2      d3
> > 1       10      0       0       0
> > 1       8       5       0       0
> > 1       6       10      6       0
> > 1       4       7       4       3
> > 2       8       0       0       0
> > 2       4       9       0       0
> > 2       7       8       5       0
> > 2       10      3       10      6
>
> > id_bb = identifier, then some numeric variables d0-d3
> > in reality i have more than 50000 lines, more than 3000 unique
> > identifiers and it goes from d0 until d125 !
>
> > now my dream is to have this:
>
> > id_bb   d0      d1      d2      d3      x
> > 1       10      0       0       0       28
> > 1       8       5       0       0       22
> > 1       6       10      6       0       10
> > 1       4       7       4       3       3
> > 2       8       0       0       0       29
> > 2       4       9       0       0       20
> > 2       7       8       5       0       15
> > 2       10      3       10      6       6
>
> > this x variable is the trick sum i need
> > and it goes like this:
> > for the first id_bb:
> > x in the first line = sum(d0), i.e. 10+8+6+4
> > x in the second line = sum(d1), i.e. 0+5+10+7
> > so on until the last id_bb=1, i.e. x = sum(d3) = 0+0+0+3
> > then again for the second id_bb:
> > x in the first line = sum(d0), ie. x = 8+4+7+10
> > and so on until the last variable, ie. d3
>
> > and at the end, in reality, this should happen for every id_bb (more
> > than 3000 unique id_bb) and for the d0 until d125...
>
> > i have no clue how to do it in SAS...
> > in Excel this will surely be a pain as well and i don't know how to
> > use VBA...
>
> > So if someone could help me out with this, i would be very greatful !
> > thank you in advance
> > Best
> > CG
>
> Chris,
>
> I am not sure if I understand your wanted solution.  You say in your real data you have variables d0-d125 that you want to sum up and then append to consecutive observations by id_bb.  That will only work if you exactly 126 observations for each ID.  If that is what you have then the following code will work.  Change the indices of 3 to 125 for your data.
>
> data have;
> infile cards;
> input id_bb   d0      d1      d2      d3;
> cards;
> 1       10      0       0       0
> 1       8       5       0       0
> 1       6       10      6       0
> 1       4       7       4       3
> 2       8       0       0       0
> 2       4       9       0       0
> 2       7       8       5       0
> 2       10      3       10      6
> ;
> run;
> data want;
>   do until(last.id_bb);
>     set have;
>     by id_bb;
>     array d_a[0:3] d0-d3;
>     array sum_a[0:3] s0-s3;
>
>     do i=0 to 3;
>       sum_a[i] = sum(0,sum_a[i],d_a[i]);
>     end;
>   end;
>   do i=0 to 3;
>     set have;
>         x = sum_a[i];
>         output;
>   end;
> run;
> proc print;
> run;
>
> I you don't have 126 observations per ID then you will need to write back to SAS-L and give us more detail on exactly what you want.  Hope this is helpful,
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA  98504-5204

Hello everybody

thanks a lot for your help !
unfortunately, the real data may have less than 126 obs. per id...
but the code by Mike Zdeb works perfectly, so i found what i was
looking for
thanks to all of you
and thank you to Mike for his code

Best
CG