calculating variables across repeated ids [SAS]

Prev: calculating variables across repeated ids
Next: Check if a Variable is Null or not

From: Ya Huang on 8 Mar 2010 14:10

So you want to set the flag to missing is SEROTYPE is missing?
If so, a little change would do it:

data ALLFILLED;
input ID SAM SEROTYPE_1 SEROTYPE_9V;
cards;
1 0.2 8.22 36.1
1 0.3 1.81 22.1
1 1.0 .41 4.8
1 2.0 . 3.2
1 3.0 .30 2.1
1 4.0 .23 2.0
1 5.0 .10 1.9
1 6.0 .03 0.9
2 0.2 38.1 100.3
2 0.3 25.81 95.2
2 1.0 22.3 66.2
2 2.0 21.5 54.2
2 3.0 18.3 49.5
2 4.0 6.2 31.1
2 5.0 6.1 23.3
2 6.0 3.1 22
;
RUN;

proc sql;
select *, max(case when SAM=3 AND SEROTYPE_1>.35 THEN 1
when SAM=3 AND SEROTYPE_1<.35 THEN 0
else . end) + serotype_1 - serotype_1 as
CORR3_SER1
from ALLFILLED
group by id
order by id,sam
;

ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
---------------------------------------------------
1 0.2 8.22 36.1 0
1 0.3 1.81 22.1 0
1 1 0.41 4.8 0
1 2 . 3.2 .
1 3 0.3 2.1 0
1 4 0.23 2 0
1 5 0.1 1.9 0
1 6 0.03 0.9 0
2 0.2 38.1 100.3 1
2 0.3 25.81 95.2 1
2 1 22.3 66.2 1
2 2 21.5 54.2 1
2 3 18.3 49.5 1
2 4 6.2 31.1 1
2 5 6.1 23.3 1
2 6 3.1 22 1

Note, I change obs=4 SEROTYPE_=. for testing.

On Mon, 8 Mar 2010 13:58:22 -0500, Katy Seib <uralva(a)GMAIL.COM> wrote:

>Very close, when I try to add a line to keep missing data =missing, I get
>the same results (CORR3_SER1=0 INSTEAD OF .). Lot to learn about sql. I
>added a third person below to simulate missing data (which there is plenty
>of). I'm sure it's a syntax problem on my end.....
>
>data ALLFILLED;
>input ID SAM SEROTYPE_1 SEROTYPE_9V;
>cards;
>1 0.2 8.22 36.1
>1 0.3 1.81 22.1
>1 1.0 .41 4.8
>1 2.0 .33 3.2
>1 3.0 .30 2.1
>1 4.0 .23 2.0
>1 5.0 .10 1.9
>1 6.0 .03 0.9
>2 0.2 38.1 100.3
>2 0.3 25.81 95.2
>2 1.0 22.3 66.2
>2 2.0 21.5 54.2
>2 3.0 18.3 49.5
>2 4.0 6.2 31.1
>2 5.0 6.1 23.3
>2 6.0 3.1 22
>3 0.2 38.1 100.3
>3 0.3 25.81 95.2
>3 1.0 22.3 66.2
>3 2.0 21.5 54.2
>3 3.0 . .
>3 4.0 6.2 31.1
>3 5.0 6.1 23.3
>3 6.0 3.1 22
>;
>RUN;
>proc sql;
>select *, max(case when SAM=3 AND SEROTYPE_1>.35 THEN 1
> when SAM=3 AND SEROTYPE_1<.35 THEN 0
> when sam=3 and serotype_1=. THEN .
> else . end) as CORR3_SER1
>from ALLFILLED
>group by id
>order by id,sam
>;
>
>
>
>On Mon, Mar 8, 2010 at 12:14 PM, Katy Seib <uralva(a)gmail.com> wrote:
>
>> **Intro
>> First, thanks To Ron for posting the SAS-L wiki this morning- very
helpful!
>> I too am relatively new to posting at the listserv but have enjoyed
>> reading
>> for a while. I'm a guest researcher at CDC and a grad student finishing
up
>> my thesis which is a longitudinal study - much of which is beyond the
scope
>> of my formal SAS education - I've had fun finding resources like this,
UCLA
>> and various SAS tips across the internet. I think my question today is
>> relatively simple but a problem for me nonetheless.
>> **Problem
>> I have been calculating variables and have a LONG but effective algorithm
>> for "filling in" variables that are constant across an observation. It's
>> tedious and I still worry about errors. I know there must be a simpler
way
>> to fill in this information AS it's calculated. Below I have included an
>> example and sample data. The SAM variable is a crude time variable and
in
>> reality there are 23 serotypes. The data step "correlate" gives me the
>> yes/no answer for sam=3 only but I need it for all sam (the question
being
>> answered by calculating this variable is: "did this infant have a
antibody
>> titer greater than .35 at 10 weeks (sam=3)?". Here I've done it for
>> Serotype_1 but will have to do it for all serotypes. I'm not pasting my
>> fill
>> in code because it's long and clumsy requiring 6 data steps. This is not
a
>> first occurrence var, so I'm having trouble applying previous similar
>> solutions. Thanks all.
>>
>> data ALLFILLED;
>> input ID SAM SEROTYPE_1 SEROTYPE_9V;
>> cards;
>> 1 0.2 8.22 36.1
>> 1 0.3 1.81 22.1
>> 1 1.0 .41 4.8
>> 1 2.0 .33 3.2
>> 1 3.0 .30 2.1
>> 1 4.0 .23 2.0
>> 1 5.0 .10 1.9
>> 1 6.0 .03 0.9
>> 2 0.2 38.1 100.3
>> 2 0.3 25.81 95.2
>> 2 1.0 22.3 66.2
>> 2 2.0 21.5 54.2
>> 2 3.0 18.3 49.5
>> 2 4.0 6.2 31.1
>> 2 5.0 6.1 23.3
>> 2 6.0 3.1 22
>> ;
>> RUN;
>>
>> PROC PRINT DATA=ALLFILLED; RUN;
>>
>> DATA CORRELATE;
>> SET ALLFILLED;
>> IF SAM=3 AND SEROTYPE_1=. THEN CORR3_SER1=.;
>> ELSE IF SAM=3 AND SEROTYPE_1>.35 THEN CORR3_SER1=1;
>> ELSE IF SAM=3 AND SEROTYPE_1<.35 THEN CORR3_SER1=0;
>> BY ID;
>> RUN;
>>
>> PROC PRINT DATA=CORRELATE; RUN;
>>

From: Katy Seib on 8 Mar 2010 14:27

Oops, meant to include sas-l....

> Ok this code gives me this output for id=3D3 (others are correct) (i'd tw=
eak
> it myself if I had the vaguest clue about sql):
>
>
>
> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
> =C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92
>
> 3 0.2 38.1 100.3 0
> 3 0.3 25.81 95.2 0
> 3 1 22.3 66.2 0
> 3 2 21.5 54.2 0
> 3 3 . . .
> 3 4 6.2 31.1 0
> 3 5 6.1 23.3 0
> 3 6 3.1 22 0
>
> when it should be
>
>
> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
> =C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92
>
> 3 0.2 38.1 100.3 .
> 3 0.3 25.81 95.2 .
> 3 1 22.3 66.2 .
> 3 2 21.5 54.2 .
> 3 3 .
> . .
> 3 4 6.2 31.1 .
> 3 5 6.1 23.3 .
> 3 6 3.1 22 .
>
>
>
>
> On Mon, Mar 8, 2010 at 2:10 PM, Ya Huang <ya.huang(a)amylin.com> wrote:
>
>> So you want to set the flag to missing is SEROTYPE is missing?
>> If so, a little change would do it:
>>
>> data ALLFILLED;
>> input ID SAM SEROTYPE_1 SEROTYPE_9V;
>> cards;
>> 1 0.2 8.22 36.1
>> 1 0.3 1.81 22.1
>> 1 1.0 .41 4.8
>> 1 2.0 . 3.2
>> 1 3.0 .30 2.1
>> 1 4.0 .23 2.0
>> 1 5.0 .10 1.9
>> 1 6.0 .03 0.9
>> 2 0.2 38.1 100.3
>> 2 0.3 25.81 95.2
>> 2 1.0 22.3 66.2
>> 2 2.0 21.5 54.2
>> 2 3.0 18.3 49.5
>> 2 4.0 6.2 31.1
>> 2 5.0 6.1 23.3
>> 2 6.0 3.1 22
>> ;
>> RUN;
>>
>>
>> proc sql;
>> select *, max(case when SAM=3D3 AND SEROTYPE_1>.35 THEN 1
>> when SAM=3D3 AND SEROTYPE_1<.35 THEN 0
>> else . end) + serotype_1 - serotype_1 as
>> CORR3_SER1
>> from ALLFILLED
>> group by id
>> order by id,sam
>> ;
>>
>> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
>> ---------------------------------------------------
>> 1 0.2 8.22 36.1 0
>> 1 0.3 1.81 22.1 0
>> 1 1 0.41 4.8 0
>> 1 2 . 3.2 .
>> 1 3 0.3 2.1 0
>> 1 4 0.23 2 0
>> 1 5 0.1 1.9 0
>> 1 6 0.03 0.9 0
>> 2 0.2 38.1 100.3 1
>> 2 0.3 25.81 95.2 1
>> 2 1 22.3 66.2 1
>> 2 2 21.5 54.2 1
>> 2 3 18.3 49.5 1
>> 2 4 6.2 31.1 1
>> 2 5 6.1 23.3 1
>> 2 6 3.1 22 1
>>
>> Note, I change obs=3D4 SEROTYPE_=3D. for testing.
>>
>>
>>
>> On Mon, 8 Mar 2010 13:58:22 -0500, Katy Seib <uralva(a)GMAIL.COM> wrote:
>>
>> >Very close, when I try to add a line to keep missing data =3Dmissing, I=
get
>> >the same results (CORR3_SER1=3D0 INSTEAD OF .). Lot to learn about sql=
.. I
>> >added a third person below to simulate missing data (which there is
>> plenty
>> >of). I'm sure it's a syntax problem on my end.....
>> >
>> >data ALLFILLED;
>> >input ID SAM SEROTYPE_1 SEROTYPE_9V;
>> >cards;
>> >1 0.2 8.22 36.1
>> >1 0.3 1.81 22.1
>> >1 1.0 .41 4.8
>> >1 2.0 .33 3.2
>> >1 3.0 .30 2.1
>> >1 4.0 .23 2.0
>> >1 5.0 .10 1.9
>> >1 6.0 .03 0.9
>> >2 0.2 38.1 100.3
>> >2 0.3 25.81 95.2
>> >2 1.0 22.3 66.2
>> >2 2.0 21.5 54.2
>> >2 3.0 18.3 49.5
>> >2 4.0 6.2 31.1
>> >2 5.0 6.1 23.3
>> >2 6.0 3.1 22
>> >3 0.2 38.1 100.3
>> >3 0.3 25.81 95.2
>> >3 1.0 22.3 66.2
>> >3 2.0 21.5 54.2
>> >3 3.0 . .
>> >3 4.0 6.2 31.1
>> >3 5.0 6.1 23.3
>> >3 6.0 3.1 22
>> >;
>> >RUN;
>> >proc sql;
>> >select *, max(case when SAM=3D3 AND SEROTYPE_1>.35 THEN 1
>> > when SAM=3D3 AND SEROTYPE_1<.35 THEN 0
>> > when sam=3D3 and serotype_1=3D. THEN .
>> > else . end) as CORR3_SER1
>> >from ALLFILLED
>> >group by id
>> >order by id,sam
>> >;
>> >
>> >
>> >
>> >On Mon, Mar 8, 2010 at 12:14 PM, Katy Seib <uralva(a)gmail.com> wrote:
>> >
>> >> **Intro
>> >> First, thanks To Ron for posting the SAS-L wiki this morning- very
>> helpful!
>> >> I too am relatively new to posting at the listserv but have enjoyed
>> >> reading
>> >> for a while. I'm a guest researcher at CDC and a grad student
>> finishing
>> up
>> >> my thesis which is a longitudinal study - much of which is beyond the
>> scope
>> >> of my formal SAS education - I've had fun finding resources like this=
,
>> UCLA
>> >> and various SAS tips across the internet. I think my question today =
is
>> >> relatively simple but a problem for me nonetheless.
>> >> **Problem
>> >> I have been calculating variables and have a LONG but effective
>> algorithm
>> >> for "filling in" variables that are constant across an observation.
>> It's
>> >> tedious and I still worry about errors. I know there must be a simpl=
er
>> way
>> >> to fill in this information AS it's calculated. Below I have included
>> an
>> >> example and sample data. The SAM variable is a crude time variable a=
nd
>> in
>> >> reality there are 23 serotypes. The data step "correlate" gives me th=
e
>> >> yes/no answer for sam=3D3 only but I need it for all sam (the questio=
n
>> being
>> >> answered by calculating this variable is: "did this infant have a
>> antibody
>> >> titer greater than .35 at 10 weeks (sam=3D3)?". Here I've done it for
>> >> Serotype_1 but will have to do it for all serotypes. I'm not pasting =
my
>> >> fill
>> >> in code because it's long and clumsy requiring 6 data steps. This is
>> not
>> a
>> >> first occurrence var, so I'm having trouble applying previous similar
>> >> solutions. Thanks all.
>> >>
>> >> data ALLFILLED;
>> >> input ID SAM SEROTYPE_1 SEROTYPE_9V;
>> >> cards;
>> >> 1 0.2 8.22 36.1
>> >> 1 0.3 1.81 22.1
>> >> 1 1.0 .41 4.8
>> >> 1 2.0 .33 3.2
>> >> 1 3.0 .30 2.1
>> >> 1 4.0 .23 2.0
>> >> 1 5.0 .10 1.9
>> >> 1 6.0 .03 0.9
>> >> 2 0.2 38.1 100.3
>> >> 2 0.3 25.81 95.2
>> >> 2 1.0 22.3 66.2
>> >> 2 2.0 21.5 54.2
>> >> 2 3.0 18.3 49.5
>> >> 2 4.0 6.2 31.1
>> >> 2 5.0 6.1 23.3
>> >> 2 6.0 3.1 22
>> >> ;
>> >> RUN;
>> >>
>> >> PROC PRINT DATA=3DALLFILLED; RUN;
>> >>
>> >> DATA CORRELATE;
>> >> SET ALLFILLED;
>> >> IF SAM=3D3 AND SEROTYPE_1=3D. THEN CORR3_SER1=3D.;
>> >> ELSE IF SAM=3D3 AND SEROTYPE_1>.35 THEN CORR3_SER1=3D1;
>> >> ELSE IF SAM=3D3 AND SEROTYPE_1<.35 THEN CORR3_SER1=3D0;
>> >> BY ID;
>> >> RUN;
>> >>
>> >> PROC PRINT DATA=3DCORRELATE; RUN;
>> >>
>>
>
>

From: Ya Huang on 8 Mar 2010 14:57

This one seems to work:

proc sql;
select *, max(case when SAM=3 AND SEROTYPE_1>.35 THEN 1
when SAM=3 AND SEROTYPE_1<.35 THEN 0
else . end) +
max(case when SAM=3 AND ^missing(SEROTYPE_1) THEN 0 else .
end) -
max(case when SAM=3 AND ^missing(SEROTYPE_1) THEN 0 else . end) as
CORR3_SER1
from ALLFILLED
group by id
order by id,sam
;

ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
----------------------------------------------------
1 0.2 8.22 36.1 .
1 0.3 1.81 22.1 .
1 1 0.41 4.8 .
1 2 0.35 3.2 .
1 3 . 2.1 .
1 4 0.23 2 .
1 5 0.1 1.9 .
1 6 0.03 0.9 .
2 0.2 38.1 100.3 1
2 0.3 25.81 95.2 1
2 1 22.3 66.2 1
2 2 21.5 54.2 1
2 3 18.3 49.5 1
2 4 6.2 31.1 1
2 5 6.1 23.3 1
2 6 3.1 22 1
3 0.2 8.22 36.1 0
3 0.3 1.81 22.1 0
3 1 0.41 4.8 0
3 2 0.35 3.2 0
3 3 0.2 2.1 0
3 4 0.23 2 0
3 5 0.1 1.9 0
3 6 0.03 0.9 0

On Mon, 8 Mar 2010 14:27:42 -0500, Katy Seib <uralva(a)GMAIL.COM> wrote:

>Oops, meant to include sas-l....
>
>> Ok this code gives me this output for id=3 (others are correct) (i'd
tweak
>> it myself if I had the vaguest clue about sql):
>>
>>
>>
>> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
>> ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
>>
>> 3 0.2 38.1 100.3 0
>> 3 0.3 25.81 95.2 0
>> 3 1 22.3 66.2 0
>> 3 2 21.5 54.2 0
>> 3 3 . . .
>> 3 4 6.2 31.1 0
>> 3 5 6.1 23.3 0
>> 3 6 3.1 22 0
>>
>> when it should be
>>
>>
>> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
>> ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
>>
>> 3 0.2 38.1 100.3 .
>> 3 0.3 25.81 95.2 .
>> 3 1 22.3 66.2 .
>> 3 2 21.5 54.2 .
>> 3 3 .
>> . .
>> 3 4 6.2 31.1 .
>> 3 5 6.1 23.3 .
>> 3 6 3.1 22 .
>>
>>
>>
>>
>> On Mon, Mar 8, 2010 at 2:10 PM, Ya Huang <ya.huang(a)amylin.com> wrote:
>>
>>> So you want to set the flag to missing is SEROTYPE is missing?
>>> If so, a little change would do it:
>>>
>>> data ALLFILLED;
>>> input ID SAM SEROTYPE_1 SEROTYPE_9V;
>>> cards;
>>> 1 0.2 8.22 36.1
>>> 1 0.3 1.81 22.1
>>> 1 1.0 .41 4.8
>>> 1 2.0 . 3.2
>>> 1 3.0 .30 2.1
>>> 1 4.0 .23 2.0
>>> 1 5.0 .10 1.9
>>> 1 6.0 .03 0.9
>>> 2 0.2 38.1 100.3
>>> 2 0.3 25.81 95.2
>>> 2 1.0 22.3 66.2
>>> 2 2.0 21.5 54.2
>>> 2 3.0 18.3 49.5
>>> 2 4.0 6.2 31.1
>>> 2 5.0 6.1 23.3
>>> 2 6.0 3.1 22
>>> ;
>>> RUN;
>>>
>>>
>>> proc sql;
>>> select *, max(case when SAM=3 AND SEROTYPE_1>.35 THEN 1
>>> when SAM=3 AND SEROTYPE_1<.35 THEN 0
>>> else . end) + serotype_1 - serotype_1 as
>>> CORR3_SER1
>>> from ALLFILLED
>>> group by id
>>> order by id,sam
>>> ;
>>>
>>> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
>>> ---------------------------------------------------
>>> 1 0.2 8.22 36.1 0
>>> 1 0.3 1.81 22.1 0
>>> 1 1 0.41 4.8 0
>>> 1 2 . 3.2 .
>>> 1 3 0.3 2.1 0
>>> 1 4 0.23 2 0
>>> 1 5 0.1 1.9 0
>>> 1 6 0.03 0.9 0
>>> 2 0.2 38.1 100.3 1
>>> 2 0.3 25.81 95.2 1
>>> 2 1 22.3 66.2 1
>>> 2 2 21.5 54.2 1
>>> 2 3 18.3 49.5 1
>>> 2 4 6.2 31.1 1
>>> 2 5 6.1 23.3 1
>>> 2 6 3.1 22 1
>>>
>>> Note, I change obs=4 SEROTYPE_=. for testing.
>>>
>>>
>>>
>>> On Mon, 8 Mar 2010 13:58:22 -0500, Katy Seib <uralva(a)GMAIL.COM> wrote:
>>>
>>> >Very close, when I try to add a line to keep missing data =missing, I
get
>>> >the same results (CORR3_SER1=0 INSTEAD OF .). Lot to learn about
sql. I
>>> >added a third person below to simulate missing data (which there is
>>> plenty
>>> >of). I'm sure it's a syntax problem on my end.....
>>> >
>>> >data ALLFILLED;
>>> >input ID SAM SEROTYPE_1 SEROTYPE_9V;
>>> >cards;
>>> >1 0.2 8.22 36.1
>>> >1 0.3 1.81 22.1
>>> >1 1.0 .41 4.8
>>> >1 2.0 .33 3.2
>>> >1 3.0 .30 2.1
>>> >1 4.0 .23 2.0
>>> >1 5.0 .10 1.9
>>> >1 6.0 .03 0.9
>>> >2 0.2 38.1 100.3
>>> >2 0.3 25.81 95.2
>>> >2 1.0 22.3 66.2
>>> >2 2.0 21.5 54.2
>>> >2 3.0 18.3 49.5
>>> >2 4.0 6.2 31.1
>>> >2 5.0 6.1 23.3
>>> >2 6.0 3.1 22
>>> >3 0.2 38.1 100.3
>>> >3 0.3 25.81 95.2
>>> >3 1.0 22.3 66.2
>>> >3 2.0 21.5 54.2
>>> >3 3.0 . .
>>> >3 4.0 6.2 31.1
>>> >3 5.0 6.1 23.3
>>> >3 6.0 3.1 22
>>> >;
>>> >RUN;
>>> >proc sql;
>>> >select *, max(case when SAM=3 AND SEROTYPE_1>.35 THEN 1
>>> > when SAM=3 AND SEROTYPE_1<.35 THEN 0
>>> > when sam=3 and serotype_1=. THEN .
>>> > else . end) as CORR3_SER1
>>> >from ALLFILLED
>>> >group by id
>>> >order by id,sam
>>> >;
>>> >
>>> >
>>> >
>>> >On Mon, Mar 8, 2010 at 12:14 PM, Katy Seib <uralva(a)gmail.com> wrote:
>>> >
>>> >> **Intro
>>> >> First, thanks To Ron for posting the SAS-L wiki this morning- very
>>> helpful!
>>> >> I too am relatively new to posting at the listserv but have enjoyed
>>> >> reading
>>> >> for a while. I'm a guest researcher at CDC and a grad student
>>> finishing
>>> up
>>> >> my thesis which is a longitudinal study - much of which is beyond the
>>> scope
>>> >> of my formal SAS education - I've had fun finding resources like
this,
>>> UCLA
>>> >> and various SAS tips across the internet. I think my question today
is
>>> >> relatively simple but a problem for me nonetheless.
>>> >> **Problem
>>> >> I have been calculating variables and have a LONG but effective
>>> algorithm
>>> >> for "filling in" variables that are constant across an observation.
>>> It's
>>> >> tedious and I still worry about errors. I know there must be a
simpler
>>> way
>>> >> to fill in this information AS it's calculated. Below I have included
>>> an
>>> >> example and sample data. The SAM variable is a crude time variable
and
>>> in
>>> >> reality there are 23 serotypes. The data step "correlate" gives me
the
>>> >> yes/no answer for sam=3 only but I need it for all sam (the question
>>> being
>>> >> answered by calculating this variable is: "did this infant have a
>>> antibody
>>> >> titer greater than .35 at 10 weeks (sam=3)?". Here I've done it for
>>> >> Serotype_1 but will have to do it for all serotypes. I'm not pasting
my
>>> >> fill
>>> >> in code because it's long and clumsy requiring 6 data steps. This is
>>> not
>>> a
>>> >> first occurrence var, so I'm having trouble applying previous similar
>>> >> solutions. Thanks all.
>>> >>
>>> >> data ALLFILLED;
>>> >> input ID SAM SEROTYPE_1 SEROTYPE_9V;
>>> >> cards;
>>> >> 1 0.2 8.22 36.1
>>> >> 1 0.3 1.81 22.1
>>> >> 1 1.0 .41 4.8
>>> >> 1 2.0 .33 3.2
>>> >> 1 3.0 .30 2.1
>>> >> 1 4.0 .23 2.0
>>> >> 1 5.0 .10 1.9
>>> >> 1 6.0 .03 0.9
>>> >> 2 0.2 38.1 100.3
>>> >> 2 0.3 25.81 95.2
>>> >> 2 1.0 22.3 66.2
>>> >> 2 2.0 21.5 54.2
>>> >> 2 3.0 18.3 49.5
>>> >> 2 4.0 6.2 31.1
>>> >> 2 5.0 6.1 23.3
>>> >> 2 6.0 3.1 22
>>> >> ;
>>> >> RUN;
>>> >>
>>> >> PROC PRINT DATA=ALLFILLED; RUN;
>>> >>
>>> >> DATA CORRELATE;
>>> >> SET ALLFILLED;
>>> >> IF SAM=3 AND SEROTYPE_1=. THEN CORR3_SER1=.;
>>> >> ELSE IF SAM=3 AND SEROTYPE_1>.35 THEN CORR3_SER1=1;
>>> >> ELSE IF SAM=3 AND SEROTYPE_1<.35 THEN CORR3_SER1=0;
>>> >> BY ID;
>>> >> RUN;
>>> >>
>>> >> PROC PRINT DATA=CORRELATE; RUN;
>>> >>
>>>
>>
>>

From: Muthia Kachirayan on 8 Mar 2010 15:56

Based on the output of Ya, I guess you can as well do by a data step using
double Dow-loop. As you are new user, read a paper on DOW. Search for Paul
Dorfman.

data need;
do until(last.id);
set allfilled;
by id ;
if sam =3D 3 and serotype_1 =3D . then corr3_ser1 =3D .;
else if sam =3D 3 and serotype_1 > 0.35 then corr3_ser1 =3D 1;
else if sam =3D 3 and serotype_1 <=3D 0.35 then corr3_ser1 =3D 0;
end;
do until(last.id);
set allfilled;
by id;
output;
end;
run;

On Mon, Mar 8, 2010 at 3:27 PM, Katy Seib <uralva(a)gmail.com> wrote:

> Oops, meant to include sas-l....
>
> > Ok this code gives me this output for id=3D3 (others are correct) (i'd
> tweak
> > it myself if I had the vaguest clue about sql):
> >
> >
> >
> > ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER=
1
> > =C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92
> >
> > 3 0.2 38.1 100.3 =
0
> > 3 0.3 25.81 95.2 =
0
> > 3 1 22.3 66.2 =
0
> > 3 2 21.5 54.2 =
0
> > 3 3 . . =
..
> > 3 4 6.2 31.1 =
0
> > 3 5 6.1 23.3 =
0
> > 3 6 3.1 22 =
0
> >
> > when it should be
> >
> >
> > ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER=
1
> > =C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=
=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=
=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92=C6=92
> >
> > 3 0.2 38.1 100.3 =
..
> > 3 0.3 25.81 95.2 =
..
> > 3 1 22.3 66.2 =
..
> > 3 2 21.5 54.2 =
..
> > 3 3 .
> > . .
> > 3 4 6.2 31.1 =
..
> > 3 5 6.1 23.3 =
..
> > 3 6 3.1 22 =
..
> >
> >
> >
> >
> > On Mon, Mar 8, 2010 at 2:10 PM, Ya Huang <ya.huang(a)amylin.com> wrote:
> >
> >> So you want to set the flag to missing is SEROTYPE is missing?
> >> If so, a little change would do it:
> >>
> >> data ALLFILLED;
> >> input ID SAM SEROTYPE_1 SEROTYPE_9V;
> >> cards;
> >> 1 0.2 8.22 36.1
> >> 1 0.3 1.81 22.1
> >> 1 1.0 .41 4.8
> >> 1 2.0 . 3.2
> >> 1 3.0 .30 2.1
> >> 1 4.0 .23 2.0
> >> 1 5.0 .10 1.9
> >> 1 6.0 .03 0.9
> >> 2 0.2 38.1 100.3
> >> 2 0.3 25.81 95.2
> >> 2 1.0 22.3 66.2
> >> 2 2.0 21.5 54.2
> >> 2 3.0 18.3 49.5
> >> 2 4.0 6.2 31.1
> >> 2 5.0 6.1 23.3
> >> 2 6.0 3.1 22
> >> ;
> >> RUN;
> >>
> >>
> >> proc sql;
> >> select *, max(case when SAM=3D3 AND SEROTYPE_1>.35 THEN 1
> >> when SAM=3D3 AND SEROTYPE_1<.35 THEN 0
> >> else . end) + serotype_1 - serotype_1 as
> >> CORR3_SER1
> >> from ALLFILLED
> >> group by id
> >> order by id,sam
> >> ;
> >>
> >> ID SAM SEROTYPE_1 SEROTYPE_9V CORR3_SER1
> >> ---------------------------------------------------
> >> 1 0.2 8.22 36.1 0
> >> 1 0.3 1.81 22.1 0
> >> 1 1 0.41 4.8 0
> >> 1 2 . 3.2 .
> >> 1 3 0.3 2.1 0
> >> 1 4 0.23 2 0
> >> 1 5 0.1 1.9 0
> >> 1 6 0.03 0.9 0
> >> 2 0.2 38.1 100.3 1
> >> 2 0.3 25.81 95.2 1
> >> 2 1 22.3 66.2 1
> >> 2 2 21.5 54.2 1
> >> 2 3 18.3 49.5 1
> >> 2 4 6.2 31.1 1
> >> 2 5 6.1 23.3 1
> >> 2 6 3.1 22 1
> >>
> >> Note, I change obs=3D4 SEROTYPE_=3D. for testing.
> >>
> >>
> >>
> >> On Mon, 8 Mar 2010 13:58:22 -0500, Katy Seib <uralva(a)GMAIL.COM> wrote:
> >>
> >> >Very close, when I try to add a line to keep missing data =3Dmissing,=
I
> get
> >> >the same results (CORR3_SER1=3D0 INSTEAD OF .). Lot to learn about s=
ql.
> I
> >> >added a third person below to simulate missing data (which there is
> >> plenty
> >> >of). I'm sure it's a syntax problem on my end.....
> >> >
> >> >data ALLFILLED;
> >> >input ID SAM SEROTYPE_1 SEROTYPE_9V;
> >> >cards;
> >> >1 0.2 8.22 36.1
> >> >1 0.3 1.81 22.1
> >> >1 1.0 .41 4.8
> >> >1 2.0 .33 3.2
> >> >1 3.0 .30 2.1
> >> >1 4.0 .23 2.0
> >> >1 5.0 .10 1.9
> >> >1 6.0 .03 0.9
> >> >2 0.2 38.1 100.3
> >> >2 0.3 25.81 95.2
> >> >2 1.0 22.3 66.2
> >> >2 2.0 21.5 54.2
> >> >2 3.0 18.3 49.5
> >> >2 4.0 6.2 31.1
> >> >2 5.0 6.1 23.3
> >> >2 6.0 3.1 22
> >> >3 0.2 38.1 100.3
> >> >3 0.3 25.81 95.2
> >> >3 1.0 22.3 66.2
> >> >3 2.0 21.5 54.2
> >> >3 3.0 . .
> >> >3 4.0 6.2 31.1
> >> >3 5.0 6.1 23.3
> >> >3 6.0 3.1 22
> >> >;
> >> >RUN;
> >> >proc sql;
> >> >select *, max(case when SAM=3D3 AND SEROTYPE_1>.35 THEN 1
> >> > when SAM=3D3 AND SEROTYPE_1<.35 THEN 0
> >> > when sam=3D3 and serotype_1=3D. THEN .
> >> > else . end) as CORR3_SER1
> >> >from ALLFILLED
> >> >group by id
> >> >order by id,sam
> >> >;
> >> >
> >> >
> >> >
> >> >On Mon, Mar 8, 2010 at 12:14 PM, Katy Seib <uralva(a)gmail.com> wrote:
> >> >
> >> >> **Intro
> >> >> First, thanks To Ron for posting the SAS-L wiki this morning- very
> >> helpful!
> >> >> I too am relatively new to posting at the listserv but have enjoye=
d
> >> >> reading
> >> >> for a while. I'm a guest researcher at CDC and a grad student
> >> finishing
> >> up
> >> >> my thesis which is a longitudinal study - much of which is beyond t=
he
> >> scope
> >> >> of my formal SAS education - I've had fun finding resources like
> this,
> >> UCLA
> >> >> and various SAS tips across the internet. I think my question toda=
y
> is
> >> >> relatively simple but a problem for me nonetheless.
> >> >> **Problem
> >> >> I have been calculating variables and have a LONG but effective
> >> algorithm
> >> >> for "filling in" variables that are constant across an observation.
> >> It's
> >> >> tedious and I still worry about errors. I know there must be a
> simpler
> >> way
> >> >> to fill in this information AS it's calculated. Below I have includ=
ed
> >> an
> >> >> example and sample data. The SAM variable is a crude time variable
> and
> >> in
> >> >> reality there are 23 serotypes. The data step "correlate" gives me
> the
> >> >> yes/no answer for sam=3D3 only but I need it for all sam (the quest=
ion
> >> being
> >> >> answered by calculating this variable is: "did this infant have a
> >> antibody
> >> >> titer greater than .35 at 10 weeks (sam=3D3)?". Here I've done it f=
or
> >> >> Serotype_1 but will have to do it for all serotypes. I'm not pastin=
g
> my
> >> >> fill
> >> >> in code because it's long and clumsy requiring 6 data steps. This i=
s
> >> not
> >> a
> >> >> first occurrence var, so I'm having trouble applying previous simil=
ar
> >> >> solutions. Thanks all.
> >> >>
> >> >> data ALLFILLED;
> >> >> input ID SAM SEROTYPE_1 SEROTYPE_9V;
> >> >> cards;
> >> >> 1 0.2 8.22 36.1
> >> >> 1 0.3 1.81 22.1
> >> >> 1 1.0 .41 4.8
> >> >> 1 2.0 .33 3.2
> >> >> 1 3.0 .30 2.1
> >> >> 1 4.0 .23 2.0
> >> >> 1 5.0 .10 1.9
> >> >> 1 6.0 .03 0.9
> >> >> 2 0.2 38.1 100.3
> >> >> 2 0.3 25.81 95.2
> >> >> 2 1.0 22.3 66.2
> >> >> 2 2.0 21.5 54.2
> >> >> 2 3.0 18.3 49.5
> >> >> 2 4.0 6.2 31.1
> >> >> 2 5.0 6.1 23.3
> >> >> 2 6.0 3.1 22
> >> >> ;
> >> >> RUN;
> >> >>
> >> >> PROC PRINT DATA=3DALLFILLED; RUN;
> >> >>
> >> >> DATA CORRELATE;
> >> >> SET ALLFILLED;
> >> >> IF SAM=3D3 AND SEROTYPE_1=3D. THEN CORR3_SER1=3D.;
> >> >> ELSE IF SAM=3D3 AND SEROTYPE_1>.35 THEN CORR3_SER1=3D1;
> >> >> ELSE IF SAM=3D3 AND SEROTYPE_1<.35 THEN CORR3_SER1=3D0;
> >> >> BY ID;
> >> >> RUN;
> >> >>
> >> >> PROC PRINT DATA=3DCORRELATE; RUN;
> >> >>
> >>
> >
> >
>

First | Prev |
Pages: 1 2
Prev: calculating variables across repeated ids
Next: Check if a Variable is Null or not