data manipulation problem [SAS]

From: Akshaya on 22 Oct 2009 15:53

Here's a data step solution; it might need changes for your real data:
Data have;
input id y;
_dif=y-0.5;
_g=(y<0.5);
cards;
1 1
1 1
1 1
1 0.8
1 0.6
1 0.6
1 0.4
1 0.2
2 1
2 1
2 0.4
2 0
3 1
3 1
3 0.8
3 0.8
;

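Proc sql;
/* _gg=1 when the id has no y below 0.5 (max(_g) is remerged per id). */
/* Note: where= excludes rows with y=0; drop it if those should count as below 0.5. */
create table have1 as
select *,( max(_g)=0 ) as _gg
from have(where=(y^=0))
group by id
order by id,_g,y;
Quit;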

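Data want(drop=_:);
set have1;
by id _g y;
/* For rows with y>=0.5, _d ranks the distinct y values: */
/* it resets to 1 at first._g and goes up by 1 at each new y. */
if ^_g then _d+first.y-_d*first._g;
else _d=0;
/* Keep the smallest y>=0.5, the largest y<0.5, and, when the id has */
/* nothing below 0.5, the two smallest distinct y. Tied rows are all output. */
if (^_g and _d=1) or (_g and last._g) or (_gg and _d in (1,2)) then
output;
Run;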
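
If you need exactly one row per side rather than every tied row, a variation along these lines might do it. This is only a sketch under a few assumptions: it reads the same `have` data set as above, it treats y=0.5 as "at or above" (as `_g` does), and it first reduces the data to one row per distinct (id, y), so which duplicate survives is arbitrary. The names `uniq`, `flagged`, `want2`, `below`, `dist`, `has_below` and `rnk` are made up for illustration.

```sas
/* Assumption: one row per distinct (id, y) is enough;        */
/* NODUPKEY keeps whichever duplicate sorts first.            */
proc sort data=have out=uniq nodupkey;
  by id y;
run;

data flagged;
  set uniq;
  below = (y < 0.5);        /* 1 = below 0.5, 0 = at or above */
  dist  = abs(y - 0.5);     /* distance from the cut point    */
run;

/* Put the below-0.5 rows first within each id, nearest first. */
proc sort data=flagged;
  by id descending below dist;
run;

data want2(keep=id y);
  set flagged;
  by id descending below;
  retain has_below;
  if first.id then has_below = 0;
  if below then has_below = 1;     /* id has a value below 0.5    */
  if first.below then rnk = 0;
  rnk + 1;                         /* rank by distance, per side  */
  /* Nearest row on each side; if nothing is below 0.5,       */
  /* also keep the second-nearest distinct value above it.    */
  if rnk = 1 or (has_below = 0 and rnk = 2) then output;
run;
```

Extend the keep= list if you need to carry the other variables along.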

AkshayA!

On Thu, Oct 22, 2009 at 2:15 PM, olivesecret <olivesecret(a)gmail.com> wrote:

> I have a large data set consisting of subject id, response y and other
> interesting variables. A subset of data is like this:
>
> ID Y ...
> 1 1
> 1 1
> 1 1
> 1 0.8
> 1 0.6
> 1 0.6
> 1 0.4
> 1 0.2
> 2 1
> 2 1
> 2 0.4
> 2 0
> 3 1
> 3 1
> 3 0.8
> 3 0.8
> 4 1
> ...
>
> What I need to do is, for each ID, find the two observations, with one
> having y immediately larger than 0.5 and the other having y
> immediately smaller than 0.5. For the example above, the observations
> needed for ID=1 are ID=1 y=0.6 and ID=1 y=0.4, and the observations
> needed for ID=2 are ID=2 y=1 and ID=2 y=0.4. For ID=3, since there are
> no observations where y is less than 0.5, I need the two obs
> having y immediately larger than 0.5, which are ID=3 y=1 and
> ID=3 y=0.8.
> Any hints?
> Thanks a lot!
>
