Populating variables [SAS]


From: Paul Dorfman on 5 Oct 2009 10:38

Randy,

If the arrangement you have shown is guaranteed, i.e. in the file sorted by
ID DATE the first record of each ID DATE by-group carries the non-missing
VARA value, then you can simply do the following, which makes use of the
automatically retained and automatically dropped numeric automatic variable
_iorc_:

data have ;
  input date: mmddyy8. id vara ;
  cards ;
06/01/09 1 1402
06/01/09 1 .
06/01/09 1 .
06/02/09 1 1543
06/02/09 1 .
06/01/09 2 1602
06/01/09 2 .
06/01/09 2 .
06/02/09 2 1755
06/02/09 2 .
;
run ;

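data need1 ;
  set have ;
  by id date ;
  /* _iorc_ is retained, so the value saved on the first record */
  /* of the by-group is carried forward to the records after it */
  if first.date then _iorc_ = vara ;
  else vara = _iorc_ ;
run ;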

However, "guaranteed" is a dangerous assumption. Even though the order by
the key variables is guaranteed by pre-sorting, one enters precarious
territory by assuming a certain order within the by-groups. For a variety
of reasons it may so happen that in a file sorted by ID DATE, the
non-missing VARA value does not fall on the first record in the group.
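
To see the failure concretely, here is a minimal sketch. The data set HAVE2
and the output name NEED1_BAD are made up for illustration and are not part
of Randy's data; the file is still sorted by ID DATE, but the non-missing
VARA value sits on the second record of each group:

```sas
/* Hypothetical input: sorted by ID DATE, non-missing VARA not first */
data have2 ;
  input date: mmddyy8. id vara ;
  cards ;
06/01/09 1 .
06/01/09 1 1402
06/01/09 1 .
06/02/09 1 .
06/02/09 1 1543
;
run ;

/* Same logic as NEED1, applied to HAVE2 */
data need1_bad ;
  set have2 ;
  by id date ;
  if first.date then _iorc_ = vara ;  /* saves a missing value      */
  else vara = _iorc_ ;                /* overwrites 1402 and 1543   */
run ;
```

Here first.date picks up a missing value, so every VARA in NEED1_BAD ends up
missing, including the two that were populated on input.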

In such cases, it is more robust to process each group twice: the first
pass finds the non-missing VARA value wherever it sits, and the second pass
creates the needed output. For example, this will work against HAVE sorted
by ID DATE regardless of where the non-missing VARA value is located within
the by-group:

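data need2 (drop = _:) ;
  /* pass 1: read the by-group, save its non-missing VARA, count records in _n_ */
  do _n_ = 1 by 1 until (last.date) ;
    set have ;
    by id date ;
    if not missing (vara) then _nvara = vara ;
  end ;
  /* pass 2: re-read the same _n_ records and output them with VARA filled in */
  do _n_ = 1 to _n_ ;
    set have ;
    vara = _nvara ;
    output ;
  end ;
run ;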

Or, even more concisely, you can resort to SQL, which does not (and cannot)
rely on input data ordering. It presumes instead that for each ID DATE
group, you need the maximal VARA value in the group:

proc sql ;
  create table need3 (drop = _:) as
  select id
       , date
       , max (vara) as vara
       , vara as _vara   /* non-aggregated column: keeps every input row */
  from have
  group by id, date
  order by id, date
  ;
quit ;

Of course, SQL also processes each by-group twice, albeit via a different
internal mechanism and behind the scenes.
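
For what it is worth, the two passes can be spelled out in SQL itself. In
NEED3, selecting the non-aggregated VARA column alongside MAX(VARA) is what
should make SAS remerge the group maximum onto every input row (the log
notes that the query requires remerging summary statistics back with the
original data). A rough equivalent, assuming the same HAVE data set and
using the made-up names NEED3_JOIN, H and M, is an explicit join against an
in-line view:

```sas
proc sql ;
  create table need3_join as
  select h.id
       , h.date
       , m.vara
  from   have h
       , (select id
               , date
               , max (vara) as vara   /* pass 1: one row per ID DATE group */
          from   have
          group by id, date) m
  where  h.id   = m.id                /* pass 2: put the group value back  */
    and  h.date = m.date              /* on every original record          */
  order by h.id, h.date
  ;
quit ;
```

This is only a sketch of what happens behind the scenes, not a claim about
how the SQL optimizer actually executes the NEED3 query.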

Kind regards
------------
Paul Dorfman
Jax, FL
------------

On Mon, 5 Oct 2009 00:19:20 -0400, Randy <randistan69(a)HOTMAIL.COM> wrote:

>Dear Joe:
>Slight variation of the problem:
>
>Date ID VarA
>06/01/09 1 1402
>06/01/09 1 .
>06/01/09 1 .
>06/02/09 1 1543
>06/02/09 1 .
>06/01/09 2 1602
>06/01/09 2 .
>06/01/09 2 .
>06/02/09 2 1755
>06/02/09 2 .
>
>The data set should look like this
>Date ID VarA
>06/01/09 1 1402
>06/01/09 1 1402
>06/01/09 1 1402
>06/02/09 1 1543
>06/02/09 1 1543
>06/01/09 2 1602
>06/01/09 2 1602
>06/01/09 2 1602
>06/02/09 2 1755
>06/02/09 2 1755
>
>Thank You
