From: a on
On Mar 22, 4:49 am, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
> Hi all,
> The input dataset is as follows:
> data sample;
>         input id a b c;
>         cards;
> 2 11 . .
> 2 . 44 .
> 2 . .  66
> 1 55 .  .
> 1 . . 99
> 1 . 22 .
> ;
> I want the output as
> id a b c
> 2 11 44 66
> 1 55 22 99
>
> The input dataset is not in sorted order by id . As my data is huge, i
> can NOT sort the Sample dataset by Id.
> In the output, i want as many number of records as many are Ids.
> Also all the records related to one id are merged together to create a
> single record for that particular id.
> Please help.....
>
> Thanks and Regards,
> Amar Mundankar.

Are the IDs grouped as in you example data?

data sample;
input id a b c;
cards;
2 11 . .
2 . 44 .
2 . . 66
1 55 . .
1 . . 99
1 . 22 .
3 11 . .
3 . 44 .
3 . . 66
4 55 . .
4 . . 99
4 . 22 .
;;;;
run;
Example1:
data IDs;
if 0 then set sample;
set sample(keep=id);
by notsorted id;
if first.id;
run;
data IDs;
modify IDs sample;
by id;
run;
proc print;
run;
Example2:
data sample2;
if 0 then set sample;
do until(last.id);
set sample(rename=(a=_a b=_b c=_c));
by notsorted id;
a = coalesce(a,_a);
b = coalesce(b,_b);
c = coalesce(c,_c);
end;
output;
call missing(of _all_);
drop _:;
run;
proc print;
run;
From: Patrick on
Amar

If I understand the source of your problem then it's about that you
can't sort the data because there is not enough disk space available.

Proc Sort sorts a dataset physically AND not in place. This means that
there is a moment during execution where your data is duplicated (and
you'll need disk space accordingly).

What you could do instead is creating an index and then use this index
to build your groups (http://www2.sas.com/proceedings/
sugi29/123-29.pdf). Also an SQL view (where you sort the data) could
be a possible approach in case you've got enough memory.

If the data is grouped by ID (also that the groups are not sorted)
then a datastep using the lag() function for ID could work.

HTH
Patrick
From: PJ on
proc sql;
create table want as
select distinct id, sum(a) as a, sum(b) as b, sum(c) as c
from sample
group by id
;
From: Amar Mundankar on
On Mar 22, 4:11 pm, a <iebup...(a)gmail.com> wrote:
> On Mar 22, 4:49 am, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
>
>
>
>
>
> > Hi all,
> > The input dataset is as follows:
> > data sample;
> >         input id a b c;
> >         cards;
> > 2 11 . .
> > 2 . 44 .
> > 2 . .  66
> > 1 55 .  .
> > 1 . . 99
> > 1 . 22 .
> > ;
> > I want the output as
> > id a b c
> > 2 11 44 66
> > 1 55 22 99
>
> > The input dataset is not in sorted order by id . As my data is huge, i
> > can NOT sort the Sample dataset by Id.
> > In the output, i want as many number of records as many are Ids.
> > Also all the records related to one id are merged together to create a
> > single record for that particular id.
> > Please help.....
>
> > Thanks and Regards,
> > Amar Mundankar.
>
> Are the IDs grouped as in you example data?
>
> data sample;
>    input id a b c;
>    cards;
> 2 11 . .
> 2 . 44 .
> 2 . .  66
> 1 55 .  .
> 1 . . 99
> 1 . 22 .
> 3 11 . .
> 3 . 44 .
> 3 . .  66
> 4 55 .  .
> 4 . . 99
> 4 . 22 .
> ;;;;
>    run;
> Example1:
> data IDs;
>    if 0 then set sample;
>    set sample(keep=id);
>    by notsorted id;
>    if first.id;
>    run;
> data IDs;
>    modify IDs sample;
>    by id;
>    run;
> proc print;
>    run;
> Example2:
> data sample2;
>    if 0 then set sample;
>    do until(last.id);
>       set sample(rename=(a=_a b=_b c=_c));
>       by notsorted id;
>       a = coalesce(a,_a);
>       b = coalesce(b,_b);
>       c = coalesce(c,_c);
>       end;
>    output;
>    call missing(of _all_);
>    drop _:;
>    run;
> proc print;
>    run;- Hide quoted text -
>
> - Show quoted text -

HI A,
The solution in Example 1 suits my purpose.
Thanks for that.

Also I thank you all for providing solutions of different kinds.

Regards,
Amar Mundankar.