Prev: PLEASE HELP...........problem in Merge ...
Next: Unable to sort dataset with 6.7 Million records. ERROR: Utility file write failed. Probable disk full condition.
From: a on 22 Mar 2010 07:11 On Mar 22, 4:49 am, Amar Mundankar <amarmundan...(a)gmail.com> wrote: > Hi all, > The input dataset is as follows: > data sample; > input id a b c; > cards; > 2 11 . . > 2 . 44 . > 2 . . 66 > 1 55 . . > 1 . . 99 > 1 . 22 . > ; > I want the output as > id a b c > 2 11 44 66 > 1 55 22 99 > > The input dataset is not in sorted order by id . As my data is huge, i > can NOT sort the Sample dataset by Id. > In the output, i want as many number of records as many are Ids. > Also all the records related to one id are merged together to create a > single record for that particular id. > Please help..... > > Thanks and Regards, > Amar Mundankar. Are the IDs grouped as in you example data? data sample; input id a b c; cards; 2 11 . . 2 . 44 . 2 . . 66 1 55 . . 1 . . 99 1 . 22 . 3 11 . . 3 . 44 . 3 . . 66 4 55 . . 4 . . 99 4 . 22 . ;;;; run; Example1: data IDs; if 0 then set sample; set sample(keep=id); by notsorted id; if first.id; run; data IDs; modify IDs sample; by id; run; proc print; run; Example2: data sample2; if 0 then set sample; do until(last.id); set sample(rename=(a=_a b=_b c=_c)); by notsorted id; a = coalesce(a,_a); b = coalesce(b,_b); c = coalesce(c,_c); end; output; call missing(of _all_); drop _:; run; proc print; run;
From: Patrick on 22 Mar 2010 08:59 Amar If I understand the source of your problem then it's about that you can't sort the data because there is not enough disk space available. Proc Sort sorts a dataset physically AND not in place. This means that there is a moment during execution where your data is duplicated (and you'll need disk space accordingly). What you could do instead is creating an index and then use this index to build your groups (http://www2.sas.com/proceedings/ sugi29/123-29.pdf). Also an SQL view (where you sort the data) could be a possible approach in case you've got enough memory. If the data is grouped by ID (also that the groups are not sorted) then a datastep using the lag() function for ID could work. HTH Patrick
From: PJ on 22 Mar 2010 11:19 proc sql; create table want as select distinct id, sum(a) as a, sum(b) as b, sum(c) as c from sample group by id ;
From: Amar Mundankar on 24 Mar 2010 03:35
On Mar 22, 4:11 pm, a <iebup...(a)gmail.com> wrote: > On Mar 22, 4:49 am, Amar Mundankar <amarmundan...(a)gmail.com> wrote: > > > > > > > Hi all, > > The input dataset is as follows: > > data sample; > > input id a b c; > > cards; > > 2 11 . . > > 2 . 44 . > > 2 . . 66 > > 1 55 . . > > 1 . . 99 > > 1 . 22 . > > ; > > I want the output as > > id a b c > > 2 11 44 66 > > 1 55 22 99 > > > The input dataset is not in sorted order by id . As my data is huge, i > > can NOT sort the Sample dataset by Id. > > In the output, i want as many number of records as many are Ids. > > Also all the records related to one id are merged together to create a > > single record for that particular id. > > Please help..... > > > Thanks and Regards, > > Amar Mundankar. > > Are the IDs grouped as in you example data? > > data sample; > input id a b c; > cards; > 2 11 . . > 2 . 44 . > 2 . . 66 > 1 55 . . > 1 . . 99 > 1 . 22 . > 3 11 . . > 3 . 44 . > 3 . . 66 > 4 55 . . > 4 . . 99 > 4 . 22 . > ;;;; > run; > Example1: > data IDs; > if 0 then set sample; > set sample(keep=id); > by notsorted id; > if first.id; > run; > data IDs; > modify IDs sample; > by id; > run; > proc print; > run; > Example2: > data sample2; > if 0 then set sample; > do until(last.id); > set sample(rename=(a=_a b=_b c=_c)); > by notsorted id; > a = coalesce(a,_a); > b = coalesce(b,_b); > c = coalesce(c,_c); > end; > output; > call missing(of _all_); > drop _:; > run; > proc print; > run;- Hide quoted text - > > - Show quoted text - HI A, The solution in Example 1 suits my purpose. Thanks for that. Also I thank you all for providing solutions of different kinds. Regards, Amar Mundankar. |