From: Claus Yeh on 20 Mar 2010 16:29

Dear SAS users,

I know this is a common question. I have Googled it but could not find a good solution.

Basically, I have a huge dataset (~7GB) with 3 variables and 2 billion observations. I just need to sort it by one variable.

What is the best solution for my task?

Thank you so much,
Claus
From: sas analysis on 20 Mar 2010 17:20

On Mar 20, 3:29 pm, Claus Yeh <phoebe.caulfiel...(a)gmail.com> wrote:
> Dear SAS users,
>
> I know this is a common question. I have googled but could not find a
> good solution.
>
> basically I have a huge dataset ~7GB with 3 variables and 2 billion
> observations. I just need to sort it by one variable.
>
> What is the best solution for my task?
>
> thank you so much
> claus

You can just use PROC SORT:

    proc sort data=x out=y;
      by variable;
    run;

Hope this is what you are looking for.
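For a dataset of this size, PROC SORT has a few options that may be worth trying. The following is only a sketch under assumptions: the names work.x, work.y and key are hypothetical, the tiny input dataset exists only so the step can be run as written, and the 2G value is a placeholder rather than a recommendation. Whether any of these options helps depends on the memory and disk available at your site.

```sas
/* Hypothetical sample data standing in for the real 2-billion-row dataset. */
data work.x;
  input key a b;
  datalines;
3 10 100
1 20 200
2 30 300
;
run;

/* SORTSIZE= limits the memory PROC SORT may use (2G is a placeholder value).   */
/* NOEQUALS tells SAS it need not keep the original relative order of rows      */
/* that share the same BY value, which can save some resources.                 */
proc sort data=work.x out=work.y sortsize=2G noequals;
  by key;
run;

/* Alternative when temporary disk space is the bottleneck: TAGSORT keeps only  */
/* the BY variable and the observation number in the temporary files. It can    */
/* reduce disk usage but may increase elapsed time.                             */
proc sort data=work.x out=work.y tagsort;
  by key;
run;
```

With only three variables per row, TAGSORT is unlikely to save much here; it is shown mainly so the option is known.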
From: Patrick on 20 Mar 2010 20:09

As this is about performance, there is no single best approach that covers everything. Which approach to choose depends on what you have and what you need. Do you need this dataset sorted for a lookup, for summarising, to pick the top 100, or ...?

PROC SORT physically sorts a dataset, which results in a lot of I/O and is therefore slow. If you need the sorted dataset only once and you have enough memory, then a sorted view (proc sql; create view .... as select * order by ...; quit;) will perform much better.

Depending on your data and requirements, creating an index (http://www2.sas.com/proceedings/sugi29/123-29.pdf) or using a hash table might also be the way to go.

It would also be important to know how the data is stored: is it a SAS file or a DB table, and if so, which DBMS?

HTH
Patrick
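The view and index approaches mentioned above could look roughly like this. It is a minimal sketch, not a tested recipe for 2 billion rows: work.x, work.x_sorted, key and n_groups are hypothetical names, and the small input dataset is there only to make the code runnable. Note that a view with ORDER BY does not avoid the sort; the rows are sorted each time the view is read, so it mainly saves writing a second physical copy.

```sas
/* Hypothetical sample data standing in for the real dataset. */
data work.x;
  input key a b;
  datalines;
3 10 100
1 20 200
2 30 300
;
run;

/* Sorted view: no sorted copy is stored; the ordering is applied whenever   */
/* the view is read.                                                         */
proc sql;
  create view work.x_sorted as
    select *
    from work.x
    order by key;
quit;

/* Simple index on the sort variable, created in place on the base dataset.  */
proc datasets library=work nolist;
  modify x;
  index create key;
quit;

/* With the index in place, BY-group processing can read work.x in key order */
/* even though the dataset itself has not been physically sorted.            */
data _null_;
  set work.x end=last;
  by key;
  if first.key then n_groups + 1;   /* count distinct key values */
  if last then put n_groups=;
run;
```

Reading every row through an index means many non-sequential reads, so for a full pass over a very large dataset this may well be slower than a physical sort; an index tends to pay off when only a subset of key values is needed.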