From: Claus Yeh on
Dear SAS users,

I know this is a common question. I have googled but could not find a
good solution.

Basically, I have a huge dataset (~7GB) with 3 variables and 2 billion
observations. I just need to sort it by one variable.

What is the best solution for my task?

thank you so much
claus
From: sas analysis on
On Mar 20, 3:29 pm, Claus Yeh <phoebe.caulfiel...(a)gmail.com> wrote:
> Dear SAS users,
>
> I know this is a common question.  I have googled but could not find a
> good solution.
>
> basically I have a huge dataset ~7GB with 3 variables and 2 billion
> observations.   I just need to sort it by one variable.
>
> What is the best solution for my task?
>
> thank you so much
> claus

You can just use PROC SORT:

proc sort data=x out=y;
   by variable;   /* replace with the name of your sort variable */
run;

Hope this is what you are looking for.
From: Patrick on
As this is about performance, there is no single best approach that
covers every case.

Which approach to choose depends on what you have and what you
need. Do you need this dataset sorted for a lookup, for
summarising, to pick the top 100, or ....?

PROC SORT physically sorts the dataset, which results in a lot of I/O,
and that is slow.
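
To see how much of the run time actually goes to the sort, one option is to switch on detailed timing before running it. This is only a sketch: x and y are the placeholder dataset names used earlier in the thread, and sortvar is a made-up name for the sort variable. NOEQUALS is optional; it lets PROC SORT skip preserving the original order of observations that share the same BY value, which can save some resources when that order does not matter.

```sas
/* Sketch only: x, y and sortvar are placeholder names. */
options fullstimer;               /* log real time, CPU time and memory per step */

proc sort data=x out=y noequals;  /* NOEQUALS: ties may come out in any order */
   by sortvar;
run;
```

Comparing real time against CPU time in the log gives a rough idea of how I/O-bound the step is.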

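If you need this sorted dataset only once and you've got enough memory,
then a sorted view (proc sql; create view .... as select * from ....
order by ...; quit;) may perform better, because no second physical
copy of the data is written. The sort itself still happens each time
the view is read.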
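
Spelled out, the view idea could look something like the following. The names work.x, work.x_sorted and sortvar are assumptions for illustration only; substitute your own library, dataset and variable.

```sas
/* Sketch only: work.x, work.x_sorted and sortvar are hypothetical names. */
proc sql;
   create view work.x_sorted as
      select *
      from work.x
      order by sortvar;
quit;

/* Nothing has been sorted yet. The ORDER BY is executed here, */
/* when a step reads from the view.                            */
data _null_;
   set work.x_sorted (obs=10);
   put _all_;
run;
```

Note that every step reading the view triggers the sort again, so this only pays off if the ordered data is consumed once.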

Depending on your data and requirements also an approach creating an
index (http://www2.sas.com/proceedings/sugi29/123-29.pdf) or a hash
table might be the way to go.
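
As a rough illustration of those two alternatives (not a recommendation for data of this size without testing): (a) builds a simple index so that a BY statement can return the rows in key order without a physical sort, and (b) loads everything into an ordered hash object and writes it back out. Here x and y are the placeholder names from earlier in the thread and sortvar is a hypothetical variable name. The hash variant needs the whole table to fit in memory, which may well not be the case for ~7GB.

```sas
/* Sketch only: x, y and sortvar are placeholder names. */

/* (a) Simple index on the sort variable */
proc datasets library=work nolist;
   modify x;
   index create sortvar;
quit;

data y;
   set x;
   by sortvar;   /* x is not sorted, so SAS reads it through the index */
run;

/* (b) Ordered hash object, only feasible if x fits in memory */
data _null_;
   if 0 then set x;                    /* define the variables in the PDV */
   declare hash h(dataset:'x', ordered:'a', multidata:'y');
   h.defineKey('sortvar');
   h.defineData(all:'y');              /* keep every variable of x */
   h.defineDone();
   h.output(dataset:'y_hash');         /* written in ascending key order */
   stop;
run;
```

Reading a large table through an index means a lot of random access, so (a) tends to help more for subsetting and lookups than for returning all rows in order.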

Also important to know would be: How is the data stored? Is it a SAS
file or a DB table - and what DBMS?
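
If you are not sure how the table is stored, PROC CONTENTS is a quick way to check. In this sketch x is again just the placeholder dataset name from above.

```sas
/* Sketch only: x is a placeholder dataset name. */
proc contents data=x;
run;
```

The output lists the engine behind the libref (a SAS engine versus a DBMS engine), whether the dataset is already flagged as sorted, and any indexes defined on it.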

HTH
Patrick