From: Amar Mundankar on
Hi all,
Below is the code I am trying to run.

proc sort data=ftf_out.want1 out=ftf_out.want(drop=n _name_);
by PLTK_PLAN_ID;
run;

Dataset want1 has 6.7 million records.
I am getting this error:
ERROR: Utility file write failed. Probable disk full condition.
I tried a few options, such as changing the UTILLOC system
option to point to a permanent library with adequate
space.
What could the problem be?
Please help. I have to run other steps on the sorted dataset, so I am not
able to start other processing.
Any help will be appreciated.

Thanks in advance.

Regards,
Amar Mundankar.
From: Lou on

"Amar Mundankar" <amarmundankar(a)gmail.com> wrote in message
news:98490d9b-62a7-4d1e-bbb8-1e5b2d460158(a)u15g2000prd.googlegroups.com...

PROC SORT starts with the original, unsorted dataset and builds the
finished, sorted dataset, using some sort work files along the way. At a
minimum, it'll need disk space amounting to double that of the original
dataset, however briefly, and some more besides for sort work files.
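
As a rough way to check that requirement against what is free, PROC CONTENTS
reports the number of observations and the observation length for the input.
The library and dataset names below are taken from the original post; the
arithmetic in the comment is only an estimate and ignores compression and
page overhead.

```sas
/* Report observations, observation length and other attributes of the input */
proc contents data=ftf_out.want1;
run;

/* Estimate: observations x observation length = approximate bytes for one
   copy. The sorted output needs about that much again, plus room for the
   sort work (utility) files. */
```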

If you don't have enough disk space to accommodate all that, there are a
couple of options to try.

The first, and easiest, is to use the TAGSORT option. During a regular
sort, all the non-key variables are dragged along for the ride. TAGSORT
sorts just the by-variables, along with the observation number in the
original dataset that particular combination of by-values came from. Once
the by-variables are sorted, it uses the observation number to randomly
access the original dataset and build the final output dataset. Because a
tagsort doesn't carry along all the other variables on each observation, it
can use considerably less disk space to sort a file. The trade-off is that
it takes more time than a plain old sort, but in cases where you can't do a
plain old sort, it may be worth it. The syntax is

PROC SORT DATA = A OUT = B TAGSORT;
BY by-variable;
RUN;
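
Applied to the step in the original post, that would look something like
the following. This is only a sketch that adds TAGSORT to the posted code;
the dataset and variable names are the poster's, and whether it saves enough
space depends on how wide the non-key variables are.

```sas
/* Same sort as in the original post, but sorting only the key plus an
   observation number instead of carrying every variable through the
   utility files. */
proc sort data=ftf_out.want1 out=ftf_out.want(drop=n _name_) tagsort;
  by PLTK_PLAN_ID;
run;
```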

If you still don't have enough disk space, you can try breaking the sort
down. I can't tell whether your by-variable holds character or numeric
values, but assuming it is character and the first character is alphabetic,
you pull out all observations where the first character is 'A' and sort
them. Then you pull out everything where the first character is 'B', sort
that, append the output of the second sort to the output of the first, and
delete the output of the second sort. Go on with the observations where the
first character is 'C', and so on, working through the values in ascending
order so the appended result stays in sorted order. This will take
considerably more time to run, and more time to code, but if all else
fails, it will eventually get the job done, assuming your disk isn't
full-up before you start. The code would look something like

PROC SORT DATA = A (WHERE = (by-variable =: 'A')) OUT = SORTED;
BY by-variable;
RUN;
PROC SORT DATA = A (WHERE = (by-variable =: 'B')) OUT = TEMP;
BY by-variable;
RUN;
PROC APPEND BASE = SORTED DATA = TEMP;
RUN;
PROC SQL; DROP TABLE TEMP; QUIT;
PROC SORT DATA = A (WHERE = (by-variable =: 'C')) OUT = TEMP;
etc.

Cutting down on the code you have to write by putting it all into a macro is
left as an exercise for the reader, though with copy and paste it's possibly
not a real big deal to just write the code explicitly in the program,
depending on how many groups you have to break the task into.
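
For what it's worth, one possible shape for such a macro is sketched below.
The macro name, its parameters, the work dataset _piece and the variable
plan_id in the sample call are all made up for illustration. It assumes a
character by-variable, that the prefixes are listed in ascending sort order,
and that between them they cover every observation; anything that matches no
prefix is silently left out of the result.

```sas
%macro sort_in_pieces(in=, out=, byvar=, prefixes=A B C);
  %local i prefix;

  /* Remove any earlier copy of the output so the first append creates it. */
  %if %sysfunc(exist(&out)) %then %do;
    proc sql; drop table &out; quit;
  %end;

  %do i = 1 %to %sysfunc(countw(&prefixes, %str( )));
    %let prefix = %scan(&prefixes, &i, %str( ));

    /* Sort only the observations whose key starts with this prefix. */
    proc sort data=&in (where=(&byvar =: "&prefix")) out=_piece;
      by &byvar;
    run;

    /* PROC APPEND creates the base dataset if it does not exist yet. */
    proc append base=&out data=_piece;
    run;

    /* Free the space used by this piece before sorting the next one. */
    proc sql; drop table _piece; quit;
  %end;
%mend sort_in_pieces;

/* Hypothetical call: A, SORTED and plan_id are placeholder names. */
%sort_in_pieces(in=A, out=SORTED, byvar=plan_id, prefixes=A B C);
```

The loop only repeats the sort, append and delete steps shown above, so it
needs no less disk space than writing them out by hand; it just saves typing.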