From: Proc Me
If you've got files over 2GB in Windows, the recommendation from SAS is to
use the SGIO system option. This instructs SAS to bypass the Windows file
cache and read data directly from disk. It would also be worth looking at
the amount of memory you are committing to the I/O task. That can be set
using the BUFNO and BUFSIZE system options at run time (although they have
to be kept within the memory limit set by the MEMSIZE system option at
invocation).
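
In case it helps, here is a minimal sketch of what that looks like. The
data set and library names and the option values are made up purely for
illustration, not tuned recommendations, and SGIO itself can only be
switched on when SAS starts (command line or config file), not mid-session:

   * Start SAS with scatter-read/gather-write I/O, e.g.  sas.exe -sgio ;
   * BUFNO raises the number of buffers per data set, and BUFSIZE sets the
     page size of data sets created from this point on ;
   options bufno=100 bufsize=64k fullstimer;

   data work.subset;          * hypothetical subset of a large data set ;
      set bigdata.master;     * hypothetical multi-GB library member ;
      where state = 'WA';
   run;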

This paper may help:

http://support.sas.com/resources/papers/IOthruSGIO.pdf

Whilst I was looking this up I also spotted:

http://support.sas.com/resources/papers/proceedings09/334-2009.pdf

Gold dust: I've got a new paper to read; thank you for giving me an excuse
to look!

Proc Me
From: Michael Raithel
Dear SAS-L-ers,

Dan posted the following clarification:

>
> Yes it is on a Windows Server and the files are on an attached high
> speed storage area network. The usual explanation for slow processing
> that I see is in fact I/O contention with other large jobs that are
> running concurrently. But I have never experienced this magnitude of
> slow down.
>
Dan, I am going to jump onto the dogpile of responses from Joe, Art, and Mark and also opine that it is a network I/O issue. Arf!

As you know, one of the unsung benefits of compressing SAS data sets is fewer overall I/O's transferring data between disk storage and computer memory, because SAS moves more observations per I/O. A sequential read of a compressed SAS data set therefore takes fewer I/O's than a read of the same data set uncompressed, and fewer I/O's mean fewer seconds, minutes, hours, or days spent waiting for a program to run. Since your I/O times went precipitously upward on simple DATA step tasks, it is unlikely that SAS's uncompressing of incoming observations and compressing of outgoing observations is the culprit. It is more in line with the type of server I/O delay that all of us have experienced at one time or another; that is, the trip to and from the server happened during a computer rush hour.
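
Just to make the idea concrete, here is a bare-bones sketch; the data set names are hypothetical, and how much compression actually buys you depends entirely on how compressible your data are:

   options fullstimer;                  * detailed timings in the SAS log ;

   data work.claims_c (compress=yes);   * COMPRESS=YES gives run-length compression ;
      set work.claims;                  * hypothetical uncompressed source ;
   run;

The log NOTE on the compressed data set reports the percentage reduction in size, and comparing the FULLSTIMER real time of a sequential read of the compressed versus the uncompressed version shows whether the I/O savings outweigh the extra CPU time.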

You know, your systems folks should be able to run a monitor on the server and tell you whether there was heavy I/O activity or some other marker of poor performance. Hopefully, they are approachable enough, and open enough, to share that information with a SAS programmer!

Dan, best of luck in all of your SAS endeavors!


I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel(a)westat.com

Author: Tuning SAS Applications in the MVS Environment

Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172

Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Never insult an alligator until after you have crossed the river. - Cordell Hull
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: Michael Raithel
Dear SAS-L-ers,

Proc Me posted the following interesting contribution to this thread:

> If you've got files over 2GB in Windows, the recommendation from SAS is
> to
> use the SGIO=yes option. This instructs SAS to bypass the Windows file
> cache
> and read data directly from disk. It would also be worth looking at the
> amount of memory you are committing to the I/O task. This can be set
> using
> the bufno and bufsize system options at runtime (although they have to
> be
> kept within the maxmemsize system option setting specified at
> invocation.
>
> This paper may help:
>
> http://support.sas.com/resources/papers/IOthruSGIO.pdf
>
> Whilst I was looking this up I also spotted:
>
> http://support.sas.com/resources/papers/proceedings09/334-2009.pdf
>
> Gold dust: I've got a new paper to read, thank you for giving me an
> excuse
> to look!
Proc Me; nice contribution! Not to burst anybody's bubble, but I was pretty excited about the SGIO option when I first heard about it some years ago. So excited, in fact, that I asked one of my staff to read the paper (I handed him my annotated copy) and perform benchmarks using our real data with SAS 9.1.3 on Windows XP. We got very mediocre results. I was crestfallen, because I really enjoyed both the paper and the concept and wanted to bring something new and beneficial to each and every SAS-hosting desktop at SAS Mecca.

I would be _VERY_ interested if any 'L-ers get meaningful reductions in I/O's on SAS 9.2 (TS2M0 or TS2M2) under Windows XP. Not on the pretty plaything data sets generated with multiple DO loops for proof-of-concept SAS-L postings, but on real-life large-sized data sets. Inquiring minds want to know!
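
For anyone who does take up that challenge, here is the sort of bare-bones comparison I have in mind; the data set name is hypothetical, and since SGIO can only be set at invocation, it means running the identical step in two sessions, one started normally and one started with -SGIO:

   options fullstimer bufno=240 bufsize=64k;  * same buffer settings in both sessions ;

   data _null_;                * pure sequential read, no output data set ;
      set perm.bigclaims;      * hypothetical multi-gigabyte data set ;
   run;

Comparing the real time and memory figures that FULLSTIMER writes to the log for the two runs is about as close as most of us can get without OS-level I/O counters.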

BTW, both Tony Brown and Margaret Crevar have written other stellar papers; I'm a big fan of both of them... but let's just keep that between the two of us.

Proc Me, best of luck in all your SAS endeavors!


I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel(a)westat.com

Author: Tuning SAS Applications in the MVS Environment

Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172

Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A conclusion is simply the place where someone got tired of thinking. - Arthur Block
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: NordlDJ
Thanks to all who have responded to my plea for help. I am trying to do some testing and will definitely look at the papers mentioned. Unfortunately, I am up against a deadline and won't be able to do much testing until I deliver my deliverables (and today is [supposed to be] a day off). As an aside, since I was having a problem with the file I created, I decided that I would blow it away and recreate it. The process, like before, finished in just over half an hour. Now I am going to try to subset it again. (What is it they say about doing the same thing and expecting different results? :-)

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> Proc Me
> Sent: Friday, January 29, 2010 3:06 PM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: puzzling SAS I/O question
>
> If you've got files over 2GB in Windows, the recommendation from SAS is to
> use the SGIO=yes option. This instructs SAS to bypass the Windows file cache
> and read data directly from disk. It would also be worth looking at the
> amount of memory you are committing to the I/O task. This can be set using
> the bufno and bufsize system options at runtime (although they have to be
> kept within the maxmemsize system option setting specified at invocation.
>
> This paper may help:
>
> http://support.sas.com/resources/papers/IOthruSGIO.pdf
>
> Whilst I was looking this up I also spotted:
>
> http://support.sas.com/resources/papers/proceedings09/334-2009.pdf
>
> Gold dust: I've got a new paper to read, thank you for giving me an excuse
> to look!
>
> Proc Me
From: NordlDJ
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> Nordlund, Dan (DSHS/RDA)
> Sent: Friday, January 29, 2010 4:00 PM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: puzzling SAS I/O question
>
> Thanks to all who have responded to my plea for help. I am trying to do some
> testing and will definitely look at the papers mentioned. Unfortunately I am up
> against a deadline and won't be able to do much testing until I deliver my
> deliverables (and today is [supposed to be] a day off). As a aside, since I was
> having a problem with the file I created, I decided that I would blow it away and
> recreate it. The process like before finished in just over 1/2 hour. Now I am going
> to try to subset it again. (what is it they say about doing the same thing and
> expecting different results? :-)
>
> Dan
>
> Daniel J. Nordlund
> Washington State Department of Social and Health Services
> Planning, Performance, and Accountability
> Research and Data Analysis Division
> Olympia, WA 98504-5204
>

Well, this round of the saga is coming to a close. I don't know what has changed, because I can't monitor performance on the server myself, and I did recreate the large 15 GB file before re-running the program to subset the data. This time around, the subset finished in about 17 minutes (instead of 3 hours), and sorting the subset took only 1 minute (instead of 47 minutes). Resource contention on the server probably had an effect, but I find it hard to believe that performance would degrade that much for that reason alone (47 to 1 on the sort), especially since my jobs run using a separate work "drive" from everyone else in the office, which is also separate from the drive used for reading and writing the final datasets.
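
When I do get time to experiment, the sort is probably where I will start. A rough sketch of one sort-side knob worth testing is below; the value and BY variables are only placeholders, not recommendations:

   * SORTSIZE raises the memory available to the sort, which can cut the
     number of passes through the WORK utility files ;
   * TAGSORT (not shown) trades longer run time for much less WORK space ;
   proc sort data=work.subset out=work.subset_srt sortsize=1G;
      by client_id service_date;   * hypothetical BY variables ;
   run;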

So, I will try to find some time to read the papers referenced in some of the other posts in this thread and try out some of the suggested I/O performance enhancements. Thanks again everyone for all the suggestions.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204