From: Tom Abernathy on
A few comments on things you can change.
1) Add NOLIST option to PROC DATASETS so that this doesn't try to tell
you everything about your work library. That will take longer based on
how many datasets are in the library.
2) Add RUN and/or QUIT statements to finish your steps explicitely in
your code rather than having SAS figure it out when it sees the next
DATA or PROC statement.
3) Do your own timing. You can add statements to capture the clock
time before and after an subtract to get differnce.
%let before=%sysfunc(datetime());
..
%put Elapsed time = %sysevalf(%sysfunc(datetime()) - &before) ;


On Apr 29, 4:20 am, adam kortright <adam.kortri...(a)gmail.com> wrote:
> On Apr 29, 4:22 pm, adam kortright <adam.kortri...(a)gmail.com> wrote:
>
>
>
>
>
> > On Apr 29, 1:51 pm, adam kortright <adam.kortri...(a)gmail.com> wrote:
>
> > > On Apr 29, 8:58 am, adam kortright <adam.kortri...(a)gmail.com> wrote:
>
> > > > Hi people,
>
> > > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both
> > > > summing two variables BY one variable and noticed that the elapsed
> > > > times followed a distinct, and weird, pattern. Running the two pieces
> > > > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the
> > > > following pattern emerged; The first time the SQL was run it would
> > > > take on average, 9.71 et seconds. However it's second run would, on
> > > > average, take 16.27. Weirdly, the hash step had it round the other
> > > > way. It's first run averaged 7.41 and its second run averaged 3.37.
>
> > > > This pattern was 100% consistent on; a standalone pc with nothing else
> > > > running, a notebook and a networked pc. (All XP machines running
> > > > 9.1)
>
> > > > Can anyone explain why this is happening? Or even better, how to get
> > > > the fastest result first time.
>
> > > > Adam
>
> > > Follow up:
>
> > > Prior to every run of the code I was deleting the output tables using
> > > a PROC SQL; Drop table....; to clean everything out. DROP appears to
> > > be the problem. Changing the DROP to a Proc Datasets ... delete...;
> > > fixed the weird ET pattern and produces the quickest ET consistently.
> > > It appears that the DROP deletes a dataset in the same way that
> > > Windows 'deletes' a file.
>
> > > cheers
>
> > Nope. Ignore the above. The original oddity has returned under PROC
> > DATASETS.
>
> Ok, using a PROC DATASETS...DELETE...; prior to running each piece of
> code and running A & B in pairs, below are the ET figures. As you can
> see, it's weird. This happens on all three pcs that I've tested on.
>
> Seq     Meth A  Meth B
> 1       6.90
> 2       33.95
> 3                      25.85
> 4                       2.90
> 5       7.79
> 6       32.56
> 7                     25.06
> 8                       2.90
> 9       7.96
> 10      34.53
> 11                    24.84
> 12                      2.90
> 13      7.57
> 14      34.06
> 15                    25.20
> 16                     2.89
> 17      7.84
> 18      35.04
> 19                     25.98
> 20                       2.90
>
> Avg ALL 20.82   14.14
> Avg 1st         7.61            25.39
> Avg 2nd 34.03   2.90
>
> fwiw, these are the two bits of code I'm trying to benchmark.
>
> * Method A - SQL *;
> Proc datasets library= work;
>         delete MethodA MethodB;
>
> Proc SQL;
>         Create table MethodA as
>         Select Item, sum(Num) as TotSrvs, sum(BenPd) as Totben
>         From Testdata
>         Group by Item;
>
> *** Method B - Hash ***;
> Proc datasets library= work;
>         delete MethodA MethodB;
>
> Data _Null_ ;
>         Set Testdata end=end;
>
>         If _N_ = 1 Then Do;
>                 Declare hash sum(Hashexp:16);
>                 sum.definekey ('Item');
>                 sum.definedata ('Item','TotBen','TotSrvs');
>                 sum.Definedone ();
>         End;
>
>         If ~sum.Find() Then do;
>                 TotBen + BenPd;
>                 TotSrvs + Num;
>         End;
>         Else Do;
>                 TotBen= BenPd;
>                 TotSrvs= Num;
>         End;
>         sum.replace();
>
> If end then sum.output(Dataset: 'MethodB');
> run;
>
> Any help would be seriously appreciated.
>
> adam- Hide quoted text -
>
> - Show quoted text -

From: adam kortright on
On Apr 29, 10:39 pm, "Richard A. DeVenezia" <rdevene...(a)gmail.com>
wrote:
> On Apr 28, 6:58 pm, adam kortright <adam.kortri...(a)gmail.com> wrote:
>
>
>
>
>
> > Hi people,
>
> > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both
> > summing two variables BY one variable and noticed that the elapsed
> > times followed a distinct, and weird, pattern. Running the two pieces
> > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the
> > following pattern emerged; The first time the SQL was run it would
> > take on average, 9.71 et seconds. However it's second run would, on
> > average, take 16.27. Weirdly, the hash step had it round the other
> > way. It's first run averaged 7.41 and its second run averaged 3.37.
>
> > This pattern was 100% consistent on; a standalone pc with nothing else
> > running, a notebook and a networked pc. (All XP machines running
> > 9.1)
>
> > Can anyone explain why this is happening? Or even better, how to get
> > the fastest result first time.
>
> > Adam
>
> Varying elapsed times is typically due to operations going on in the I/
> O system.
>
> There are probably a wide number of contributing factors:
> - How many rows & columns in TestData
> - What types of columns
> - Laptop hardware ram/disk/diskcache
> - XP Virtual memory, cache, delayed write settings
> - Virus checkers
> - Corporate services (backups/snoops/etc)
>
> SAS on XP does provide many diagnostics you can observe through
> Windows perfmon. Check outhttp://www2.sas.com/proceedings/sugi30/022-30.pdf
>
> You could also try using the SASFILE statement to force the table into
> memory before using it.
>
> --
> Richard A. DeVeneziahttp://www.devenezia.com

Thanks for your responses Tom & Richard however, as I don't appear to
have made myself clear I'll try again.

There are two distinct and consistently odd pattens in ET figures when
I run the same two pieces of code (sayyy A & B) repeatedly.
The sequence that I'm submitting these jobs in is; A A B B A A B B etc
etc.
The first oddity is that there's always a significant difference in ET
between the first & second time each job is run; ie

1st B et= 25.0s
2nd B et= 2.5s

The second oddity is that while the ET for the second Job B always
(significantly) decreases, the ET for the second Job A always
increases its ET. ie;

1st A et= 9.0s
2nd A et= 33.0s

Note: Any 3rd or more consecutive submit of either Job take approx the
same ET as the 2nd submit. ie;

1st B et= 25.0s
2nd B et= 2.5s
3rd B et= 2.5s
etc

Is this just me?

Notes: 1) This has uniformly occurred on three different pcs; one of
them networked, one of them a standalone with no other apps running &
a 'standard' notebook.
2) All machines are running Win XP3 & SAS 9.1.
3) Both A & B are reading the same dataset.

hth

Adam
From: Tom Abernathy on
Adam -
I understood what you said, but without the actual SAS log (or
logs) I cannot tell if you are talking about multiple steps within a
single SAS session or separate individual SAS processes.
It is possible that what you are seeing as an increase in one step's
elapsed time might actually be counting some of the time from another
step. Not to mention all of the various things that your computer
could be doing outside of SAS. Do your disk drives have automatic
journaling of changes to the disk that might be causing the computer
to steal cycles from SAS?
- Tom

On Apr 29, 5:51 pm, adam kortright <adam.kortri...(a)gmail.com> wrote:
> On Apr 29, 10:39 pm, "Richard A. DeVenezia" <rdevene...(a)gmail.com>
> wrote:
>
>
>
>
>
> > On Apr 28, 6:58 pm, adam kortright <adam.kortri...(a)gmail.com> wrote:
>
> > > Hi people,
>
> > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both
> > > summing two variables BY one variable and noticed that the elapsed
> > > times followed a distinct, and weird, pattern. Running the two pieces
> > > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the
> > > following pattern emerged; The first time the SQL was run it would
> > > take on average, 9.71 et seconds. However it's second run would, on
> > > average, take 16.27. Weirdly, the hash step had it round the other
> > > way. It's first run averaged 7.41 and its second run averaged 3.37.
>
> > > This pattern was 100% consistent on; a standalone pc with nothing else
> > > running, a notebook and a networked pc. (All XP machines running
> > > 9.1)
>
> > > Can anyone explain why this is happening? Or even better, how to get
> > > the fastest result first time.
>
> > > Adam
>
> > Varying elapsed times is typically due to operations going on in the I/
> > O system.
>
> > There are probably a wide number of contributing factors:
> > - How many rows & columns in TestData
> > - What types of columns
> > - Laptop hardware ram/disk/diskcache
> > - XP Virtual memory, cache, delayed write settings
> > - Virus checkers
> > - Corporate services (backups/snoops/etc)
>
> > SAS on XP does provide many diagnostics you can observe through
> > Windows perfmon. Check outhttp://www2.sas.com/proceedings/sugi30/022-30..pdf
>
> > You could also try using the SASFILE statement to force the table into
> > memory before using it.
>
> > --
> > Richard A. DeVeneziahttp://www.devenezia.com
>
> Thanks for your responses Tom & Richard however, as I don't appear to
> have made myself clear I'll try again.
>
> There are two distinct and consistently odd pattens in ET figures when
> I run the same two pieces of code (sayyy A & B) repeatedly.
> The sequence that I'm submitting these jobs in is; A A B B A A B B etc
> etc.
> The first oddity is that there's always a significant difference in ET
> between the first & second time each job is run; ie
>
> 1st  B et= 25.0s
> 2nd B et=  2.5s
>
> The second oddity is that while the ET for the second Job B always
> (significantly) decreases, the ET for the second Job A always
> increases its ET. ie;
>
> 1st  A et=  9.0s
> 2nd A et= 33.0s
>
> Note: Any 3rd or more consecutive submit of either Job take approx the
> same ET as the 2nd submit. ie;
>
> 1st  B et= 25.0s
> 2nd B et=   2.5s
> 3rd  B et=   2.5s
> etc
>
> Is this just me?
>
> Notes: 1) This has uniformly occurred on three different pcs; one of
> them networked, one of them a standalone with no other apps running &
> a 'standard' notebook.
> 2) All machines are running Win XP3 & SAS 9.1.
> 3) Both A & B are reading the same dataset.
>
> hth
>
> Adam