From: adam kortright on 28 Apr 2010 18:58 Hi people, I was doing some benchmarking of a Hash data step Vs a Proc SQL, both summing two variables BY one variable and noticed that the elapsed times followed a distinct, and weird, pattern. Running the two pieces of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the following pattern emerged; The first time the SQL was run it would take on average, 9.71 et seconds. However it's second run would, on average, take 16.27. Weirdly, the hash step had it round the other way. It's first run averaged 7.41 and its second run averaged 3.37. This pattern was 100% consistent on; a standalone pc with nothing else running, a notebook and a networked pc. (All XP machines running 9.1) Can anyone explain why this is happening? Or even better, how to get the fastest result first time. Adam
From: adam kortright on 28 Apr 2010 23:51 On Apr 29, 8:58 am, adam kortright <adam.kortri...(a)gmail.com> wrote: > Hi people, > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both > summing two variables BY one variable and noticed that the elapsed > times followed a distinct, and weird, pattern. Running the two pieces > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the > following pattern emerged; The first time the SQL was run it would > take on average, 9.71 et seconds. However it's second run would, on > average, take 16.27. Weirdly, the hash step had it round the other > way. It's first run averaged 7.41 and its second run averaged 3.37. > > This pattern was 100% consistent on; a standalone pc with nothing else > running, a notebook and a networked pc. (All XP machines running > 9.1) > > Can anyone explain why this is happening? Or even better, how to get > the fastest result first time. > > Adam Follow up: Prior to every run of the code I was deleting the output tables using a PROC SQL; Drop table....; to clean everything out. DROP appears to be the problem. Changing the DROP to a Proc Datasets ... delete...; fixed the weird ET pattern and produces the quickest ET consistently. It appears that the DROP deletes a dataset in the same way that Windows 'deletes' a file. cheers
From: adam kortright on 29 Apr 2010 02:22 On Apr 29, 1:51 pm, adam kortright <adam.kortri...(a)gmail.com> wrote: > On Apr 29, 8:58 am, adam kortright <adam.kortri...(a)gmail.com> wrote: > > > > > > > Hi people, > > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both > > summing two variables BY one variable and noticed that the elapsed > > times followed a distinct, and weird, pattern. Running the two pieces > > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the > > following pattern emerged; The first time the SQL was run it would > > take on average, 9.71 et seconds. However it's second run would, on > > average, take 16.27. Weirdly, the hash step had it round the other > > way. It's first run averaged 7.41 and its second run averaged 3.37. > > > This pattern was 100% consistent on; a standalone pc with nothing else > > running, a notebook and a networked pc. (All XP machines running > > 9.1) > > > Can anyone explain why this is happening? Or even better, how to get > > the fastest result first time. > > > Adam > > Follow up: > > Prior to every run of the code I was deleting the output tables using > a PROC SQL; Drop table....; to clean everything out. DROP appears to > be the problem. Changing the DROP to a Proc Datasets ... delete...; > fixed the weird ET pattern and produces the quickest ET consistently. > It appears that the DROP deletes a dataset in the same way that > Windows 'deletes' a file. > > cheers Nope. Ignore the above. The original oddity has returned under PROC DATASETS.
From: adam kortright on 29 Apr 2010 04:20 On Apr 29, 4:22 pm, adam kortright <adam.kortri...(a)gmail.com> wrote: > On Apr 29, 1:51 pm, adam kortright <adam.kortri...(a)gmail.com> wrote: > > > > > > > On Apr 29, 8:58 am, adam kortright <adam.kortri...(a)gmail.com> wrote: > > > > Hi people, > > > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both > > > summing two variables BY one variable and noticed that the elapsed > > > times followed a distinct, and weird, pattern. Running the two pieces > > > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the > > > following pattern emerged; The first time the SQL was run it would > > > take on average, 9.71 et seconds. However it's second run would, on > > > average, take 16.27. Weirdly, the hash step had it round the other > > > way. It's first run averaged 7.41 and its second run averaged 3.37. > > > > This pattern was 100% consistent on; a standalone pc with nothing else > > > running, a notebook and a networked pc. (All XP machines running > > > 9.1) > > > > Can anyone explain why this is happening? Or even better, how to get > > > the fastest result first time. > > > > Adam > > > Follow up: > > > Prior to every run of the code I was deleting the output tables using > > a PROC SQL; Drop table....; to clean everything out. DROP appears to > > be the problem. Changing the DROP to a Proc Datasets ... delete...; > > fixed the weird ET pattern and produces the quickest ET consistently. > > It appears that the DROP deletes a dataset in the same way that > > Windows 'deletes' a file. > > > cheers > > Nope. Ignore the above. The original oddity has returned under PROC > DATASETS. Ok, using a PROC DATASETS...DELETE...; prior to running each piece of code and running A & B in pairs, below are the ET figures. As you can see, it's weird. This happens on all three pcs that I've tested on. Seq Meth A Meth B 1 6.90 2 33.95 3 25.85 4 2.90 5 7.79 6 32.56 7 25.06 8 2.90 9 7.96 10 34.53 11 24.84 12 2.90 13 7.57 14 34.06 15 25.20 16 2.89 17 7.84 18 35.04 19 25.98 20 2.90 Avg ALL 20.82 14.14 Avg 1st 7.61 25.39 Avg 2nd 34.03 2.90 fwiw, these are the two bits of code I'm trying to benchmark. * Method A - SQL *; Proc datasets library= work; delete MethodA MethodB; Proc SQL; Create table MethodA as Select Item, sum(Num) as TotSrvs, sum(BenPd) as Totben From Testdata Group by Item; *** Method B - Hash ***; Proc datasets library= work; delete MethodA MethodB; Data _Null_ ; Set Testdata end=end; If _N_ = 1 Then Do; Declare hash sum(Hashexp:16); sum.definekey ('Item'); sum.definedata ('Item','TotBen','TotSrvs'); sum.Definedone (); End; If ~sum.Find() Then do; TotBen + BenPd; TotSrvs + Num; End; Else Do; TotBen= BenPd; TotSrvs= Num; End; sum.replace(); If end then sum.output(Dataset: 'MethodB'); run; Any help would be seriously appreciated. adam
From: Richard A. DeVenezia on 29 Apr 2010 08:39 On Apr 28, 6:58 pm, adam kortright <adam.kortri...(a)gmail.com> wrote: > Hi people, > > I was doing some benchmarking of a Hash data step Vs a Proc SQL, both > summing two variables BY one variable and noticed that the elapsed > times followed a distinct, and weird, pattern. Running the two pieces > of code two at a time, ie SQL, SQL, hash, hash, SQL, SQL etc the > following pattern emerged; The first time the SQL was run it would > take on average, 9.71 et seconds. However it's second run would, on > average, take 16.27. Weirdly, the hash step had it round the other > way. It's first run averaged 7.41 and its second run averaged 3.37. > > This pattern was 100% consistent on; a standalone pc with nothing else > running, a notebook and a networked pc. (All XP machines running > 9.1) > > Can anyone explain why this is happening? Or even better, how to get > the fastest result first time. > > Adam Varying elapsed times is typically due to operations going on in the I/ O system. There are probably a wide number of contributing factors: - How many rows & columns in TestData - What types of columns - Laptop hardware ram/disk/diskcache - XP Virtual memory, cache, delayed write settings - Virus checkers - Corporate services (backups/snoops/etc) SAS on XP does provide many diagnostics you can observe through Windows perfmon. Check out http://www2.sas.com/proceedings/sugi30/022-30.pdf You could also try using the SASFILE statement to force the table into memory before using it. -- Richard A. DeVenezia http://www.devenezia.com
|
Next
|
Last
Pages: 1 2 Prev: Solution Architect -- Brooklyn NY -- 12 Months Next: tukey's adjustment |