From: sheaven on
Hello everyone!

I am new to Mathematica and am trying to get an understanding of its power. I
plan to use Mathematica mainly for financial data analysis (large
lists...).

Currently, I am trying to optimize calculation time for calculations
based on some sample data. I started with a moving average of
share prices, because Mathematica already has a built-in moving
average function for benchmarking.

As far as I know, the built-in functions are usually more efficient than
user-defined functions. Unfortunately, I will have to create functions that
are not built in (e.g. something like a "moving variance") in the future.

I have tried numerous ways to calculate the moving average as efficiently
as possible. So far, I found that a function based on Span (i.e.
list[[x;;y]]) is the most efficient. Below are my test results.
Unfortunately, my UDF is still more than 5x slower than the built-in
function.

Do you have any ideas for further speeding up the function? I am already
using Compile and Parallelize.


This is what I got so far:


1. Functions for moving average:

1.1. Moving average based on the built-in function:

(* Calculates moving averages with the built-in MovingAverage for a range
   of window lengths, e.g. 30 days to 250 days in steps of 10 *)
movAverageC = Compile[
  {{inputData, _Real, 1}, {start, _Integer}, {end, _Integer}, {incr, _Integer}},
  Module[{size},
    size = Length[inputData];
    Transpose[Join[{inputData},
      PadRight[MovingAverage[inputData, #], size] & /@
        Table[x, {x, start, end, incr}]]]
  ]
]

1.2. User-defined function based on Span:

(* UDF for the moving average based on Span *)
movAverageOwn2FC = Compile[
  {{dataInput, _Real, 1}, {days, _Integer}, {length, _Integer}},
  N[Mean[dataInput[[1 + # ;; days + #]]]] & /@ Range[0, length - days, 1]
]

(* Calculates moving averages with the UDF movAverageOwn2FC for a range of
   window lengths, e.g. 30 days to 250 days in steps of 10 *)
movAverageOwn2C = Compile[
  {{dataInput, _Real, 1}, {start, _Integer}, {end, _Integer}, {incr, _Integer}},
  Module[{length},
    length = Length[dataInput];
    Transpose[Join[{dataInput},
      PadRight[movAverageOwn2FC[dataInput, #, length], length] & /@
        Range[start, end, incr]]]
  ]
]


2. Create sample data:
data = 100 + # & /@ Accumulate[RandomReal[{-1, 1}, {10000}]];


3. Test whether the functions yield the same results:
(* Moving averages for 30 days to 250 days in steps of 10 *)
Test1 = movAverageC[data, 30, 250, 10];

Test2 = movAverageOwn2C[data, 30, 250, 10];

Test1 == Test2
Out = True


4. Performance testing (single core):
(* Each call is repeated 20x for testing purposes *)
AbsoluteTiming[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}];]
Out = {1.3030000, Null}

AbsoluteTiming[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20, 1}];]
Out = {11.4260000, Null}

=> Result: UDF is about 9x slower


5. Performance testing (multi-core):
LaunchKernels[]

Out = {KernelObject[1, "local"], KernelObject[2, "local"]}

DistributeDefinitions[data, movAverageOwn2C, movAverageOwn2FC, movAverageC]

AbsoluteTiming[Parallelize[Table[movAverageC[data, 30, 250, 10], {n, 1, 20, 1}]];]
Out = {1.3200000, Null}

AbsoluteTiming[Parallelize[Table[movAverageOwn2C[data, 30, 250, 10], {n, 1, 20, 1}]];]
Out = {6.7170000, Null}

=> Result: UDF is about 5x slower
It seems very strange that the built-in function does not get faster with
Parallelize.


I would very much appreciate any input on how to decrease the calculation
time of the user-defined function.

Many thanks
Stefan

From: Bill Rowe on
On 4/1/10 at 5:59 AM, sheaven(a)gmx.de (sheaven) wrote:

>2. Create sample data: data = 100 + # & /@
>Accumulate[RandomReal[{-1, 1}, {10000}]];

A side point here: Plus is Listable, so it works on whole lists directly. That is,

data = 100 + Accumulate[RandomReal[{-1, 1}, 10000]];

will produce the same result as your code but be a bit faster.
The difference in speed here is quite small and is
clearly not the thrust of your message, but I point it out
since such small differences can add up to something significant
in more complex code.
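
As a small illustration of the point about Plus, the sketch below builds the series both ways and compares them. The names steps, viaMap and viaListable, and the seed, are only for this example and are not from the original post.

```mathematica
SeedRandom[1];  (* arbitrary seed, only so both variants see the same steps *)
steps = Accumulate[RandomReal[{-1, 1}, 10000]];

viaMap = 100 + # & /@ steps;    (* element-by-element, as in the original post *)
viaListable = 100 + steps;      (* Plus threads over the list in one call *)

viaMap == viaListable
MemberQ[Attributes[Plus], Listable]
```

Both expressions at the end should evaluate to True; how much time the second form saves depends on the list length and the machine.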

>3. Test if functions yield same results: Test1 = movAverageC[data,
>30, 250, 10]; (*Moving average for 30 days to 250 days in steps of
>10*)

OK. Here are the timing results I get for your compiled code based
on Span:

In[1]:= movAverageOwn2FC =
  Compile[{{dataInput, _Real, 1}, {days, _Integer}, {length, _Integer}},
   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
    Range[0, length - days, 1]];

In[2]:= data = 100 + Accumulate[RandomReal[{-1, 1}, {10000}]];

In[3]:= Timing[Table[movAverageOwn2FC[data, 20, Length@data], {100}];]

Out[3]= {1.45855,Null}

Now here is a definition using ListConvolve:

In[4]:= newMoveAverage[data_, windowLen_] :=
  Module[{ker = Table[1, {windowLen}]/windowLen},
   ListConvolve[ker, data]]

In[5]:= Timing[Table[newMoveAverage[data, 20], {100}];]

Out[5]= {0.103379,Null}

So, on my machine, using a single core and no Compile,
ListConvolve improves the speed by more than 10x. Parallel
processing on both cores should improve this result further
for very large data arrays. Note that ListConvolve is so fast that the
overhead of setting up parallel processes will probably degrade
times for small data arrays. I have not tested this to verify my
guess.

Compile might also improve things somewhat, but the gain probably
won't be significant. Compile can offer significant improvement
in some code, particularly when procedural programming is used,
but it seldom improves code that consists of one or two
function calls and no procedural structures such as For. In
fact, there are times when using Compile will actually degrade
the execution speed.
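
One way to see whether Compile can help at all is to inspect the compiled code for calls back to the ordinary evaluator. The following is a minimal sketch, assuming a Mathematica version that ships the CompiledFunctionTools` package; cfBuiltin, cfLoop and staysCompiledQ are names made up for this illustration.

```mathematica
Needs["CompiledFunctionTools`"]

(* A compiled wrapper around a high-level built-in *)
cfBuiltin = Compile[{{x, _Real, 1}}, MovingAverage[x, 3]];

(* A purely procedural loop, the kind of code Compile is meant for *)
cfLoop = Compile[{{x, _Real, 1}},
   Module[{s = 0.}, Do[s += x[[i]], {i, Length[x]}]; s]];

(* True if the printed instructions contain no call to the main evaluator *)
staysCompiledQ[cf_] := StringFreeQ[CompilePrint[cf], "MainEvaluate"]

staysCompiledQ /@ {cfBuiltin, cfLoop}
```

If the first entry comes back False, the wrapper is simply handing MovingAverage back to the normal evaluator, so compiling it can add overhead without making the built-in any faster.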

Finally, to demonstrate that the code with ListConvolve does the same
as your code:

In[6]:= movAverageOwn2FC[data, 20, Length@data] ==
  newMoveAverage[data, 20]

Out[6]= True
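
The ListConvolve definition can also be checked against the built-in MovingAverage for several window lengths. This is only a sketch: the variable sample, the seed and the chosen window lengths are assumptions made for the example.

```mathematica
newMoveAverage[data_, windowLen_] :=
  Module[{ker = Table[1, {windowLen}]/windowLen},
   ListConvolve[ker, data]]

SeedRandom[42];  (* arbitrary seed, only for reproducibility *)
sample = 100 + Accumulate[RandomReal[{-1, 1}, 1000]];

(* Largest absolute deviation from the built-in over a few window lengths *)
Max[Table[
  Max[Abs[newMoveAverage[sample, w] - MovingAverage[sample, w]]],
  {w, {2, 20, 250}}]]
```

The result should be at the level of floating-point rounding. Comparing with a tolerance like this is a little more robust than relying on == alone when the two methods add the terms in a different order.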


From: Ray Koopman on
Your compiled movAverageC takes 25% more time than the uncompiled

movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
   Table[MovingAverage[data, r], {r, start, end, incr}]]

under your test conditions.


From: Zach Bjornson on
Ray,

The critical statement there is "under your test conditions." I played with
Stefan's problem for quite a while and came up with a few moving average
functions, and tried them all with and without compiling. His function
in particular was only 15% slower compiled than uncompiled on my computer with
his data set. The functions I came up with were usually faster when
compiled, depending on the data set. Also depending on the data set,
some were faster than the built-in MovingAverage function. They were
never faster than the built-in function with his data set, however, so I
never sent my functions along. Since this came up, though, my futzing is
below.

My initial response to Stefan's inquiry was the thought that Compile
would have no effect on MovingAverage, or would just add kernel time
while Mathematica decides to execute it as normal Mathematica code, but I'm
not sure that's true.

-Zach

(*data-set dependencies are illustrated between the top and bottom half
of this*)

$HistoryLength=0 (*to prevent artificially high speeds*)

1.1 Your function
movAverageOwn2FCorig =
  Compile[{{dataInput, _Real, 1}, {days, _Integer}, {length, _Integer}},
   N[Mean[dataInput[[1 + # ;; days + #]]]] & /@
    Range[0, length - days, 1]]

In[165]:=
First@Timing[
   Do[movAverageOwn2FCorig[Range[1000000], 2, 1000000];, {10}]]/10
Out[165]= 1.7347

1.2 Built-in Mathematica function
In[164]:= First@Timing[Do[MovingAverage[Range[1000000], 2];, {10}]]/10
Out[164]= 1.6942

1.3 My variation #1
movAverageOwn2Fa =
  Compile[{{dataInput, _Real, 1}, {days, _Integer}},
   Table[Mean[dataInput[[i ;; i + days - 1]]], {i,
     Length@dataInput - days + 1}]]

In[166]:=
First@Timing[Do[movAverageOwn2Fa[Range[1000000], 2];, {10}]]/10
Out[166]= 1.6146

The non-compiled version of this function gives a time of 4.0311 for the same data set.

1.4 My variation #2
movAverageOwn2Fb =
  Compile[{{dataInput, _Real, 1}, {days, _Integer}},
   With[{innerdata = Partition[dataInput, days, 1]},
    Table[Mean[innerdata[[i]]], {i, Length@innerdata}]
    ]]

In[167]:=
First@Timing[Do[movAverageOwn2Fb[Range[1000000], 2];, {10}]]/10
Out[167]= 1.6287

Note that this *is* data-set dependent. For example, the same
functions tested on your data symbol give:
In[169]:= First@Timing[Do[MovingAverage[data, 2];, {10}]]/10

Out[169]= 0.0015

In[170]:= First@Timing[Do[movAverageOwn2Fa[data, 2];, {10}]]/10

Out[170]= 0.0171

In[171]:= First@Timing[Do[movAverageOwn2Fb[data, 2];, {10}]]/10

Out[171]= 0.0156

In[173]:=
First@Timing[Do[movAverageOwn2FCorig[data, 2, Length@data];, {10}]]/10

Out[173]= 0.0171
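
One likely contributor to the data-set dependence, beyond the 100x difference in list length, is the element type. Range[1000000] is a list of exact integers, so MovingAverage works in exact arithmetic and returns rationals, whereas the compiled functions declare a _Real argument and work with machine reals. A minimal sketch on a short list (the names exact and approx are only for this illustration):

```mathematica
exact = MovingAverage[Range[5], 2];       (* integer input *)
approx = MovingAverage[N@Range[5], 2];    (* machine-real input *)

Head /@ exact     (* Rational for every element here *)
Head /@ approx    (* Real for every element *)
```

For a like-for-like benchmark against the compiled versions, it may be fairer to time MovingAverage on N@Range[1000000] rather than on Range[1000000].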






From: Raffy on
On Apr 4, 4:45 am, Ray Koopman <koop...(a)sfu.ca> wrote:
> Your compiled movAverageC takes 25% more time than the uncompiled
>
> movAv[data_, start_, end_, incr_] := Transpose@PadRight@Join[{data},
>    Table[MovingAverage[data, r], {r, start, end, incr}]]
>
> under your test conditions.

ma = Function[{vData, vRange},
   With[{vAcc =
      Prepend[Accumulate@Developer`ToPackedArray[vData, Real], 0.]},
    Transpose@
     Developer`ToPackedArray[
      Prepend[
       Table[
        PadRight[(Drop[vAcc, n] - Drop[vAcc, -n])/n, Length[vData], 0.],
        {n, vRange}], vData], Real]]];

ma[data, Range[30, 250, 10]]

This is a 4-5x speed up over movAverageC.

mv = Function[{vData, vRange},
   With[{v1 = Prepend[Accumulate[vData], 0.],
     v2 = Prepend[Accumulate[vData^2], 0.]},
    Transpose@
     Developer`ToPackedArray[
      Prepend[
       Table[
        PadRight[(Drop[v2, n] - Drop[v2, -n])/n -
          ((Drop[v1, n] - Drop[v1, -n])/n)^2, Length[vData], 0.],
        {n, vRange}], vData], Real]]];

This would be a fast moving variance.
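
Both functions rest on the same trick: a sum over a window is the difference of two cumulative sums taken n positions apart. The sketch below replays that on a tiny series and compares it with window-by-window reference values. The series x, the window length n and the helper names are assumptions chosen for illustration. Note that the formula above yields the population variance (divisor n), while the built-in Variance uses the sample convention (divisor n - 1), so the check rescales by n/(n - 1).

```mathematica
x = {2., 4., 4., 5., 7., 9.};   (* small illustrative series *)
n = 3;                          (* window length *)

v1 = Prepend[Accumulate[x], 0.];      (* cumulative sums of x *)
v2 = Prepend[Accumulate[x^2], 0.];    (* cumulative sums of x^2 *)

(* Windowed sums are differences of cumulative sums taken n apart *)
winMean = (Drop[v1, n] - Drop[v1, -n])/n;
winPopVar = (Drop[v2, n] - Drop[v2, -n])/n - winMean^2;

(* Reference values computed window by window *)
refMean = Mean /@ Partition[x, n, 1];
refVar = Variance /@ Partition[x, n, 1];   (* sample variance, divisor n - 1 *)

Max[Abs[winMean - refMean]]
Max[Abs[winPopVar n/(n - 1) - refVar]]
```

Both deviations should be at rounding level for data like this. For very long series or large values, subtracting large cumulative sums can lose some precision, so a spot check of this kind against Partition-based values on a sample of the real data seems worthwhile.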