From: Bruno Luong on
"Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpg40b$n8d$1(a)fred.mathworks.com>...
>
>
> Thank you Matt J, it's a Feynman integral .. summation over 49^12 paths, each with different amplitude, with no symmetry or cancellations ...

Are you sure you can do it with your supercomputer?

Even if you can compute 10^9 paths in 1 second, it will take about 6075 years to compute all of the paths.

round(49^12/(1e9*3600*24*365))

Bruno
From: Juliette Salexa on
"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message
> Are you sure you can do it with your supercomputer?
>
> Even if you can compute 10^9 paths in 1 second, it will take about 6075 years to compute all of the paths.
>
> round(49^12/(1e9*3600*24*365))
>
> Bruno

You're right, sorry i screwed that up big time.
The dimension of the Hilbert space is 7, so I should have said 7^12 or 49^6.

A complex double-precision matrix with 7^12 elements is ~220 GB, and I can cut it down to around 100 GB by ignoring the paths with very small amplitude.
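A quick sanity check of that estimate (16 bytes per complex double-precision element; this is just the arithmetic, not the actual computation):

```matlab
% Memory needed to hold 7^12 complex double-precision amplitudes:
nElements = 7^12;             % = 49^6 = 13,841,287,201 paths
bytesPerElement = 16;         % 8 bytes real + 8 bytes imaginary
gigabytes = nElements * bytesPerElement / 1e9
% about 221 GB (or ~206 GiB if you divide by 2^30 instead)
```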

It's still a pretty computationally expensive problem, but I think doable and very worthwhile!
From: Matt J on
"Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message <hpg40b$n8d$1(a)fred.mathworks.com>...

> Although using matlab's internal matrix-vector multiplication function would require storage of huge arrays, the summation itself would be done extremely fast.
> The disadvantage of my above forloop is that although the memory requirement is negligible, the computation is very slow (because the elements are not generated by RAND, but by a more complicated set of instructions).
================


No, that's not at all clear. Even if there were a RAM chip big enough to hold a matrix of this size, and even if MATLAB had access to all of this RAM, the bottleneck in summation speed comes from the need to transfer the matrix data between the RAM and CPU. It's not at all clear that these data transfers would be faster than the computations you would use to generate the M(i,j) elements on the fly. Assessing that would, at a minimum, require more detail about how the M(i,j) are generated.

Anyway, to answer your question: people often get a lot faster code, even for much smaller matrices than this, by generating their M(i,j) and doing the summation on the fly. It's done all the time in tomographic imaging, for example. However, it does depend, as I said, on how much computation the on-the-fly operations really require and how that compares to bus speed.
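To illustrate the on-the-fly idea (the element generator below is a made-up stand-in, since how your M(i,j) are actually computed hasn't been specified):

```matlab
% On-the-fly summation: never materialize the full matrix.
% genElement is a hypothetical placeholder for the real M(i,j) rule.
genElement = @(i,j) exp(2i*pi*mod(i*j,7)/7);
n = 1000;
s = 0;
for i = 1:n
    for j = 1:n
        s = s + genElement(i,j);   % accumulate instead of storing M(i,j)
    end
end
```

Whether this beats forming M and calling M*v depends entirely on how expensive genElement is relative to moving n^2 stored elements over the memory bus.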
From: Juliette Salexa on
Thank you Matt J,

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message
> No, that's not at all clear. Even if there were a RAM chip big enough to hold a matrix of this size, and even if MATLAB had access to all of this RAM, the bottleneck in summation speed comes from the need to transfer the matrix data between the RAM and CPU. It's not at all clear that these data transfers would be faster than the computations you would use to generate the M(i,j) elements on the fly. Assessing that would, at a minimum, require more detail about how the M(i,j) are generated.
>


Okay, well I still have to work out how the M(i,j)'s will be calculated. I suspect they will be something like: A(3,4)*B(1,4)*C(2,4)*D(2,2)*...*P(4,3), where A through P are stored 16x16 matrices, and the indices are chosen from 1:4 based on what i and j are.
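Something along these lines, with placeholder index rules (the real mapping from (i,j) to the matrix indices is still to be worked out, so everything below is hypothetical):

```matlab
% Hypothetical M(i,j): a product of scalar entries drawn from sixteen
% stored 16x16 matrices, with entry indices derived from (i,j).
mats = arrayfun(@(k) rand(16,16), 1:16, 'UniformOutput', false);
i = 5; j = 9;                  % one example element
Mij = 1;
for k = 1:16
    r = mod(i + k, 4) + 1;     % placeholder rule giving indices in 1:4
    c = mod(j + k, 4) + 1;
    Mij = Mij * mats{k}(r, c); % A(3,4)*B(1,4)*...*P(4,3) style product
end
```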


I would suspect (just intuitively) that these computations would be more expensive than the data transfers between RAM and CPU.

On my laptop with 4GB ram,

For n=117649,
it takes almost negligible time to multiply an nxn matrix by an nx1 vector:

tic; A=(sprand(n,n,0.001));toc;
Elapsed time is 82.586749 seconds.

tic; B=(sprand(n,1,1));toc;
Elapsed time is 0.001320 seconds.

tic; A*B; toc;
Elapsed time is 0.016662 seconds.

Attempting to create A=(sprand(n,n,0.002)) failed because I ran out of memory (probably due to background processes), so I think the computation above was pushing the memory limit of my computer.

The majority of the time was spent creating the matrix A, and almost zero time doing the summation.

[btw, any idea why it takes almost no time to create the vector B, while it takes more than a minute to create the matrix A? A only has about 118 times as many non-zero elements as B!]



> Anyway, to answer your question: people often get a lot faster code, even for much smaller matrices than this, by generating their M(i,j) and doing the summation on the fly. It's done all the time in tomographic imaging, for example. However, it does depend, as I said, on how much computation the on-the-fly operations really require and how that compares to bus speed.



I think the for-loop method that I described in my last post has another advantage: it is easily parallelized (it can be done using parfor, and then the partial sums on each machine can be added together at the end), while with matrix-vector multiplication I don't see how it could be parallelized easily (but I'm just guessing here and could be totally wrong).
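A sketch of that parallelization with parfor (requires the Parallel Computing Toolbox; genElement is a hypothetical stand-in for the real element rule):

```matlab
% Each worker accumulates a partial sum over its rows; MATLAB combines
% the partials automatically because 'total' is a reduction variable.
genElement = @(i,j) exp(2i*pi*mod(i*j,7)/7);   % hypothetical stand-in
n = 1000;
total = 0;
parfor i = 1:n
    rowSum = 0;
    for j = 1:n
        rowSum = rowSum + genElement(i,j);
    end
    total = total + rowSum;
end
```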
From: Steven Lord on

"Juliette Salexa" <juliette.physicist(a)gmail.com> wrote in message
news:hpghsb$in0$1(a)fred.mathworks.com...
> Thank you Matt J,
>
> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message

*snip*

> On my laptop with 4GB ram,
> For n=117649
> It takes almost negligible time to multiply an nxn matrix by an nx1
> vector:
>
> tic; A=(sprand(n,n,0.001));toc;
> Elapsed time is 82.586749 seconds.
>
> tic; B=(sprand(n,1,1));toc;
> Elapsed time is 0.001320 seconds.
>
> tic; A*B; toc;
> Elapsed time is 0.016662 seconds.
>
> Attempting to create A=(sprand(n,n,0.002)) failed because I ran out of
> memory (probably due to background processes), so I think the computation
> above was pushing the memory limit of my computer.
>
> The majority of the time was spent creating the matrix A, and almost zero
> time doing the summation.
>
> [btw, any idea why it takes almost no time to create the vector B, while
> it takes more than a minute to create the matrix A? A only has about 118
> times as many non-zero elements as B!]

A has n times as many columns as B. Due to the way sparse matrices are
stored, this has a significant effect on how much memory is required, and
that memory allocation is likely what took most of the time in the creation
of A. If your call that creates A was the first call to SPRAND in your
session, that could also have contributed somewhat to the amount of time
needed to create A.
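
The per-column overhead is easy to see directly: MATLAB stores sparse matrices in compressed-column form, so an n-column matrix carries roughly 8*(n+1) bytes of column pointers on a 64-bit system before any nonzeros are stored, while a one-column vector pays almost none of that:

```matlab
n = 117649;
B = sprand(n, 1, 1);       % 1 column, ~n nonzeros: little overhead
A = sprand(n, n, 0.001);   % n columns plus ~n^2/1000 nonzeros
whos A B                   % A's bytes include ~8*(n+1) of column pointers
```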

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ