From: Alex Poe on
Hello,

I'm considering trying out the parallel toolbox first to solve a simple (though large) system Ax = b. MATLAB comes with an excellent demo showing how to solve this system using spmd. I was able to modify the demo code so that the matrix A and the vector b are what I want them to be. I tested the program on a server with 2 quad-core CPUs using 'matlabpool start local' (so the labs are all on the same machine) - no problem, it solves the system correctly.

My question is this: I would like to do the same but on a distributed cluster. My school has one, and I have access to it. In fact I've been using it for quite some time, but my programs were all in C and used ScaLAPACK. I would log in to the cluster's shell, qsub my program, log off, and then wait for an email notification from the cluster that my program has been executed.

How does this work with MATLAB? Do I submit the program from within MATLAB? Do I have to stay logged in (and have MATLAB running) until the job is executed?

Would appreciate any help! Thanks,
--a.
From: Edric M Ellis on
"Alex Poe" <wasteoff.nospam(a)gmail.com> writes:

> I'm considering trying out the parallel toolbox first to solve a simple (though
> large) system Ax = b. MATLAB comes with an excellent demo showing how to solve this
> system using spmd. I was able to modify the demo code so that the matrix A and
> the vector b are what I want them to be. I tested the program on a 2 quadcore
> CPU server using 'matlabpool start local' (so the labs are all on the same
> machine) - no problem, it solves the system correctly. My question is this: I
> would like to do the same but on a distributed cluster. My school has one, and I
> have access to it. In fact I've been using it for quite some time, but my
> programs were all in C and used ScaLAPACK. I would log in to the cluster's
> shell, qsub my program, log off, and then wait for an email notification from
> the cluster that my program has been executed. How does this work with
> MATLAB?

To run across multiple nodes, you need "MATLAB Distributed Computing
Server" on the cluster. See:

http://www.mathworks.com/products/distriben/

This allows you to submit a job to a remote cluster - you can then
submit a job to multiple nodes to run your SPMD block across those
nodes. (FYI - we use ScaLAPACK behind the scenes to solve "Ax = b" on
the cluster).
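
For illustration only (this is not the demo code), a minimal sketch of such an SPMD solve might look like the following. It assumes a pool of workers is already open and a release recent enough to accept the 'codistributed' option of rand, eye and ones; the test system itself is made up.

```matlab
% Sketch only: a small, made-up, diagonally dominant test system.
% Assumes a pool of workers is already open.
n = 2000;

spmd
    % Build A and b directly as codistributed arrays, so no single
    % worker has to hold the whole matrix.
    A = rand(n, 'codistributed') + n * eye(n, 'codistributed');
    b = ones(n, 1, 'codistributed');

    % Backslash on codistributed arrays performs a distributed solve.
    x = A \ b;
end

% Outside the spmd block x is a distributed array; gather copies it
% back to the client as an ordinary vector.
xLocal = gather(x);
```

The same block runs unchanged whether the workers are local or spread over cluster nodes; only the pool it runs on differs.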

You mention "qsub" - are you using SGE or Torque? We have built-in
integration with Torque; setting up SGE takes a little more work, but
there are instructions telling you exactly what you need to do here.

> Do I submit the program from within MATLAB? Do I have to stay logged in (and
> have MATLAB running) until the job is executed?

If you use MDCS, then you do not need to stay logged in and you can
collect your results later.
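
As a rough sketch of that workflow, assuming a release recent enough to provide parcluster and the 'Pool' option of batch: the profile name 'myCluster' and the function solveSystem (a wrapper around your SPMD code that returns x) are hypothetical names, so substitute your own.

```matlab
% Sketch only: 'myCluster' is a hypothetical cluster profile name and
% solveSystem is a hypothetical user function that returns x.
c = parcluster('myCluster');

% Submit solveSystem(n), asking for 1 output argument. 'Pool', 15
% requests 15 workers for the SPMD block in addition to the worker
% that runs the function itself (16 in total).
n = 10000;
job = batch(c, @solveSystem, 1, {n}, 'Pool', 15);
jobID = job.ID;   % note this value; the MATLAB session can now be closed

% Later, possibly from a new MATLAB session (set jobID to the value
% noted at submission time):
c = parcluster('myCluster');
job = findJob(c, 'ID', jobID);
wait(job);                  % returns at once if the job has finished
out = fetchOutputs(job);    % cell array of the job's output arguments
x = out{1};
delete(job);                % remove the job's data from the cluster
```

The scheduler (Torque or SGE, for example) queues the job much as qsub would, which is why the client session does not have to stay open.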

Cheers,

Edric.
From: Matt J on
Edric M Ellis <eellis(a)mathworks.com> wrote in message <ytwlj8p2u8h.fsf(a)uk-eellis-deb5-64.mathworks.co.uk>...

>
> To run across multiple nodes, you need "MATLAB Distributed Computing
> Server" on the cluster. See:
>
> http://www.mathworks.com/products/distriben/
>
> This allows you to submit a job to a remote cluster - you can then
> submit a job to multiple nodes to run your SPMD block across those
> nodes.
==================

I have a related question. I was told by a TMW sales rep that the Distributed Computing Server allowed you to use parfor with more than 8 workers, but not necessarily on a remote cluster. It could just be used to scale up the number of local workers usable by the Parallel Computing Toolbox.

Can anyone confirm that?
From: Matt J on
"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i368sr$hgs$1(a)fred.mathworks.com>...

>
> I have a related question. I was told by a TMW sales rep that the Distributed Computing Server allowed you to use parfor with more than 8 workers, but not necessarily on a remote cluster. It could just be used to scale up the number of local workers usable by the Parallel Computing Toolbox.
>
> Can anyone confirm that?
================

For that matter, I was led to believe that with the Distributed Computing Server, you could split a job across any combination of local/remote workers, e.g. you could chain two 8 core machines together and parallelize across all 16 cores. True?
From: Edric M Ellis on
"Matt J " <mattjacREMOVE(a)THISieee.spam> writes:

> "Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i368sr$hgs$1(a)fred.mathworks.com>...
>
>>
>> I have a related question. I was told by a TMW sales rep that the
>> Distributed Computing Server allowed you to use parfor with more than
>> 8 workers, but not necessarily on a remote cluster. It could just be
>> used to scale up the number of local workers usable by the Parallel
>> Computing Toolbox.
>>
>> Can anyone confirm that?
> ================
>
> For that matter, I was led to believe that with the Distributed
> Computing Server, you could split a job across any combination of
> local/remote workers, e.g. you could chain two 8 core machines together
> and parallelize across all 16 cores. True?

That is true - but note that the MDCS workers will be drawn from a
different "pool" than the "local" workers from PCT. With MDCS, you need
to set up workers (possibly using the "jobmanager"), and you can place
them where you wish. For example, you could run 16 MDCS workers across
two machines, one of which is a desktop machine. You could then run
PARFOR across them all.
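
A minimal sketch of what that could look like, assuming a newer release where parpool has replaced matlabpool, and a hypothetical cluster profile named 'myCluster' that points at those 16 MDCS workers (the loop body is just made-up independent work):

```matlab
% Sketch only: 'myCluster' is a hypothetical profile for the MDCS workers.
pool = parpool('myCluster', 16);

results = zeros(1, 64);
parfor k = 1:64
    % Each iteration is independent, so iterations can run on any of
    % the 16 workers, whichever machine they happen to live on.
    results(k) = max(abs(eig(rand(200))));
end

delete(pool);   % release the workers when done
```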

Cheers,

Edric.