From: Nick on
Hi all,

I have code that runs MultiStart global optimisation in parallel on my computer's 4 cores. Recently the program has sometimes been aborting with an error. Sometimes the code runs fine without a problem, and I have yet to see a pattern in when it aborts. The error message (reproduced below) does not mean much to me, bearing in mind that all 4 labs are based on the same motherboard.

Any advice and help would be much appreciated!

I am running MATLAB version 7.10.0.499 (R2010a) for 64-bit Windows, on 64-bit Windows 7.

The error message I get is shown below. Many thanks for your help!

??? Error using ==> parallel_function at 598
The session that parfor is using has shut down

Error in ==> C:\Program Files\MATLAB\R2010a\toolbox\globaloptim\globaloptim\private\fmultistart.p>fmultistart at 112


Error in ==> MultiStart>MultiStart.run at 249
[x,fval,exitflag,output,solutions] = ...

Error in ==> M3SpatialHelixReg>M3SpatialHelixRegCore at 190
[parameters,RegErr,~,output,manymins] = run(ms,problem,k);

Error in ==> M3SpatialHelixReg at 119
[parameters,RegErr,output,manymins,~]=M3SpatialHelixRegCore(data1,data2,Coordinates,GeoHelix,mu,stoptol,k2,featpara,scale*2);
??? The client lost connection to lab 4.
This might be due to network problems, or the interactive matlabpool job might have
errored. This is causing: java.io.IOException: An operation on a socket could not be
performed because the system lacked sufficient buffer space or because a queue was full
From: Richard Alcock on
On Tue, 08 Jun 2010 16:03:05 +0000, Nick wrote:

> I am running Matlab version 7.10.0.499(R2010a) for 64-bit windows, on
> 64-bit Windows 7.
>
> ??? Error using ==> parallel_function at 598 The session that parfor is
> using has shut down
>
> Error in ==> C:\Program Files\MATLAB\R2010a\toolbox\globaloptim\globaloptim\private\fmultistart.p>fmultistart at 112
>
> Error in ==> MultiStart>MultiStart.run at 249
> [x,fval,exitflag,output,solutions] = ...
>
> Error in ==> M3SpatialHelixReg>M3SpatialHelixRegCore at 190
> [parameters,RegErr,~,output,manymins] = run(ms,problem,k);
>
> Error in ==> M3SpatialHelixReg at 119
> [parameters,RegErr,output,manymins,~]=M3SpatialHelixRegCore(data1,data2,Coordinates,GeoHelix,mu,stoptol,k2,featpara,scale*2);
> ??? The client lost connection to lab 4. This might be due to network
> problems, or the interactive matlabpool job might have errored. This is
> causing: java.io.IOException: An operation on a socket could not be
> performed because the system lacked sufficient buffer space or because a
> queue was full

Nick,


How "big" is the problem you are trying to solve?
- How many variables and constraints does it have?
- Are you using anonymous or nested function handles for the cost or
constraint functions? Does the workspace associated with these function
handles include any large variables?

If you run a smaller optimization - with fewer variables and constraints,
and make sure any function handles haven't copied large variables they
don't need - does the problem still occur?

Thanks,

--
Richard A
From: Nick on
Richard Alcock <richard.alcock(a)mathworks.co.uk> wrote in message <huo0d8$f0k$2(a)fred.mathworks.com>...

....

> Nick,
>
>
> How "big" is the problem you are trying to solve?
> - How many variables and constraints does it have?
> - Are you using anonymous or nested function handles for the cost or
> constraint functions? Does the workspace associated with these function
> handles include any large variables?
>
> If you run a smaller optimization - with fewer variables and constraints,
> and make sure any function handles haven't copied large variables they
> don't need - does the problem still occur?
>
> Thanks,
>
> --
> Richard A


Hi,

The overall problem is pretty big, but the code in fact performs feature extraction before the optimisation kicks in, so the optimisation is only carried out on a small subset of the data: about 1.1e5 values in total at the moment. The optimisation is in 4 variables / dimensions and is constrained only by fixed upper and lower bounds on those 4 variables, which limit the search space. The only anonymous function handle is the objective function passed to "createOptimProblem" to set up the global optimisation - I don't see how you could avoid this or why it should be an issue. The objective function does use one of the variables from the full dataset, i.e. approx. 42e6 values, in addition to some smaller variables.
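For reference, my setup looks roughly like this (a sketch only - myCost, bigData and the bound values are placeholders, not my actual names; the point is that the anonymous handle captures whatever workspace variables it references when it is created):

```matlab
% Placeholder data: the anonymous objective below captures bigData,
% so the whole ~42e6-value array travels with the function handle.
bigData = rand(42e6, 1);
lb = [0 0 0 0];  ub = [1 1 1 1];    % fixed bounds on the 4 variables

objective = @(p) myCost(p, bigData);   % bigData is copied into the handle's workspace

problem = createOptimProblem('fmincon', ...
    'objective', objective, ...
    'x0', (lb + ub)/2, ...
    'lb', lb, 'ub', ub);

ms = MultiStart('UseParallel', 'always');
[x, fval] = run(ms, problem, 50);   % e.g. 50 start points
```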

I haven't really tried running the optimisation with a smaller input yet - the feature extraction, and the fact that the optimisation essentially works on only a small data subset, "should" make it manageable...

I'll keep experimenting to try and find out what changes cause the code to hit this error and will let you know if I detect some kind of trend!
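One thing I may try in the meantime is wrapping the run call so a crashed pool gets reopened and the run retried (a rough sketch only - it assumes the pool can simply be restarted after the error):

```matlab
% Crude retry wrapper for experimenting (hypothetical; ms, problem and k
% are my existing MultiStart object, problem structure and start count).
maxTries = 3;
for attempt = 1:maxTries
    try
        [parameters, RegErr, ~, output, manymins] = run(ms, problem, k);
        break;                          % success, stop retrying
    catch err
        fprintf('run() failed (attempt %d): %s\n', attempt, err.message);
        if matlabpool('size') == 0      % the pool died with the error
            matlabpool open 4           % reopen 4 labs
        end
        if attempt == maxTries, rethrow(err); end
    end
end
```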

Thanks!
From: Richard Alcock on
On Wed, 09 Jun 2010 17:01:25 +0000, Nick wrote:

> The overall problem is pretty big, but the code in fact involves feature
> extraction before the optimisation kicks in so the optimisation is only
> being carried out on a small subset of the data: about 1.1e5 values
> total at the moment. The optimisation is in 4 variables / dimensions and
> is only constrained by fixed upper and lower bounds on the 4 variables
> limiting the search space. The only anonymous function handle is in the
> objective function in "createOptimProblem" structure to set up the
> global optimisation - don't see how you could avoid this or why this
> should be an issue. The objective function does use one of the variables
> from the full dataset, i.e. approx. 42e6 values, in addition to some
> smaller variables.

You can have a look at what variables your function handle is carrying
around by using the "functions" function:
<http://www.mathworks.com/access/helpdesk/help/techdoc/ref/functions.html>

% fh is a function_handle
>> f_info = functions(fh);
% Have a look at what variables are part of the function
>> celldisp(f_info.workspace)

Depending on how you create the function handle, you may find you have
more variables than you really need in the function workspace. If this is
the case, try creating the function handle in a subfunction that is
passed only the parts of the data that are needed.

Note - even if this does not solve the "The client lost connection" error
you are seeing, it is still a good idea. Reducing the amount of data that
gets copied to the labs should improve the performance of a parfor loop.
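As a sketch of that pattern (function and variable names here are made up for illustration):

```matlab
function fh = makeObjective(smallSubset)
% Build the handle inside a small function so it captures only what
% it needs: only smallSubset lives in this workspace, so only it is
% copied to the labs along with the handle.
fh = @(p) myCost(p, smallSubset);
end

% At the call site:
%   subset  = extractFeatures(bigData);   % keep only the ~1.1e5 needed values
%   problem = createOptimProblem('fmincon', ...
%       'objective', makeObjective(subset), 'x0', x0, 'lb', lb, 'ub', ub);
```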

Thanks,

--
Richard A
From: Nick on
Richard Alcock <richard.alcock(a)mathworks.co.uk> wrote in message <huqai8$f0k$3(a)fred.mathworks.com>...
> On Wed, 09 Jun 2010 17:01:25 +0000, Nick wrote:
>
> > The overall problem is pretty big, but the code in fact involves feature
> > extraction before the optimisation kicks in so the optimisation is only
> > being carried out on a small subset of the data: about 1.1e5 values
> > total at the moment. The optimisation is in 4 variables / dimensions and
> > is only constrained by fixed upper and lower bounds on the 4 variables
> > limiting the search space. The only anonymous function handle is in the
> > objective function in "createOptimProblem" structure to set up the
> > global optimisation - don't see how you could avoid this or why this
> > should be an issue. The objective function does use one of the variables
> > from the full dataset, i.e. approx. 42e6 values, in addition to some
> > smaller variables.
>
> You can have a look at what variables your function handle is carrying
> around by using the "functions" function:
> <http://www.mathworks.com/access/helpdesk/help/techdoc/ref/functions.html>
>
> % fh is a function_handle
> >> f_info = functions(fh);
> % Have a look at what variables are part of the function
> >> celldisp(f_info.workspace)
>
> Depending on how you create the function handle, you may find you have
> more variables than you really need in the function workspace. If this is
> the case, try creating the function handle in a subfunction that is
> passed only the parts of the data that are needed.
>
> Note - even if this does not solve the "The client lost connection" error
> you are seeing, it is still a good idea. Reducing the amount of data that
> gets copied to the labs should improve the performance of a parfor loop.
>
> Thanks,
>
> --
> Richard A


Well, following some more investigation, having halved the volume of input data, I have come to the conclusion that this is a MATLAB bug - time to file a bug report... Basically, the code works fine on some occasions but fails on others with exactly the same inputs - not good...

Regards,
Nick