From: Felipe on
Greetings,

I need help from anyone who has experience using parfor. I have tried several times to parallelize the Game of Life cellular automaton. The serial code looks like this:

clear all;
%Size of the matrix
m=100;
n=100;
%Iterations
iter=1000;
X=zeros(m,n);

%Random matrix fill
p1=0.9;
for i=2:m-1
for j=2:n-1
if(rand()>p1); X(i,j)=1;end;
end;
end;


for t=1:iter
X1=zeros(m,n);
for i=2:m-1
for j=2:n-1
neighbors= X(i-1,j-1)+X(i-1,j)+X(i-1,j+1)+X(i,j-1)+X(i,j+1)+...
X(i+1,j-1)+X(i+1,j)+X(i+1,j+1);


if neighbors < 4 && neighbors > 1
v=X(i,j);
if neighbors==3; v=1;end;

else
v=0;
end;
X1(i,j)=v;
end
end

getframe;
image(X*50)

X=X1;
end
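
For comparison, the same update rule can probably be written without the two inner loops. The following is only a sketch, assuming the border cells stay dead as in the loops above; the names kernel and neighbors are chosen for illustration. It uses conv2 to count the eight neighbors of every cell in one call:

```matlab
% Sketch: one Game of Life step with conv2 instead of nested loops.
% Assumption: the border stays dead, as in the 2:m-1 / 2:n-1 loops.
m = 100;
n = 100;
X = zeros(m, n);
X(2:m-1, 2:n-1) = rand(m-2, n-2) > 0.9;   % same random fill as above

kernel = [1 1 1; 1 0 1; 1 1 1];           % counts the 8 surrounding cells
neighbors = conv2(X, kernel, 'same');     % zero padding at the edges

% Same rule as the loop: 3 neighbors gives a live cell,
% 2 neighbors keeps the current state, anything else dies.
X1 = double(neighbors == 3 | (X == 1 & neighbors == 2));

% Keep the border dead to match the loop bounds.
X1([1 m], :) = 0;
X1(:, [1 n]) = 0;
```

Wrapping this in the for t loop (with X=X1 at the end) should give the same generations as the nested loops, and it is a useful serial baseline before trying parfor.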

In this case I tried to slice the variables by rows (backward, middle, forward) to avoid communication overhead:

matlabpool

for t=1:iter
for i=2:m-1

%Temp variables
neighbors=zeros(1,n);
x1=zeros(1,n);

%Slice the 3 positions of neighbors
xback=X(i-1,:);
x=X(i,:);
xforw=X(i+1,:);

%backward
parfor j=2:n-1
jj=j-1;
neighbors(1,j)= neighbors(1,j)+ xback(1,jj)+x(1,jj)+xforw(1,jj);
end
%middle
parfor j=2:n-1
neighbors(1,j)= neighbors(1,j)+ xback(1,j)+ xforw(1,j);
end
%forward
parfor j=2:n-1
jj=j+1;
neighbors(1,j)= neighbors(1,j)+ xback(1,jj)+x(1,jj)+xforw(1,jj);
end

parfor j=2:n-1
if neighbors(1,j) < 4 && neighbors(1,j) > 1
if neighbors(1,j)==3
v=1;
else
v=X(i,j);
end;
else
v=0;
end;
x1(1,j)=v;
end

X1(i,:)=x1;

end

getframe;
image(X*50)

X=X1;

end

matlabpool close;

In each parfor loop the indexing is unique. For example, the backward loop accesses
neighbors at ( j )
and
xback, x, xforw at ( j-1 ):

parfor j=2:n-1
jj=j-1;
neighbors(1,j)= neighbors(1,j)+ xback(1,jj)+x(1,jj)+xforw(1,jj);
end

But the code runs very slowly.
I would appreciate any input on the best way to modify this code so that it runs efficiently.
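
One restructuring that may be worth testing is sketched below. It is an assumption, not a measured result: the version above starts four parfor loops for every row of every generation, and each parfor start has a fixed cost, so a single parfor over the rows with vectorized work inside could reduce that overhead. The names up, cur, down, s, nb and row are hypothetical, and iter is kept small here only to make the sketch quick to run.

```matlab
% Sketch: one parfor per generation, looping over rows.
% X1(i,:) is a sliced output variable; X is sent to the workers as a whole.
m = 100;
n = 100;
iter = 10;                                 % small value, for illustration only
X = zeros(m, n);
X(2:m-1, 2:n-1) = rand(m-2, n-2) > 0.9;    % same random fill as the serial code

for t = 1:iter
    X1 = zeros(m, n);                      % border rows and columns stay dead
    parfor i = 2:m-1
        up   = X(i-1, :);
        cur  = X(i, :);
        down = X(i+1, :);
        s = up + cur + down;               % column sums over the three rows

        % Neighbors of cell j: sums of columns j-1, j, j+1 minus the cell itself.
        nb = zeros(1, n);
        nb(2:n-1) = s(1:n-2) + s(2:n-1) + s(3:n) - cur(2:n-1);

        % 3 neighbors gives a live cell, 2 keeps the current state.
        row = zeros(1, n);
        row(2:n-1) = (nb(2:n-1) == 3) | (cur(2:n-1) == 1 & nb(2:n-1) == 2);
        X1(i, :) = row;
    end
    X = X1;
end
```

Even in this form, a 100-by-100 grid is very little work per generation, so the parallel version is not guaranteed to beat a vectorized serial loop; timing both with tic and toc is the only way to know.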

Thank you
From: Felipe on
Sorry, in the parallel version the line X1=zeros(m,n); is missing after the for t line. It should read:
> for t=1:iter
> X1=zeros(m,n);
> for i=2:m-1