From: Pr B on
I have the following code that I am trying to speed up:

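r_matrix = zeros(length(r));          % preallocate the score matrix
for i = 1:length(r)
    i                                 % no semicolon: displays the loop index as progress
    for j = i:length(r)               % upper triangle only
        r_matrix(i,j) = 1 - (length(intersect(r{i},r{j})) / length(union(r{i},r{j})));
    end
end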

First I load 'r' from a .mat file. 'r' is a cell array of length 1162, and each cell of 'r' contains strings like 'ABCD1'. Each cell is quite big, with the largest holding ~9000 strings. For every pair of cells in 'r' I want to compute a score that is 1 minus the size of their intersection divided by the size of their union: 1 - (length(intersect(r{i},r{j})) / length(union(r{i},r{j}))). I store this score in a preallocated matrix called 'r_matrix', where each element holds the corresponding pairwise score.

The problem is that this code takes a very long time to run: about 9 hours, in fact. Given that all I'm doing is taking an intersection and a union of cells, why is my code running so slowly? Is there any way I can make it run faster?
From: Jan Simon on
Dear Pr B,

> r_matrix(i,j) = 1 - (length(intersect(r{i},r{j})) / length(union(r{i},r{j})));

INTERSECT and UNION sort the arrays in each call, which eats up an enormous chunk of time. So I'd suggest sorting the arrays once and using modified copies of INTERSECT and UNION that omit the sorting.
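
One possible way to avoid re-sorting the strings on every call is sketched below. Note that this is not the modified INTERSECT/UNION approach itself but a different technique aimed at the same goal: every string is encoded as an integer once (a single UNIQUE call), the cells are stored as rows of a sparse membership matrix, and all pairwise intersection sizes come from one matrix product. The union size follows from |A| + |B| - |A and B in common|. The toy data and the names lens, code, setIdx, M, sz, nInter and nUnion are assumptions made for illustration, and repelem needs a reasonably recent MATLAB release (R2015a or later, as far as I know).

```matlab
% Toy stand-in for the real data; all helper names are illustrative only.
r = {{'ABCD1','ABCD2','ABCD3'}, {'ABCD2','ABCD3','ABCD4'}, {'ABCD9'}};

n    = numel(r);
lens = cellfun(@numel, r);                 % number of strings per cell

% Stack all strings into one column and encode each distinct string
% as an integer. This is the only sort over the strings.
cols   = cellfun(@(c) c(:), r, 'UniformOutput', false);
allStr = vertcat(cols{:});
[~, ~, code] = unique(allStr);

% setIdx(k) is the index of the cell that the k-th stacked string came from.
setIdx = repelem((1:n)', lens(:));

% Membership matrix: M(i,k) is 1 if cell i contains string k.
% sparse() sums duplicate entries, so "> 0" also collapses repeated strings.
M = double(sparse(setIdx, code, 1, n, max(code)) > 0);

sz     = full(sum(M, 2));                  % number of distinct strings per cell
nInter = full(M * M');                     % nInter(i,j) = size of intersection
nUnion = bsxfun(@plus, sz, sz') - nInter;  % inclusion-exclusion

r_matrix = 1 - nInter ./ nUnion;           % full symmetric score matrix
```

For the real 'r', only the first line would be replaced by loading the .mat file. A pair of empty cells gives 0/0 = NaN here, which matches what the original loop would produce.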

As far as I can see, LENGTH(INTERSECT(A, B)) equals LENGTH(INTERSECT(B, A)), and the same holds for UNION, so you only need to calculate the upper triangle of r_matrix.
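
If the full symmetric matrix is needed afterwards, the upper triangle can be mirrored in one step. A small sketch, where U is a made-up 3-by-3 example standing in for r_matrix after the double loop and full_matrix is a hypothetical name:

```matlab
% Toy upper-triangular result standing in for r_matrix after the loop.
U = [0 0.5 1;
     0 0   1;
     0 0   0];

% Copy the strict upper triangle into the lower triangle.
% triu(U, 1) drops the diagonal so it is not counted twice.
full_matrix = U + triu(U, 1)';
```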

Good luck, Jan