Ideas for fast distance computatino code? [Matlab]

Prev: remove MATLAB Icon on GUI
Next: Real-time Workshop and TI DM642 EVM Usage Problem

From: Joel on 17 Feb 2010 22:36

I wantto make this calculation as fast as possible for large data sets:
I have a set of points in dimension N.
I generate a new point and I want to find the which of original M points it is closest to.
Without resorting to quad-trees, bounding boxes or other specialized data structures, what do you think is the fastest (vectrozed) way to write this code?

Here is what I have now:
%example data
N=4; NumPts = 100000;
Pts = rand(NumPts,N);
NewPt = rand(1,N);
% Compute closest point
tic
dist = zeros(NumPts,1)
for i = 1:N
dist = dist + (Pts(:,i)-NewPt(1,i)).^2
end
[dmin,i] = min(dist);
closestPt = Pts(i,:)
toc

I have tried a few approaches, but so far this is my best. Any ideas on how to improve? Are there any built in function I can take Advantage of?

Right now I have examples with a million points in 8 dimensions but I can't get much beyone because this calculation is the bottleneck..

From: Jan Simon on 18 Feb 2010 05:40

Dear Joel!

> I generate a new point and I want to find the which of original M points it is closest to.

> %example data
> N=4; NumPts = 100000;
> Pts = rand(NumPts,N);
> NewPt = rand(1,N);
> % Compute closest point
> tic
> dist = zeros(NumPts,1)
> for i = 1:N
> dist = dist + (Pts(:,i)-NewPt(1,i)).^2
> end
> [dmin,i] = min(dist);
> closestPt = Pts(i,:)
> toc

Did you try this instead of the loop:
dist = sum(bsxfun(@minus, Pts, NewPt) .^2, 2);

Good luck, Jan

From: Joel on 18 Feb 2010 08:59

Jan, Thanks so much for the tip. Its very slick, never saw that function before.
However that actually takes about twice as long on my machine.

I shoudl add that, I thought a bit about datatype. I am willing to use single to go faster.

From: Matt J on 18 Feb 2010 09:18

"Joel" <esposito(a)usna.edu> wrote in message <hljh3s$shr$1(a)fred.mathworks.com>...
> Jan, Thanks so much for the tip. Its very slick, never saw that function before.
> However that actually takes about twice as long on my machine.
>
> I shoudl add that, I thought a bit about datatype. I am willing to use single to go faster.
=======================

That's probably because you're running with small N. For larger N, bsxfun will be faster. If large N is not relevant to you, I too have found that a for-loop is the fastest solution.

From: Joel on 18 Feb 2010 09:30

I used 1 million points and N=20 in my testing.

Any other possible ideas?

| Next | Last
Pages: 1 2
Prev: remove MATLAB Icon on GUI
Next: Real-time Workshop and TI DM642 EVM Usage Problem