Using Nearest function [Mathematica]

From: Dário Abdulrehman on 7 Jun 2010 08:06

This is my first attempt at writing Mathematica code, and I am getting
strange results that are probably due to a bug I cannot detect.

This code draws a test set of 1000 points in 2 classes, each class from its
own bivariate Gaussian distribution, then creates 6 training sets of the same
kind with sizes 10^i, i = 1..6.

Then I run the nearest-neighbor algorithm on each training set against the
test set and compute the error rate.

However, as the table at the end shows, the error rates don't make much
sense: most are close to 0.5, so I might as well flip a coin instead of
running the algorithm.
Unfortunately, I cannot spot the bug in the code.

Thanks.
****************************************************
(* Code *)

Needs["MultivariateStatistics`"];

m = 6;
testSize = 1000;

MN1 = MultinormalDistribution[{0.5, 0.5}, {{1, 0}, {0, 1}}];
MN2 = MultinormalDistribution[{-0.5, -0.5}, {{1, 0}, {0, 1}}];

RandomVector[n_] := Join[Array[RandomReal[MN1] &, n/2],
   Array[RandomReal[MN2] &, n/2]];

testSet = RandomVector[testSize];
trainingSets=Map[Function[x,RandomVector[x]],NestList[10 #&,10,m-1]];

classOf[i_] = If[i<=(testSize/2),1,2];

NN[trainingSet_] := Module[{nnFunc = Nearest[trainingSet -> Automatic]},
  N[Fold[Plus, 0,
     MapIndexed[
      If[classOf[First[nnFunc[#1]]] != classOf[First[#2]], 1, 0] &,
      testSet]]/testSize]]

Grid[{Prepend[NestList[10 # &, 10, m - 1], "m"],
  Prepend[Map[Function[trainingSet, NN[trainingSet]], trainingSets],
   "error rate"]}, Frame -> All]

m            10    100   1000    10000   100000   1000000
error rate   0.5   0.5   0.322   0.484   0.499    0.501
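
A possible explanation, offered as a sketch rather than a confirmed diagnosis: Nearest[trainingSet -> Automatic] returns positions in the training set, but classOf compares every position against testSize/2 = 500. For the training sets of size 10 and 100, every position is at most 500, so every test point is predicted as class 1 and the error is exactly 0.5. Only the training set of size 1000 happens to share the test set's class boundary. The version below passes the set size to the class function. The two-argument classOf, the name errorRate, and the hand-made data are assumptions for illustration, not part of the original code.

```mathematica
(* Class of the i-th point in a set of n points, assuming the first n/2
   points belong to class 1 and the rest to class 2, as in RandomVector. *)
classOf[i_, n_] := If[i <= n/2, 1, 2];

(* Hypothetical replacement for NN: the neighbor's class is decided by the
   size of the training set, the true class by the size of the test set. *)
errorRate[train_, test_] :=
  Module[{nnFunc = Nearest[train -> Automatic]},
   N[Total[
      MapIndexed[
       Boole[classOf[First[nnFunc[#1]], Length[train]] !=
          classOf[First[#2], Length[test]]] &,
       test]]/Length[test]]];

(* Tiny hand-made check with two well-separated clusters (made-up data). *)
train = {{1., 1.}, {1.2, 0.9}, {-1., -1.}, {-1.1, -0.8}};
test = {{0.9, 1.1}, {-0.9, -1.2}};
errorRate[train, test]
```

On this toy data each test point is matched to a neighbor of its own class, so there are no misclassifications. With the data from the post, Map[errorRate[#, testSet] &, trainingSets] would take the place of the original NN call.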
