From: Javier Meseguer de Paz on 29 Jul 2010 04:59

If you just want to know how similar the two sequences of numbers are,
I'd go for the Euclidean distance between the two, just as Ilmari said.
If you are also interested in the correlations in the data, you might
also have a look at the Mahalanobis distance
(http://en.wikipedia.org/wiki/Mahalanobis_distance), which is used a
lot for classification problems in artificial intelligence, for example.

On Jul 28, 11:05 pm, Ilmari Karonen <usen...(a)vyznev.invalid> wrote:
> On 2010-07-28, skillzero <skillz...(a)gmail.com> wrote:
>
> > On Jul 28, 8:06 am, Ilmari Karonen <usen...(a)vyznev.invalid> wrote:
>
> >> Are you perhaps overcomplicating the problem? If the example you gave
> >> is indeed representative of your problem (i.e. matching data sets
> >> always have the same number of elements, order matters), wouldn't
> >> something as simple as the (Euclidean) distance between the data sets
> >> as vectors work as a measure of similarity?
>
> > The data sets I'm comparing have the same number of elements and
> > should be in the same order. I initially tried calculating a sum of
> > the differences between each element where the lowest result won. That
> > seems similar to a Euclidean distance, if I understand it correctly.
> > It didn't seem to work very well, but it may have just been that I was
> > doing it incorrectly (or maybe I'm misunderstanding Euclidean
> > distance).
>
> That's the L^1 distance. For the Euclidean (a.k.a. L^2) distance, you
> need to sum the squares of the differences and take the square root of
> the result. (Although, if you're just interested in comparing
> distances to see which is smallest, you can leave out the square root
> step.) There are other possible distance norms you could try, but the
> Euclidean distance is optimal if the noise in your data values is
> additive, independent and normally distributed (with zero mean and
> constant variance). Since many types of real-world noise are at least
> approximately like that, it's a good first choice to try.
>
> (ps. Yeah, I misspelled "Euclidean" in my previous post; I've fixed it
> above.)
>
> --
> Ilmari Karonen
> To reply by e-mail, please replace ".invalid" with ".net" in address.
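
For what it's worth, here is a minimal Python sketch of the two distances discussed above. The function names and the sample data are made up for illustration and are not from the original poster's code. It assumes both sequences have the same length and order, and it takes absolute values of the differences for the L^1 case (a plain signed sum would let positive and negative differences cancel out).

```python
import math


def l1_distance(a, b):
    """L^1 distance: sum of the absolute element-wise differences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared differences (Euclidean distance without the root)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean (L^2) distance: square root of the sum of squares."""
    return math.sqrt(squared_euclidean(a, b))


def closest(sample, candidates):
    """Return the name of the candidate nearest to the sample.

    The square root is skipped because it does not change which
    candidate has the smallest distance.
    """
    return min(candidates,
               key=lambda name: squared_euclidean(sample, candidates[name]))


# Hypothetical data, only to show how the two measures can disagree.
sample = [10, 20, 30, 40]
candidates = {
    "A": [11, 19, 31, 39],  # every element off by 1
    "B": [10, 20, 30, 44],  # a single element off by 4
}

for name, values in candidates.items():
    print(name, l1_distance(sample, values), euclidean_distance(sample, values))

print("closest:", closest(sample, candidates))
```

With this made-up data both candidates have the same L^1 distance, while the Euclidean distance penalizes the single large deviation in "B" more heavily and so prefers "A".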