From: Aaron Bramson on 19 Jul 2010 02:10 Hello Everybody, I would like to perform a 2-sample k-s test. I've seen some posts on the archive about the one-sample goodness-of-fit version of the Kolgomorov-Smirnov test, but I'm interested in the 2-sample version. Here's a description of the method: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test I think all the necessary components are available; e.g. Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, max of a list, etc. But the data management is a bit above my current skill level. Also, since all other software packages seem to include this test capability, I would be really surprised if there wasn't a package somewhere that included it by now, but I've searched a lot and can't find it. Can anybody help me locate this this? Alternatively, would anybody like to work with me to build this in case it can't be found? Thanks in Advance, Aaron
From: Ray Koopman on 20 Jul 2010 03:41 On Jul 18, 11:10 pm, Aaron Bramson <aaronbram...(a)gmail.com> wrote: > Hello Everybody, > > I would like to perform a 2-sample k-s test. I've seen some posts on the > archive about the one-sample goodness-of-fit version of the > Kolgomorov-Smirnov test, but I'm interested in the 2-sample version. > > Here's a description of the method: > http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test > > I think all the necessary components are available; e.g. > Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, max of > a list, etc. But the data management is a bit above my current skill > level. Also, since all other software packages seem to include this test > capability, I would be really surprised if there > wasn't a package somewhere that included it by now, but I've searched a lot > and can't find it. Can anybody help me locate this this? > > Alternatively, would anybody like to work with me to build this in case it > can't be found? > > Thanks in Advance, > Aaron This is based on code by Thomas Waterhouse, taken from https://stat.ethz.ch/pipermail/r-devel/2009-July/054106.html The arguments y1 and y2 are lists of data. It returns {x,p}, where x is the usual K-S test statistic n1*n2*Max|cdf1-cdf2|, and p is the corresponding p-value, conditional on the distribution of ties in the pooled data. ks2[y1_,y2_] := Block[{n1 = Length(a)y1, n2 = Length(a)y2, pool = Sort(a)Join[y1,y2], x,n,u}, {x = Max(a)Abs[n2*Tr(a)UnitStep[y1-#]-n1*Tr(a)UnitStep[y2-#]&/@Rest(a)Union@pool], n = n1+n2; u = Table[0,{n2+1}]; Do[ Which[ i+j == 0, u[[j+1]] = 1, i+j < n && pool[[i+j]] < pool[[i+j+1]] && Abs[n2*i-n1*j] >= x, u[[j+1]] = 0, i == 0, u[[j+1]] = u[[j]], j > 0, u[[j+1]] += u[[j]]], {i,0,n1},{j,0,n2}]; N[1 - Last@u/Multinomial[n1,n2]]} ]
From: Darren Glosemeyer on 20 Jul 2010 03:41 Aaron Bramson wrote: > Hello Everybody, > > I would like to perform a 2-sample k-s test. I've seen some posts on the > archive about the one-sample goodness-of-fit version of the > Kolgomorov-Smirnov test, but I'm interested in the 2-sample version. > > Here's a description of the method: > http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test > > I think all the necessary components are available; e.g. > Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, max of > a list, etc. But the data management is a bit above my current skill > level. Also, since all other software packages seem to include this test > capability, I would be really surprised if there > wasn't a package somewhere that included it by now, but I've searched a lot > and can't find it. Can anybody help me locate this this? > > Alternatively, would anybody like to work with me to build this in case it > can't be found? > > Thanks in Advance, > Aaron > > > Hi Aaron, Here is some code written by Andy Ross at Wolfram for the two sample Kolmogorov-Smirnov test. KolmogorovSmirnov2Sample computes the test statistic, and KSBootstrapPValue provides a bootstrap estimate of the p-value given the two data sets, the number of simulations for the estimate and the test statistic. In[1]:= empiricalCDF[data_, x_] := Length[Select[data, # <= x &]]/Length[data] In[2]:= KolmogorovSmirnov2Sample[data1_, data2_] := Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2, udat = Union[Flatten[{data1, data2}]], n1 = Length[data1], n2 = Length[data2], T}, e1 = empiricalCDF[sd1, #] & /@ udat; e2 = empiricalCDF[sd2, #] & /@ udat; T = Max[Abs[e1 - e2]]; (1/Sqrt[n1]) (Sqrt[(n1*n2)/(n1 + n2)]) T ] In[3]:= KSBootstrapPValue[data1_, data2_, nSamp_, T_] := Block[{udat = Union[Flatten[{data1, data2}]], n1 = Length[data1], n2 = Length[data2], d1Samples, d2Samples, Tdist}, d1Samples = RandomChoice[udat, {nSamp, n1}]; d2Samples = RandomChoice[udat, {nSamp, n2}]; Tdist = MapThread[ KolmogorovSmirnov2Sample[#1, #2] &, {d1Samples, d2Samples}]; 1 - empiricalCDF[Tdist, T] // N ] (* create two data sets *) In[4]:= data1 = RandomReal[NormalDistribution[], 15]; In[5]:= data2 = RandomReal[NormalDistribution[], 15]; (* compute the test statistic *) In[6]:= stat = KolmogorovSmirnov2Sample[data1, data2] // N Out[6]= 0.329983 (* estimate the p-value from 10000 bootstrap samples *) In[7]:= KSBootstrapPValue[data1, data2, 10000, stat] Out[7]= 0.0213 Darren Glosemeyer Wolfram Research
From: Bill Rowe on 20 Jul 2010 03:42 On 7/19/10 at 2:11 AM, aaronbramson(a)gmail.com (Aaron Bramson) wrote: >I would like to perform a 2-sample k-s test. I've seen some posts >on the archive about the one-sample goodness-of-fit version of the >Kolgomorov-Smirnov test, but I'm interested in the 2-sample version. >Here's a description of the method: >http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two- >sample_Kolmogorov.E2.80. 93Smirnov_test >I think all the necessary components are available; e.g. >Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, >max of a list, etc. But the data management is a bit above my >current skill level. Also, since all other software packages seem >to include this test capability, I would be really surprised if >there wasn't a package somewhere that included it by now, but I've >searched a lot and can't find it. Can anybody help me locate this >this? >Alternatively, would anybody like to work with me to build this in >case it can't be found? It is simple to create a function that will do what is needed. For example, ksTwoSampleTest[xdata_, ydata_] := Module[{nx, ny, k}, {nx, ny} = Length /@ {xdata, ydata}; k = Max[nx, ny]; Sqrt[nx ny/(nx + ny)] Max@ Table[Abs[Quantile[xdata, x/k] - Quantile[ydata, x/k]], {x, k}]] Note, while this is a simple implementation it may not be optimal for large data sets. My *guess* is by using Quantile and not pre-sorting the data, there is more work being done by this code than is really needed. i suspect that the approach I've used here has a complexity of order n^2 which should't be a problem for modest data sets but will certainly be an issue for large data sets.
From: Bill Rowe on 21 Jul 2010 07:11 On 7/20/10 at 3:41 AM, readnews(a)sbcglobal.net (Bill Rowe) wrote: >On 7/19/10 at 2:11 AM, aaronbramson(a)gmail.com (Aaron Bramson) wrote: > >>Alternatively, would anybody like to work with me to build this in >>case it can't be found? >It is simple to create a function that will do what is needed. For >example, And then supplied an incorrect solution. Please ignore it.
|
Next
|
Last
Pages: 1 2 3 Prev: A ODE I need to solve Next: Problems with Workbench Debugger Breakpoints |