From: Kaba on 5 Apr 2010 15:31

Pubkeybreaker wrote:
> On Apr 5, 11:00 am, Kaba <n...(a)here.com> wrote:
> > Hi,
> >
> > I am measuring the time spent by an algorithm. Let's assume it is a
> > gaussian-distributed random variable. How many repetitions do I have
> > to make to get a good estimate of the mean and standard deviation of
> > this distribution?
>
> You need to define what you mean by "good".

Well, I am happy when I am able to convince my readers that algorithm A
is clearly faster than algorithm B :)

--
http://kaba.hilvi.org
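[A cookbook-style rule of thumb, as a rough sketch: if the timings are
roughly normal, the 95% confidence interval on the mean has half-width
about 1.96*s/sqrt(n), so one can keep adding runs until that half-width
is a small enough fraction of the mean. Everything below -- the
function names, the 5% tolerance, the stand-in workload -- is an
illustrative assumption, not something prescribed in the thread.]

import math, time

def estimate_runtime(measure_once, rel_error=0.05, z=1.96,
                     min_runs=30, max_runs=10000):
    """Call measure_once() until the 95% CI half-width on the mean
    is within rel_error of the mean (assumes roughly normal times)."""
    times = []
    while True:
        times.append(measure_once())
        n = len(times)
        if n < min_runs:
            continue
        mean = sum(times) / n
        sd = math.sqrt(sum((t - mean)**2 for t in times) / (n - 1))
        if z * sd / math.sqrt(n) <= rel_error * mean or n >= max_runs:
            return mean, sd, n

def one_run():
    # Stand-in workload; time the real algorithm here instead.
    t0 = time.time()
    sorted(range(100000, 0, -1))
    return time.time() - t0

mean, sd, n = estimate_runtime(one_run)
print 'mean = %.2g s, sd = %.2g s, after %d runs' % (mean, sd, n)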
From: Man. on 5 Apr 2010 15:49

Cookbook/FittingData

1. Fit examples with sinusoidal functions
   1. Generating the data
   2. Fitting the data
   3. A clever use of the cost function
2. Simplifying the syntax
3. Fitting gaussian-shaped data
   1. Calculating the moments of the distribution
   2. Fitting a 2D gaussian
4. Fitting a power-law to data with errors
   1. Generating the data
   2. Fitting the data

1.1 Generating the data

from pylab import *
from scipy import optimize

# Generate data points with noise
num_points = 150
Tx = linspace(5., 8., num_points)
Ty = Tx

tX = 11.86*cos(2*pi/0.81*Tx - 1.32) + 0.64*Tx + 4*((0.5 - rand(num_points))*exp(2*rand(num_points)**2))
tY = -32.14*cos(2*pi/0.8*Ty - 1.94) + 0.15*Ty + 7*((0.5 - rand(num_points))*exp(2*rand(num_points)**2))

1.2 Fitting the data

# Fit the first set
fitfunc = lambda p, x: p[0]*cos(2*pi/p[1]*x + p[2]) + p[3]*x  # Target function
errfunc = lambda p, x, y: fitfunc(p, x) - y  # Distance to the target function
p0 = [-15., 0.8, 0., -1.]  # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:], args=(Tx, tX))

time = linspace(Tx.min(), Tx.max(), 100)
plot(Tx, tX, "ro", time, fitfunc(p1, time), "r-")  # Plot of the data and the fit

# Fit the second set
p0 = [-15., 0.8, 0., -1.]
p2, success = optimize.leastsq(errfunc, p0[:], args=(Ty, tY))

time = linspace(Ty.min(), Ty.max(), 100)
plot(Ty, tY, "b^", time, fitfunc(p2, time), "b-")

# Label the plot
title("Oscillations in the compressed trap")
xlabel("time [ms]")
ylabel("displacement [um]")
legend(('x position', 'x fit', 'y position', 'y fit'))

ax = axes()
text(0.8, 0.07,
     'x freq : %.3f kHz \n y freq : %.3f kHz' % (1/p1[1], 1/p2[1]),
     fontsize=16,
     horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

show()

1.3 A clever use of the cost function

# Target function, with the period T factored out as a shared parameter
fitfunc = lambda T, p, x: p[0]*cos(2*pi/T*x + p[1]) + p[2]*x
# Initial guess for the first set's parameters
p1 = r_[-15., 0., -1.]
# Initial guess for the second set's parameters
p2 = r_[-15., 0., -1.]
# Initial guess for the common period
T = 0.8
# Vector of the parameters to fit; it contains all the parameters of the problem
p = r_[T, p1, p2]
# Cost function of the fit; compare it to the previous example
errfunc = lambda p, x1, y1, x2, y2: r_[
    fitfunc(p[0], p[1:4], x1) - y1,
    fitfunc(p[0], p[4:7], x2) - y2
]
# This time we need to pass the two sets of data; there are thus four "args"
p, success = optimize.leastsq(errfunc, p, args=(Tx, tX, Ty, tY))

# Plot of the first data set and the fit
time = linspace(Tx.min(), Tx.max(), 100)
plot(Tx, tX, "ro", time, fitfunc(p[0], p[1:4], time), "r-")

# Plot of the second data set and the fit
time = linspace(Ty.min(), Ty.max(), 100)
plot(Ty, tY, "b^", time, fitfunc(p[0], p[4:7], time), "b-")

# Label the plot
title("Oscillations in the compressed trap")
xlabel("time [ms]")
ylabel("displacement [um]")
legend(('x position', 'x fit', 'y position', 'y fit'))

ax = axes()
text(0.8, 0.07,
     'x freq : %.3f kHz' % (1/p[0]),
     fontsize=16,
     horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

show()

2. Simplifying the syntax

from scipy import optimize
from numpy import *

class Parameter:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

    def __call__(self):
        return self.value

def fit(function, parameters, y, x=None):
    def f(params):
        i = 0
        for p in parameters:
            p.set(params[i])
            i += 1
        return y - function(x)

    if x is None: x = arange(y.shape[0])
    p = [param() for param in parameters]
    # The fitted values are left behind in the Parameter objects,
    # which f() updates on every evaluation.
    optimize.leastsq(f, p)

# giving initial parameters
mu = Parameter(7)
sigma = Parameter(3)
height = Parameter(5)

# define your function:
def f(x): return height() * exp(-((x - mu())/sigma())**2)

# fit! (given that data is an array with the data to fit)
fit(f, [mu, sigma, height], data)

3.1 Calculating the moments of the distribution

from pylab import *

gaussian = lambda x: 3*exp(-(30 - x)**2/20.)

data = gaussian(arange(100))

plot(data)

X = arange(data.size)
x = sum(X*data)/sum(data)
width = sqrt(abs(sum((X - x)**2*data)/sum(data)))

height = data.max()

fit = lambda t: height*exp(-(t - x)**2/(2*width**2))

plot(fit(X))

show()

3.2 Fitting a 2D gaussian

from numpy import *
from scipy import optimize

def gaussian(height, center_x, center_y, width_x, width_y):
    """Returns a gaussian function with the given parameters"""
    width_x = float(width_x)
    width_y = float(width_y)
    return lambda x, y: height*exp(
        -(((center_x - x)/width_x)**2 + ((center_y - y)/width_y)**2)/2)

def moments(data):
    """Returns (height, x, y, width_x, width_y), the gaussian
    parameters of a 2D distribution, by calculating its moments"""
    total = data.sum()
    X, Y = indices(data.shape)
    x = (X*data).sum()/total
    y = (Y*data).sum()/total
    col = data[:, int(y)]
    # the column runs along the x axis, so its spread is measured about x
    width_x = sqrt(abs((arange(col.size) - x)**2*col).sum()/col.sum())
    row = data[int(x), :]
    # the row runs along the y axis, so its spread is measured about y
    width_y = sqrt(abs((arange(row.size) - y)**2*row).sum()/row.sum())
    height = data.max()
    return height, x, y, width_x, width_y

def fitgaussian(data):
    """Returns (height, x, y, width_x, width_y), the gaussian
    parameters of a 2D distribution found by a fit"""
    params = moments(data)
    errorfunction = lambda p: ravel(gaussian(*p)(*indices(data.shape)) -
                                    data)
    p, success = optimize.leastsq(errorfunction, params)
    return p

from pylab import *

# Create the gaussian data
Xin, Yin = mgrid[0:201, 0:201]
data = gaussian(3, 100, 100, 20, 40)(Xin, Yin) + random.random(Xin.shape)

matshow(data, cmap=cm.gist_earth_r)

params = fitgaussian(data)
fit = gaussian(*params)

contour(fit(*indices(data.shape)), cmap=cm.copper)
ax = gca()
(height, x, y, width_x, width_y) = params

text(0.95, 0.05, """
x : %.1f
y : %.1f
width_x : %.1f
width_y : %.1f""" % (x, y, width_x, width_y),
     fontsize=16, horizontalalignment='right',
     verticalalignment='bottom', transform=ax.transAxes)

show()

4.1 Generating the data

from pylab import *
from scipy import optimize

# Define function for calculating a power law
powerlaw = lambda x, amp, index: amp * (x**index)

##########
# Generate data points with noise
##########
num_points = 20

# Note: all positive, non-zero data
xdata = linspace(1.1, 10.1, num_points)
ydata = powerlaw(xdata, 10.0, -2.0)    # simulated perfect data
yerr = 0.2 * ydata                     # simulated errors (20%)

ydata += randn(num_points) * yerr      # simulated noisy data

4.2 Fitting the data

##########
# Fitting the data -- Least Squares Method
##########

# Power-law fitting is best done by first converting
# to a linear equation and then fitting to a straight line:
#
#   y = a * x^b
#   log(y) = log(a) + b*log(x)

logx = log10(xdata)
logy = log10(ydata)
logyerr = yerr / ydata

# define our (line) fitting function
fitfunc = lambda p, x: p[0] + p[1] * x
errfunc = lambda p, x, y, err: (y - fitfunc(p, x)) / err

pinit = [1.0, -1.0]
out = optimize.leastsq(errfunc, pinit,
                       args=(logx, logy, logyerr), full_output=1)

pfinal = out[0]
covar = out[1]
print pfinal
print covar

index = pfinal[1]
amp = 10.0**pfinal[0]

# pfinal[0] is log10(amp) and pfinal[1] is the index, so the
# uncertainties come from the matching diagonal entries of covar
indexErr = sqrt( covar[1][1] )
ampErr = sqrt( covar[0][0] ) * amp

##########
# Plotting data
##########

clf()
subplot(2, 1, 1)
plot(xdata, powerlaw(xdata, amp, index))      # Fit
errorbar(xdata, ydata, yerr=yerr, fmt='k.')   # Data
text(5, 6.5, 'Ampli = %5.2f +/- %5.2f' % (amp, ampErr))
text(5, 5.5, 'Index = %5.2f +/- %5.2f' % (index, indexErr))
title('Best Fit Power Law')
xlabel('X')
ylabel('Y')
xlim(1, 11)

subplot(2, 1, 2)
loglog(xdata, powerlaw(xdata, amp, index))
errorbar(xdata, ydata, yerr=yerr, fmt='k.')   # Data
xlabel('X (log scale)')
ylabel('Y (log scale)')
xlim(1.0, 11)

savefig('power_law_fit.png')

On Apr 5, 11:03 am, Ludovicus <luir...(a)yahoo.com> wrote:
> On Apr 5, 11:00 am, Kaba <n...(a)here.com> wrote:
> > Hi,
> >
> > I am measuring the time spent by an algorithm. Let's assume it is a
> > gaussian-distributed random variable. How many repetitions do I have
> > to make to get a good estimate of the mean and standard deviation of
> > this distribution?
> >
> > I'd like a cook-book answer this time, because I am in a hurry with
> > these measurements. I know it's 101.. Probability is one of my
> > weaker sides.
> >
> > --http://kaba.hilvi.org
>
> How can the time spent by an algorithm be a random variable?
> If it is run on the same computer, the time is a constant.
> If it is run on different computers, the values are not of the same
> genus.

BDROP, EDROP = the number of points to ignore at the beginning and end
    of the scan, respectively. (Initial values = 0.)
BBASE, EBASE = the number of points, excluding BDROP and EDROP, over
    which to fit the baseline at each end of the scan. (Initial values
    are 50.)

>PROCEDURE BASESET
:BDROP = CCUR
:BBASE = CCUR - BDROP
:EDROP = H0(NOINT) - CCUR
:EBASE = H0(NOINT) - CCUR - EDROP
:RETURN
:FINISH

>NREGION = 10,30,40,50,78,82,105,128

>NREGION(1) = 10 ; NREGION(2) = 30
>NREGION(3) = 40 ; NREGION(4) = 50
>NREGION(5) = 78 ; NREGION(6) = 82
>NREGION(7) = 105 ; NREGION(8) = 128

>NREGION = 0
or
>NREGION = DEFAULT

>PROCEDURE NRSET(N_R)
:SCALAR N_I
:NREGION = DEFAULT
:IF N_R < 1 THEN; ? 'ILLEGAL ARGUMENT !'; RETURN; END
:N_R = MIN(16,N_R)
:FOR N_I = 2 TO N_R * 2 BY 2
:   NREGION(N_I - 1) = CCUR
:   NREGION(N_I) = CCUR
:   END
:RETURN
:FINISH

>DCBASE

>NFIT = 0
>BASELINE

>DCPCT = 40
>PCBASE PAGE SHOW

>NFIT = 5
>BASELINE PAGE SHOW

>PAGE SHOW
>NFIT = 5 ; BSHAPE
>BSHOW

>PROCEDURE GROUPBASE(FIRST_SCAN, NO_OF_SCNS, FIT_ORDER)
# FIRST_SCAN = the scan number for the first scan in the set.
# NO_OF_SCNS = the number of consecutive scans in the set.
# FIT_ORDER  = the order of the Chebyshev polynomial to be fitted.
:SCALAR SCAN_I
:IF NO_OF_SCNS < 1 THEN
:   PRINT 'LESS THAN ONE SCAN NOT ALLOWED.'
:   RETURN
:   END
:SCLEAR
:NREGION = 0
:BDROP = 0 ; EDROP = 0
:FOR SCAN_I = FIRST_SCAN TO (FIRST_SCAN + NO_OF_SCNS - 1)
:   GET SCAN_I ; ACCUM
:   END
:AVE
:PAGE SHOW
:BASESET    # BASESET = the region-setting procedure defined in Sec. 7.1
:NFIT = FIT_ORDER
:BASELINE COPY(0,2) BMODEL COPY(0,1)
:FOR SCAN_I = FIRST_SCAN TO (FIRST_SCAN + NO_OF_SCNS - 1)
:   GET SCAN_I ; DIFF PAGE SHOW PAUSE(10)
:   END
:COPY(2,0) PAGE SHOW
:RETURN
:FINISH

Chebyshev Polynomial     Sinusoid
--------------------     --------
BASELINE                 RIPPLE
BSHAPE                   RSHAPE
BSHOW                    RSHOW
BMODEL                   RMODEL

>RPERIOD = 100
>RIPPLE PAGE SHOW

>DCBASE
>PAGE SHOW
>RPERIOD = 100 ; RSHAPE
>RSHOW

>MDBOX = 19 ; MDBASE
>PAGE SHOW

>RMS
>PRINT 'RMS = ' VRMS

---------------------------------------------------------------------
Adverb   Value  Usage
---------------------------------------------------------------------
FIXH     TRUE   If you know the heights of the Gaussians and want
                GAUSS to hold their values constant. You must supply
                values to HEIGHT. The input and output values of
                HEIGHT will be identical.
         FALSE  [Default] If you want GAUSS to fit the values of the
                heights; you need not supply values to HEIGHT in this
                case. GAUSS will return to HEIGHT the best-fit values
                for the heights of the Gaussians.
FIXC     TRUE   If you know the centers of the Gaussians and want
                GAUSS to hold their values constant. You must supply
                values to CENTER. The input and output values of
                CENTER will be identical.
         FALSE  [Default] If you want GAUSS to fit the values of the
                centers; you must supply initial guesses to CENTER.
                GAUSS will return to CENTER the best-fit values for
                the Gaussian centers.
FIXHW    TRUE   If you know the widths of the Gaussians and want
                GAUSS to hold their values constant. You must supply
                values to HWIDTH. The input and output values of
                HWIDTH will be identical.
         FALSE  [Default] If you want GAUSS to fit the values of the
                widths; you must supply initial guesses to HWIDTH.
                GAUSS will return to HWIDTH the best-fit values for
                the Gaussian widths.
FIXRELH  TRUE   If you know the relative heights of the Gaussians but
                not the absolute heights. You must supply values for
                HEIGHT that represent your best guesses to the
                heights. GAUSS will use these values of HEIGHT(i) as
                initial guesses, will fit for a uniform scale factor,
                and will return to HEIGHT your initial guesses
                multiplied by the fitted scale factor.
         FALSE  [Default] If you don't know the relative heights of
                the Gaussians.
FIXRELC  TRUE   If you know the relative separations of the Gaussians
                but not an overall offset for the complete pattern of
                Gaussians. You must supply values for CENTER that
                represent your best guesses to the values of the
                Gaussian centers. GAUSS will use these values of
                CENTER(i) as initial guesses, will fit for an overall
                offset to the pattern of Gaussians, and will return
                to CENTER your input values adjusted by the fitted
                offset.
         FALSE  [Default] If you don't know the relative separations
                of the Gaussians.
FIXRELHW TRUE   If you know the relative widths of the Gaussians but
                not an overall scale factor for the widths to apply
                to each Gaussian. You must supply values for HWIDTH
                that represent your best guesses to the values of the
                widths. GAUSS will use these values of HWIDTH(i) as
                initial guesses, will fit for an overall scale factor
                for the widths, and will return to HWIDTH your input
                values multiplied by the fitted factor.
         FALSE  [Default] If you don't know the relative widths of
                the Gaussians.
---------------------------------------------------------------------

>BDROP = 400 ; EDROP = 0
>PEAK

>PROCEDURE SETGAUSS(GAUSS_NUM)
:SCALAR GAUSS_I
:IF GAUSS_NUM < 1 THEN
:   PRINT 'LESS THAN ONE GAUSSIAN NOT ALLOWED'
:   RETURN
:   END
:IF GAUSS_NUM > 24 THEN
:   PRINT 'MORE THAN 24 GAUSSIANS NOT ALLOWED'
:   RETURN
:   END
:NGAUSS = GAUSS_NUM
:CENTER = 0 ; HWIDTH = 0 ; HEIGHT = 0
:PRINT 'CLICK ON ENDS OF REGION OVER WHICH TO FIT.'
:BGAUSS = CCUR
:EGAUSS = CCUR
:IF BGAUSS > EGAUSS THEN
:   GAUSS_I = EGAUSS
:   EGAUSS = BGAUSS
:   BGAUSS = GAUSS_I
:   END
:FOR GAUSS_I = 1 TO NGAUSS
:   PRINT 'CLICK ON PEAK, POSITIONING VERTICAL CURSOR FIRST.'
:   CENTER(GAUSS_I) = CCUR
:   PRINT 'CLICK ON HALF POWER POINTS.'
:   HWIDTH(GAUSS_I) = ABS(CCUR - CCUR)
:   END
:RETURN
:FINISH

>GAUSS
>PAGE SHOW
>GPARTS
>PAGE SHOW
>GDISPLAY
>PAGE SHOW
>GMODEL RLINE RESHOW
>PAGE SHOW
>RESIDUAL RLINE RESHOW

 ____________________________________________________________________
| Chebyshev Baseline    | Sinusoidal Baseline    | Gaussian Fit     |
|-----------------------|------------------------|------------------|
| BSHAPE                | RSHAPE                 | GAUSS            |
|-----------------------|------------------------|------------------|
| BASELINE              | RIPPLE                 | GAUSS RESIDUAL   |
|-----------------------|------------------------|------------------|
| BMODEL                | RMODEL                 | GMODEL           |
|-----------------------|------------------------|------------------|
| BSHOW                 | RSHOW                  | GPARTS           |
|                       |                        | or GDISPLAY      |
|-----------------------|------------------------|------------------|

The equivalent adverbs for the three different classes of fitting
operations are:
 ____________________________________________________________________
| Chebyshev Baseline    | Sinusoidal Baseline    | Gaussian Fit     |
|-----------------------|------------------------|------------------|
| BDROP, EDROP          | BDROP, EDROP           | BGAUSS, EGAUSS   |
| BBASE, EBASE          | BBASE, EBASE           |                  |
|-----------------------|------------------------|------------------|
| NREGION               | NREGION                | GREGION          |
|-----------------------|------------------------|------------------|
| BPARM                 | RPERIOD, RAMPLTDE,     | HEIGHT, CENTER,  |
|                       | RPHASE                 | HWIDTH           |
|-----------------------|------------------------|------------------|
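[Tying the pasted cookbook material back to the thread's question: a
rough sketch that reuses the Parameter class and fit() helper from the
"Simplifying the syntax" snippet above to fit a Gaussian to a histogram
of measured run times. The fake timing data, the bin count, and the
form of f are illustrative assumptions.]

from pylab import hist, show
from numpy import exp, histogram
from numpy.random import randn

# Fake timings standing in for real measurements.
times = 0.5 + 0.01 * randn(500)

counts, edges = histogram(times, bins=30)
centers = 0.5 * (edges[:-1] + edges[1:])

mu = Parameter(times.mean())
sigma = Parameter(times.std())
height = Parameter(counts.max())

def f(x):
    return height() * exp(-((x - mu()) / sigma())**2 / 2)

# fit() leaves the best-fit values in the Parameter objects.
fit(f, [mu, sigma, height], counts, centers)
print 'mu = %.4f, sigma = %.4f' % (mu(), abs(sigma()))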
From: porky_pig_jr on 5 Apr 2010 16:17

On Apr 5, 2:55 pm, Kaba <n...(a)here.com> wrote:
> porky_pig...(a)my-deja.com wrote:
> > Me thinks running the same algorithm as many times as possible and
> > considering the best time as the estimator of the "true running
> > time" is the best you can do.
>
> I would agree, if the OS were the only variation. However, I also vary
> the input pseudo-randomly (similar but not identical input). I am
> assuming that the variation caused by the OS is negligible.
>
> --http://kaba.hilvi.org

Oh, in this case it *might* be a normal distribution, but you should
still do many runs (at least 30 to a hundred) and plot the times, just
to see whether you're getting a bell curve. If you're getting something
skewed instead, it might be a different distribution. There are also
non-parametric methods (like the bootstrap) to help you support a claim
that your distribution is normal (or not). That's probably something to
bring to a probability/statistics forum once you have some
observations. I can't think of any "cookbook" type of recipe.

PPJ.
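[A minimal sketch of the checks PPJ describes, assuming the timings are
already collected; the fake data, the bin count, and the 2000 bootstrap
resamples are illustrative choices. normaltest is scipy's
D'Agostino-Pearson test; the bootstrap loop is a generic resampling
recipe, not anything prescribed in the thread.]

from pylab import hist, show
from numpy import array, mean
from numpy.random import randn, randint
from scipy import stats

# Fake data standing in for real timings; replace with your own array.
times = 0.5 + 0.01 * randn(200)

hist(times, bins=20)    # eyeball it: does it look like a bell curve?
show()

# D'Agostino-Pearson normality test; a small p-value is evidence
# against normality.
k2, p = stats.normaltest(times)
print 'normality test p-value = %.3f' % p

# Bootstrap 95% confidence interval for the mean: resample with
# replacement and take percentiles of the resampled means.
n = len(times)
boot = array([mean(times[randint(0, n, n)]) for i in xrange(2000)])
print 'mean = %.4f, 95%% CI = (%.4f, %.4f)' % (
    mean(times),
    stats.scoreatpercentile(boot, 2.5),
    stats.scoreatpercentile(boot, 97.5))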
From: Peter Webb on 5 Apr 2010 22:59

You want an analytic solution? You need two things:

1. The distribution of the input variable (the random number in this
   case).
2. The function which tells you how the time depends on the input
   value.

Consider the sieve of Eratosthenes. Say the running time varies as n^2
(maybe it doesn't; let's pretend).

Pick a random number n from 0..9 with all cases equally likely. The
running time is k*n^2; since the distribution of n is known, you can
work out the distribution of k*n^2, and hence the mean and sd.

Now generate n in 0..9 according to a Gaussian distribution G(n). The
running time k*n^2 is now weighted by G(n); again you can calculate the
mean and sd.

So for an analytic solution, you need to know:

1. The PDF of the input.
2. How the algorithm's time depends on the input.

Once you have these, you have at least an equation for the mean and sd,
and you can probably solve it analytically. Without these, it's going
to be trial and error.
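[To make the toy example above concrete, a sketch that evaluates the
mean and sd of T(n) = k*n^2 exactly under both input distributions
mentioned; the value k = 1.0 and the Gaussian centre and width are
assumptions made purely for illustration.]

from numpy import arange, ones, exp, sqrt

k = 1.0          # assumed constant of proportionality
n = arange(10)   # the input, 0..9
T = k * n**2     # running time as a function of the input

def mean_sd(weights):
    """Mean and sd of T under the input distribution given by weights."""
    w = weights / float(weights.sum())   # normalise to probabilities
    m = (w * T).sum()
    return m, sqrt((w * T**2).sum() - m**2)

# Case 1: all inputs equally likely.
print 'uniform : mean = %.2f, sd = %.2f' % mean_sd(ones(10))

# Case 2: Gaussian weighting G(n) over 0..9; centre 4.5 and sigma 2
# are assumed shapes.
print 'gaussian: mean = %.2f, sd = %.2f' % mean_sd(exp(-0.5*((n - 4.5)/2.0)**2))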
From: Man. on 5 Apr 2010 23:53
On Apr 5, 7:59 pm, "Peter Webb" <webbfam...(a)DIESPAMDIEoptusnet.com.au>
wrote:
> You want an analytic solution? You need two things:
>
> 1. The distribution of the input variable (the random number in this
>    case).

1. Results 1 - 10 for "distribution of the input variable" (0.24
seconds):

   A distribution-free approach to inducing rank correlation among
   input variables. This method is simple to use, is distribution
   free, preserves the exact form of the marginal distributions on
   the input variables, and may be used with any ...

   Input Variable Importance Definition based on: A New Input Variable
   Importance Definition.

   Quantization of Continuous Input Variables for Binary
   Classification ... is based on the distribution of the input
   variables.

> 2. The function which tells you how the time depends on the input
>    value.

2. Can you see that the words "is a function of" can be substituted by
the words "depends upon"?

> Consider the sieve of Eratosthenes. Say the running time varies as
> n^2 (maybe it doesn't; let's pretend).
>
> Pick a random number n from 0..9 with all cases equally likely. The
> running time is k*n^2; since the distribution of n is known, you can
> work out the distribution of k*n^2, and hence the mean and sd.
>
> Now generate n in 0..9 according to a Gaussian distribution G(n). The
> running time k*n^2 is now weighted by G(n); again you can calculate
> the mean and sd.
>
> So for an analytic solution, you need to know:
>
> 1. The PDF of the input.
> 2. How the algorithm's time depends on the input.
>
> Once you have these, you have at least an equation for the mean and
> sd, and you can probably solve it analytically. Without these, it's
> going to be trial and error.

All you do is multiply the input value with 2, and add 2 to get
the ... is doing the opposite (reverse) of what the machine tells you
to ... Time taken for a particular journey is a function of
average ...

MMM