From: jbriggs444 on 6 Apr 2010 08:09

On Apr 5, 3:31 pm, Kaba <n...(a)here.com> wrote:
> Pubkeybreaker wrote:
> > On Apr 5, 11:00 am, Kaba <n...(a)here.com> wrote:
> > > Hi,
> > >
> > > I am measuring the time spent by an algorithm. Let's assume it is a
> > > Gaussian-distributed random variable. How many repetitions do I have to
> > > make to get a good estimate of the mean and standard deviation of this
> > > distribution?
> >
> > You need to define what you mean by "good".
>
> Well, I am happy when I am able to convince my readers that an algorithm
> A is clearly faster than algorithm B :)
>
> -- http://kaba.hilvi.org

An algorithm that returns 1.0 for the standard deviation and 0.0 for
the mean is clearly optimal then.

Normally when you're trying to optimize two things jointly, you don't
use metrics that only consider one thing. But if that's your choice,
far be it from me to second-guess you.
From: Pubkeybreaker on 6 Apr 2010 08:30

On Apr 5, 3:31 pm, Kaba <n...(a)here.com> wrote:
> Pubkeybreaker wrote:
> > On Apr 5, 11:00 am, Kaba <n...(a)here.com> wrote:
> > > Hi,
> > >
> > > I am measuring the time spent by an algorithm. Let's assume it is a
> > > Gaussian-distributed random variable. How many repetitions do I have to
> > > make to get a good estimate of the mean and standard deviation of this
> > > distribution?
> >
> > You need to define what you mean by "good".
>
> Well, I am happy when I am able to convince my readers that an algorithm
> A is clearly faster than algorithm B :)

At what level of confidence? 2 sigma? 3 sigma?

What you want is a Student's t-test for the difference of means.
From: Kaba on 6 Apr 2010 12:31

Pubkeybreaker wrote:
> > > You need to define what you mean by "good".
> >
> > Well, I am happy when I am able to convince my readers that an algorithm
> > A is clearly faster than algorithm B :)
>
> At what level of confidence? 2 sigma? 3 sigma?
>
> What you want is a Student's t-test for the difference of means.

Exactly! A 95% confidence level seems to be the standard, so that's
what I would use too: 2 sigma.

I now see that I should have started the thread with the problem of
comparing sample means rather than estimating the mean and standard
deviation of each algorithm (and then comparing them).

-- http://kaba.hilvi.org
From: Kaba on 6 Apr 2010 12:43

jbriggs444 wrote:
> > > You need to define what you mean by "good".
> >
> > Well, I am happy when I am able to convince my readers that an algorithm
> > A is clearly faster than algorithm B :)
> >
> > -- http://kaba.hilvi.org
>
> An algorithm that returns 1.0 for the standard deviation and 0.0 for
> the mean is clearly optimal then.
>
> Normally when you're trying to optimize two things jointly, you don't
> use metrics that only consider one thing. But if that's your choice,
> far be it from me to second-guess you.

I can't understand what you are saying, although I have read it many
times :/

I am trying to compare, not optimize. What I am after is some formal
reasoning for why my measured results should have any relevance at all,
even if the measurements seem to imply that one algorithm is twice as
fast as the other.

I have a feeling the t-test mentioned by Pubkeybreaker could fit the
bill here.

-- http://kaba.hilvi.org
From: Now that is one happy monkey. on 6 Apr 2010 18:06
On Apr 6, 9:43 am, Kaba <n...(a)here.com> wrote:
> jbriggs444 wrote:
> > > > You need to define what you mean by "good".
> > >
> > > Well, I am happy when I am able to convince my readers that an algorithm
> > > A is clearly faster than algorithm B :)
> > >
> > > -- http://kaba.hilvi.org
> >
> > An algorithm that returns 1.0 for the standard deviation and 0.0 for
> > the mean is clearly optimal then.
> >
> > Normally when you're trying to optimize two things jointly, you don't
> > use metrics that only consider one thing. But if that's your choice,
> > far be it from me to second-guess you.
>
> I can't understand what you are saying, although I read this many
> times :/
>
> I am trying to compare rather than optimize. What I am after is some
> formal reasoning why my measured results should have any relevance at
> all, even if the measurements seem to imply that one algorithm is twice
> faster than the other.
>
> I have a feeling the t-test mentioned by Pubkeybreaker could fit the
> bill here.
>
> -- http://kaba.hilvi.org

We have two groups and we want to figure out the likelihood of being
correct in saying they differ. Are these two groups identical?

1. Calculate the difference in the means
We use eqn 2.1 to calculate the sample means xbar_1 and xbar_2 and
subtract, getting d = xbar_1 - xbar_2.

2. Calculate the estimated variance of this difference
We use eqn 2.11 to get s_d, the estimated standard error of d.

3. Form t
t = d / s_d.

4. Calculate the likelihood
The number of degrees of freedom, nu = n_1 + n_2 - 2, is two less than
the total number of data points. With nu and t, we feed them into a
program, or use a table, to calculate the area under the curve as
pictured in fig. 2.7.

5. Verdict
If this area is less than our cutoff for significance, say 0.05, we
reject H0 and say the difference is statistically significant.
Otherwise, we can't reject.

A hypothesis test always starts with two opposing hypotheses.

The null hypothesis (H0):
• Usually states that some property of a population (such as the mean) is not different from a specified value or from a benchmark.
• Is assumed to be true until sufficient evidence indicates the contrary.
• Is never proven true; you simply fail to disprove it.

The alternative hypothesis (H1):
• States that the null hypothesis is wrong.
• May specify the direction of the difference.

Significance level
Choose the α-level before conducting the test.
• Increasing α increases the chance of detecting a difference, but it also increases the chance of rejecting H0 when it is actually true (a Type I error).
• Decreasing α decreases the chance of making a Type I error, but it also decreases the chance of correctly detecting a difference.

Assumptions
Each hypothesis test is based on one or more assumptions about the data being analyzed. If these assumptions are not met, the conclusions may not be correct. The assumptions for a one-sample t-test are:
• The sample must be random.
• The sample data must be continuous.
• The sample data should be normally distributed (although this assumption is less critical when the sample size is 30 or more).

The t-test procedure is fairly robust to violations of the normality assumption, provided the observations are collected randomly and the data are continuous, unimodal, and reasonably symmetric.

Confidence interval
The confidence interval provides a likely range of values for μ (or another population parameter). You may conduct a two-tailed hypothesis test (alternative hypothesis of "not equal") using a confidence interval. For example, if the test value is not within the 95% confidence interval, you can reject H0 at the 0.05 α-level. Likewise, if you construct a 99% confidence interval and it does not include the test mean, you may reject H0 at the 0.01 α-level.
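The five steps above can be sketched in a few lines of Python. This is a
minimal illustration of the pooled two-sample Student's t-test, not the
exact procedure from the book being quoted; the name `two_sample_t`, the
sample timings, and the quoted critical values are my own assumptions.

```python
import math

def two_sample_t(a, b):
    """Pooled two-sample Student's t statistic and degrees of freedom.

    Steps 1-3 of the recipe: difference of means, standard error of
    that difference, then t = difference / standard error.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances (divide by n - 1).
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    # Pooled variance, then the standard error of (m1 - m2).
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

# Hypothetical timings (in seconds) for algorithms A and B.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = two_sample_t(a, b)
print(t, df)  # t = -1.0 with df = 8 for these numbers
```

Steps 4-5 then compare |t| against the two-sided critical value of the
t-distribution with df degrees of freedom at the chosen α (about 2.31
for df = 8 at α = 0.05, approaching 1.96 as df grows); here |t| = 1.0,
so H0 is not rejected for these made-up numbers.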