From: gearhead on 16 Jul 2010 13:24

I'm an engineering undergrad in an intro stats course. We had a question in the book that's really dumb.

Problem as stated:

Your candidate has 55% of the votes in the entire school. But only 100 students will show up to vote. What is the probability that the underdog (the one with 45% support) will win? To find out, set up a simulation.
a) Describe how you will simulate a component and its outcomes.
b) Describe how you will simulate a trial.
c) Describe the response variable.

The answer in the back of the book says to use a two-digit random number to determine each vote (00-54 for your candidate, 55-99 for the underdog) and to run a string of trials with 100 votes in each trial.

Now, this is one misconceived exercise. Let me explain why.

Say the school has 1000 students. If all of them show up, the underdog has 0% chance of winning. If exactly one voter shows up, the underdog has a 45% chance of winning. In an election where 100 voters show up, the underdog's chance of winning the election HAS to lie somewhere between 0% and 45%. No ifs, ands or buts.

The probability of a win for the underdog can never exceed 45%. When the exercise asks "how often will the underdog win" I interpret that as meaning what are his chances, i.e., the probability that he will win. But if you run a simulation, you can get anything, including results above 45%. I don't think simulating has any validity here, at least not with the procedure suggested in the answer key. That is a lot of simulating to do by hand, 100 per trial, but it is nowhere close to even starting to answer the actual question. You would first of all have to know the population of the school and then do some very demanding simulations that would only be practical on a computer.

But leaving practical considerations aside, that question is meaningless without knowing something about the magnitude of the school population.
Consider: if the total population is 108, the underdog cannot win, because he only has 49 (48.6 rounded up) supporters total. Chance of winning 0%. Period. "Underdog" has NO CHANCE of winning the election. But if you run a simulation the way the book suggests, he's going to win some.
I'm saying the book is wrong.
Back to our school of 1000 students, out of whom 450 would vote for "underdog." If only 100 students vote, what are his chances of winning? Simulation will send you on the wrong track here unless you're ready for some head scratching and a big grind on the computer, but I'm sure this problem has a pretty simple theoretical solution.
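To make the contrast in the post above concrete, here is a minimal Python sketch of both simulations: the book's procedure (each vote an independent two-digit random number) and a draw of 100 voters without replacement from a school of known size. The function names are hypothetical, and the sketch assumes that a 50-50 tie does not count as a win and that the 108-student school has 49 underdog supporters, as stated in the post.

```python
import random


def simulate_book(trials, voters=100, rng=None):
    """Book's procedure: each vote is an independent two-digit number,
    00-54 for the favourite and 55-99 for the underdog."""
    if rng is None:
        rng = random.Random()
    wins = 0
    for _ in range(trials):
        underdog_votes = sum(rng.randrange(100) >= 55 for _ in range(voters))
        # Assumption: a win needs strictly more than half of the votes.
        wins += underdog_votes > voters / 2
    return wins / trials


def simulate_finite(trials, population, underdog_supporters, voters=100, rng=None):
    """Draw the voters without replacement from a school of known size."""
    if rng is None:
        rng = random.Random()
    # 1 marks an underdog supporter, 0 a supporter of the favourite.
    school = [1] * underdog_supporters + [0] * (population - underdog_supporters)
    wins = 0
    for _ in range(trials):
        underdog_votes = sum(rng.sample(school, voters))
        wins += underdog_votes > voters / 2
    return wins / trials


if __name__ == "__main__":
    rng = random.Random(2010)
    print("book procedure      :", simulate_book(2000, rng=rng))
    print("school of 1000      :", simulate_finite(2000, 1000, 450, rng=rng))
    print("school of 108 (49 B):", simulate_finite(2000, 108, 49, rng=rng))
```

With only 49 underdog supporters in the school, the finite-population simulation can never record a win, whereas the book's procedure is blind to the school size. Both are estimates, so repeated runs will scatter around the underlying probabilities.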
From: porky_pig_jr on 16 Jul 2010 13:53

On Jul 16, 1:24 pm, gearhead <nos...(a)billburg.com> wrote:
>
> But leaving practical considerations aside, that question is
> meaningless without knowing something about the magnitude of the
> school population.

Yes.

> Back to our school of 1000 students, out of whom 450 would vote for
> "underdog." If only 100 students vote, what are his chances of
> winning? Simulation will send you on the wrong track here unless
> you're ready for some head scratching and a big grind on the computer,
> but I'm sure this problem has a pretty simple theoretical solution.

Let those voting for the non-underdog be A-students and those voting for the underdog be B-students. You want to choose 100 out of 1000 so that the number of B-students is at least 51.

Let "win" be choosing a B-student and "lose" be choosing an A-student. The probability of a win in a single trial is p = 0.45, and the probability of a loss in a single trial is q = 1 - p = 0.55. The probability of any one particular sequence of 100 trials containing exactly 51 wins is (0.45^51) * (0.55^49). The number of such sequences is "100 choose 51", or 100 C 51. So the probability of choosing exactly 51 B-students out of 100 is (100 C 51) (0.45^51) (0.55^49). This is the binomial distribution; look it up on the web or in any elementary probability textbook for more details.

Now, choosing at least 51 means choosing 51 or 52 or ... or 100. Those are mutually exclusive events, so to compute the probability of choosing at least 51, add the probability of choosing 51 + the probability of choosing 52 + ... + the probability of choosing 100.

So, as you can see, it does take a while to compute, even with a calculator. Of course, you can write a very simple program, no big deal. In any case, notice that computing (100 C 51), (100 C 52), etc. involves factorials, so you may get very large numbers and integer overflow.

And, of course, without knowing the total population we can't determine the value of p, so, yes, the problem as stated lacks some key information.
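The "very simple program" mentioned above might look like the following Python sketch. The function name is hypothetical. Python integers have arbitrary precision, so `math.comb` sidesteps the overflow concern raised in the post. Note that this is the binomial model, which treats the 100 votes as independent draws.

```python
from math import comb


def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Probability that at least 51 of 100 independent votes go to the underdog.
underdog_wins = binomial_tail(100, 0.45, 51)
print(f"binomial P(underdog wins) = {underdog_wins:.10f}")
```

The result should agree with the binomial figure of about 0.1346 that Robert Israel quotes later in the thread.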
From: porky_pig_jr on 16 Jul 2010 14:36

On Jul 16, 1:53 pm, "porky_pig...(a)my-deja.com" <porky_pig...(a)my-deja.com> wrote:
> On Jul 16, 1:24 pm, gearhead <nos...(a)billburg.com> wrote:
> >
> > But leaving practical considerations aside, that question is
> > meaningless without knowing something about the magnitude of the
> > school population.
>
> Yes.
>

Well, scratch the rest out. I was too quick. And wrong. Notice that my solution didn't even take the total population into account.

I think the correct solution goes like this. We have 550 A-students and 450 B-students. Say we want to select exactly 49 A-students and 51 B-students. This is the hypergeometric distribution: choosing without replacement. We can choose 49 out of the 550 A-students in (550 C 49) different ways, and we can choose 51 out of the 450 B-students in (450 C 51) different ways. And we can choose any 100 out of the 1000 students in (1000 C 100) different ways. So choosing exactly 51 B-students (and 49 A-students) out of a total of 550 A-students and 450 B-students (order does not matter) has probability

    (550 C 49) * (450 C 51)
    -----------------------
         (1000 C 100)

And that was just for exactly 51 B-students. Now you have to compute the same for 52, 53, ..., 100 B-students.

My mistake: the binomial distribution is associated with trials with replacement. But these are clearly trials *without* replacement. Every time we pick a student out of the total population, we don't put that student back. So it is hypergeometric, not binomial. The trials are *not* "independent identically distributed", as they are in the binomial case.

Sorry about that.

PPJ.
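The sum over 51, 52, ..., 100 B-students described above can be sketched in Python with exact rational arithmetic. The function names are hypothetical, and the 550/450 split is the 1000-student school assumed in the thread.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(b, n_b, n_a, sample):
    """P(exactly b B-students) when `sample` students are drawn
    without replacement from n_a A-students and n_b B-students."""
    return Fraction(
        comb(n_b, b) * comb(n_a, sample - b),
        comb(n_a + n_b, sample),
    )


def underdog_win_probability(n_b=450, n_a=550, sample=100):
    """P(B-students form a strict majority of the sample)."""
    need = sample // 2 + 1  # 51 of 100
    return sum(hypergeom_pmf(b, n_b, n_a, sample) for b in range(need, sample + 1))


exact = underdog_win_probability()
print(f"hypergeometric P(underdog wins) = {float(exact):.10f}")
```

Using `Fraction` keeps every term exact, so the only rounding happens in the final conversion to a float.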
From: Ray Vickson on 16 Jul 2010 16:46

On Jul 16, 10:24 am, gearhead <nos...(a)billburg.com> wrote:
> I'm an engineering undergrad in an intro stats course. We had a
> question in the book that's really dumb.
>
> problem as stated:
>
> Your candidate has 55% of the votes in the entire school. But only
> 100 students will show up to vote. What is the probability that the
> underdog (the one with 45% support) will win? To find out, set up a
> simulation.
> a) Describe how you will simulate a component and its outcomes.
> b) Describe how you will simulate a trial.
> c) Describe the response variable.
>
> The answer in the back of the book says using a two digit random
> number to determine each vote (00-54 for your candidate, 55-99 for the
> underdog) you would run a string of trials with 100 votes to each
> trial.
>
> Now, this is one misconceived exercise. Let me explain why.
>
> Say the school has 1000 students.

In this case you are told that exactly 550 students support candidate A. Now, if 100 of the 1000 show up, AND IF THE SELECTION OF THE 100 IS RANDOM, then the number (in 100) voting for A has the so-called *hypergeometric distribution*. In general, in a population of size N with N1 of type 1 and N2 of type 2 (N1 + N2 = N), for a random sample of size n the number X of type 1 in the sample is hypergeometric:

    Pr{X = k} = C(N1,k)*C(N2,n-k)/C(N,n),

where C(a,b) = binomial coefficient "a choose b" = a!/[b!*(a-b)!]. For N1 = 550, N2 = 450 and n = 100 we have

    P(k) = Pr{k support A} = C(550,k)*C(450,100-k)/C(1000,100),

and you want to compute sum[P(k), k=0..49]. The book wants you to simulate, but direct computation is easier, especially if you use the binomial approximation to the hypergeometric (which should be OK because n = 100 is small compared with N = 1000 and the point of interest (k = 49) is near the middle of the range 0..100).
The binomial would be exact for "sampling with replacement", where we select 100 students randomly, one by one, so the same student can, by chance, be selected more than once. Since there are 1000 students and we are just selecting 100, any given student is unlikely to be selected more than once, so there is not much difference between the exact hypergeometric and the approximate binomial.

If you do use the binomial, you can get the solution using a spreadsheet, or even a decent scientific hand-held calculator. For the exact hypergeometric you can use EXCEL's built-in hypergeometric calculator to compute the P(k) for k = 0..49 and then add them up, or you can use an online calculator, such as http://stattrek.com/Tables/Hypergeometric.aspx . Of course, you can also simulate, and maybe the (unnamed) book wants you to do that in order to familiarize yourself with simulation tools.

What if you don't know the school population? If you ASSUME the population is large, say N = 1000 or more, then the hypergeometric h(.55N,.45N,100) is almost the same as the binomial Bi(100,.55), so you can just use the binomial approximation. However, your original complaint is valid. In particular, if N is not much larger than n = 100, the results will depend critically on the precise value of N.

R.G. Vickson
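The dependence on N described in this reply can be tabulated directly. The following Python sketch uses hypothetical function names and assumes school sizes for which 45% is a whole number, plus the 108-student case with 49 underdog supporters from the original post. It compares the exact hypergeometric probability with the binomial value.

```python
from math import comb


def hypergeom_underdog_win(population, underdog_supporters, voters=100):
    """Exact P(underdog gets a strict majority) when the voters are a
    random sample drawn without replacement from the school."""
    need = voters // 2 + 1
    others = population - underdog_supporters
    favourable = sum(
        comb(underdog_supporters, b) * comb(others, voters - b)
        for b in range(need, voters + 1)
    )
    # Integer true division: Python rounds the exact ratio to a float.
    return favourable / comb(population, voters)


def binomial_underdog_win(p=0.45, voters=100):
    """Same event under the binomial (sampling with replacement) model."""
    need = voters // 2 + 1
    return sum(
        comb(voters, b) * p**b * (1 - p) ** (voters - b)
        for b in range(need, voters + 1)
    )


if __name__ == "__main__":
    for n_school, n_underdog in [(108, 49), (200, 90), (1000, 450), (100000, 45000)]:
        prob = hypergeom_underdog_win(n_school, n_underdog)
        print(f"N = {n_school:6d}: P(underdog wins) = {prob:.6f}")
    print(f"binomial    : P(underdog wins) = {binomial_underdog_win():.6f}")
```

As N grows with the 45% share held fixed, the exact value should climb toward the binomial figure; for small N it falls well below it and reaches zero when the underdog has fewer than 51 supporters in total.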
From: Robert Israel on 16 Jul 2010 16:52

"porky_pig_jr(a)my-deja.com" <porky_pig_jr(a)my-deja.com> writes:

> On Jul 16, 1:53 pm, "porky_pig...(a)my-deja.com" <porky_pig...(a)my-deja.com> wrote:
> > On Jul 16, 1:24 pm, gearhead <nos...(a)billburg.com> wrote:
> > >
> > > But leaving practical considerations aside, that question is
> > > meaningless without knowing something about the magnitude of the
> > > school population.
> >
> > Yes.
> >
>
> Well, scratch the rest out. I was too quick. And wrong. Notice that my
> solution even didn't take into account the total population.
>
> I think, the correct solution goes like this. We have 550 A-students,
> 450 B-students. Say, we want to select exactly 49 A-students and 51
> B-students. This is hypergeometric distribution: choosing without
> replacement. Now we can choose 49 out of 550 A-students in (550 C 49)
> different ways, we can choose 51 out of B-students in (450 C 51)
> different ways. And we can choose any 100 out of 1000 students in
> (1000 C 100) different ways. So choosing exactly 51 B-students (and 49
> A-students) out of total of 550 A-students and 450 B-students (and
> order does not matter) has a probability
>
>     (550 C 49) * (450 C 51)
>     -----------------------
>          1000 C 100
>
> And that was just for exactly 51 B-students. Now you have to compute
> the same for 52, 53, ... 100 B-students.
>
> My mistake: Binomial distribution is associated with trials with
> replacement. But these are clearly trials *without* replacement. Every
> time we pick up the students out of the total population, we don't put
> it back. So it is hypergeometric, not binomial. The trials are *not*
> "independent identically distributed", like in binomial.

True: the correct distribution is hypergeometric; the binomial distribution can be used as an approximation to it, but only in the case where the population is much larger than the sample size.
There actually is a formula for the cumulative distribution function: if the population size is N, of which S are A-students and the other N-S are B-students, and the sample size is m, then the probability of obtaining at most t A-students in the sample is (in Maple's notation)

    F(t) = 1 - hypergeom([1, -S+t+1, -m+1+t], [t+2, N-S-m+t+2], 1)
               * m! * S! * (N-m)! * (N-S)!
               / ((t+1)! * (S-t-1)! * (m-t-1)! * (N-S-m+t+1)! * N!)

For example, if N = 1000, S = 550 and m = 100, F(49) is approximately 0.1220852217. If you used the binomial distribution with p = 0.55, F(49) would be approximately 0.1345762132.

Another approximation would be to use the normal distribution with continuity correction and the mean and standard deviation for the hypergeometric distribution. The hypergeometric distribution for the number of A-students in the sample has mean

    mu = m*S/N

and standard deviation

    sigma = sqrt(m*(S/N)*(1-S/N)*(N-m)/(N-1));

in this example mu = 55 and sigma = sqrt(825/37). The normal approximation with continuity correction is

    Phi((49.5 - mu)/sigma) = Phi(-1.164760348) = 0.1220580068.

So in this case it is a much better approximation than the binomial.

--
Robert Israel              israel(a)math.MyUniversitysInitials.ca
Department of Mathematics  http://www.math.ubc.ca/~israel
University of British Columbia
Vancouver, BC, Canada
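The normal approximation with continuity correction described above can be checked with a few lines of Python. This is a sketch with hypothetical function names; it builds the standard normal CDF from `math.erf` and uses the hypergeometric mean and standard deviation given in the post.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_approx_at_most(t, N, S, m):
    """Approximate P(at most t A-students in a sample of m), using the
    hypergeometric mean and standard deviation plus a continuity correction."""
    p = S / N
    mu = m * p
    sigma = sqrt(m * p * (1 - p) * (N - m) / (N - 1))
    return normal_cdf((t + 0.5 - mu) / sigma)


approx = normal_approx_at_most(49, N=1000, S=550, m=100)
print(f"normal approximation to F(49) = {approx:.10f}")
```

The factor (N-m)/(N-1) is what separates this from the normal approximation to the binomial, which is why it tracks the exact hypergeometric value more closely in this example.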