From: Robert Israel on 16 Jul 2010 17:29

Ray Vickson <RGVickson(a)shaw.ca> writes:

> In this case you are told that exactly 550 students support candidate
> A. Now, if 100 of the 1000 show up, AND IF THE SELECTION OF THE 100 IS
> RANDOM, then the number (in 100) voting for A has the so-called
> *hypergeometric distribution*. In general, in a population of size N
> with N1 of type 1 and N2 of type 2 (N1 + N2 = N), for a random sample
> of size n the number X of type 1 in the sample is hypergeometric:
> Pr{X = k} = C(N1,k)*C(N2,n-k)/C(N,n), where C(a,b) = binomial
> coefficient "a choose b" = a!/[b!*(a-b)!]. For N1 = 550, N2 = 450 and
> n = 100 we have
> P(k) = Pr{k support A} = C(550,k)*C(450,100-k)/C(1000,100), and
> you want to compute sum[P(k), k=0..49]. The book wants you to
> simulate, but direct computation is easier, especially if you use the
> binomial approximation to the hypergeometric (which should be OK
> because n = 100 is small compared with N = 1000 and the point of
> interest (k = 49) is near the middle of the range 0..100). The
> binomial would be exact for "sampling with replacement", where we
> select 100 students randomly, one-by-one, so the same student can, by
> chance, be selected more than once. Since there are 1000 students and
> we are just selecting 100 there is not much chance of having a
> "duplicate" in the sample,

On the contrary, the probability of having at least one duplicate in the
sample is very high: 1 - (1000!/900!)/1000^100 = .9940410734 approximately.
But there are probably not very many duplicates, so the binomial
approximation is not too bad (still, it's not very good, as noted in my
previous posting).

--
Robert Israel              israel(a)math.MyUniversitysInitials.ca
Department of Mathematics  http://www.math.ubc.ca/~israel
University of British Columbia  Vancouver, BC, Canada

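The duplicate probability quoted above can be checked numerically. The following is a minimal sketch, not part of the original post; the function names are hypothetical, and the second function (expected number of repeated draws) is an added illustration of the "not very many duplicates" remark rather than a figure from the thread.

```python
from math import prod


def prob_at_least_one_duplicate(population: int, draws: int) -> float:
    """P(some member is drawn more than once) when sampling with replacement."""
    # P(all draws distinct) = (N/N) * ((N-1)/N) * ... * ((N-draws+1)/N),
    # which equals (N! / (N-draws)!) / N^draws.
    p_all_distinct = prod((population - i) / population for i in range(draws))
    return 1.0 - p_all_distinct


def expected_repeat_draws(population: int, draws: int) -> float:
    """Expected number of draws that land on an already-selected member."""
    # A given member is missed by every draw with probability (1 - 1/N)^draws,
    # so the expected number of distinct members drawn is N * (1 - that).
    expected_distinct = population * (1.0 - (1.0 - 1.0 / population) ** draws)
    return draws - expected_distinct


if __name__ == "__main__":
    print(f"P(at least one duplicate) = {prob_at_least_one_duplicate(1000, 100):.10f}")
    print(f"Expected repeated draws   = {expected_repeat_draws(1000, 100):.2f}")
```

Multiplying the ratios one at a time avoids forming 1000! or 1000^100 explicitly, so ordinary floating point is enough here.
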
From: Ray Vickson on 17 Jul 2010 00:36

On Jul 16, 2:29 pm, Robert Israel <isr...(a)math.MyUniversitysInitials.ca> wrote:

> Ray Vickson <RGVick...(a)shaw.ca> writes:
> > [snip]
> > Since there are 1000 students and
> > we are just selecting 100 there is not much chance of having a
> > "duplicate" in the sample,
>
> On the contrary, the probability of having at least one duplicate in the
> sample is very high: 1 - (1000!/900!)/1000^100 = .9940410734 approximately.
> But there are probably not very many duplicates, so the binomial
> approximation is not too bad (still, it's not very good, as noted in my
> previous posting).

Well, here are results for various "large" N, showing
Pr{most popular candidate loses} = Pr{Votes <= 49}
for the hypergeometric and binomial cases (from Maple 9.5):

   N     hypergeom      binomial
1000  1.220852e-01  1.345762e-01
1500  1.263773e-01  1.345762e-01
2000  1.284742e-01  1.345762e-01
2500  1.297170e-01  1.345762e-01
3000  1.305392e-01  1.345762e-01
3500  1.311235e-01  1.345762e-01
4000  1.315600e-01  1.345762e-01

When N is large enough that both .55*N and .45*N are, say, more than 10
times as large as the sample size n = 100, the binomial and
hypergeometric cases are the same to about two decimal places.

R.G. Vickson

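For readers without Maple, the table above can be recomputed along these lines. This is only a sketch under stated assumptions: the function names are hypothetical, and it assumes the supporters of A make up exactly 55% of each population N, as in the original N = 1000 problem.

```python
from math import comb


def hypergeom_cdf(k_max: int, n1: int, n2: int, n: int) -> float:
    """Pr{X <= k_max} for X = number of type-1 items in a sample of size n
    drawn without replacement from n1 type-1 and n2 type-2 items."""
    # Sum the exact integer counts first, then divide once to limit rounding.
    favourable = sum(comb(n1, k) * comb(n2, n - k) for k in range(k_max + 1))
    return favourable / comb(n1 + n2, n)


def binom_cdf(k_max: int, n: int, p: float) -> float:
    """Pr{X <= k_max} for X ~ Binomial(n, p), i.e. sampling with replacement."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


if __name__ == "__main__":
    n = 100  # sample size (number of students who vote)
    print(f"{'N':>5} {'hypergeom':>13} {'binomial':>13}")
    for N in range(1000, 4001, 500):
        n1 = 55 * N // 100  # assumed: 55% of the population supports A
        n2 = N - n1
        print(f"{N:>5} {hypergeom_cdf(49, n1, n2, n):13.6e} {binom_cdf(49, n, 0.55):13.6e}")
```

The binomial column does not depend on N, which is why it is constant in the table.
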
From: Michael Robinson on 17 Jul 2010 01:22

"Ray Vickson" <RGVickson(a)shaw.ca> wrote in message
news:230f2988-b158-4cd4-8219-7be34e488f20(a)k1g2000prl.googlegroups.com...

[snip]

> When N is large enough that both .55*N and .45*N are, say, more than 10
> times as large as the sample size n = 100, the binomial and
> hypergeometric cases are the same to about two decimal places.

The relative error is on the order of sample divided by population.
E.g., for population 1000:

0.1346 - 0.1221 = 0.0125
(100/1000)(0.1221) = 0.0122

It gets more accurate with bigger numbers.

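The rule of thumb above can be checked against every row of Ray Vickson's table. This is a rough sketch: the numbers are copied from that table, while the names (`HYPERGEOM`, `relative_error`, and so on) are hypothetical.

```python
# Values copied from Ray Vickson's table (Maple 9.5), keyed by population size N.
HYPERGEOM = {
    1000: 0.1220852,
    1500: 0.1263773,
    2000: 0.1284742,
    2500: 0.1297170,
    3000: 0.1305392,
    3500: 0.1311235,
    4000: 0.1315600,
}
BINOMIAL = 0.1345762  # same for every N
SAMPLE_SIZE = 100


def relative_error(population: int) -> float:
    """Relative error of the binomial approximation for the given population."""
    exact = HYPERGEOM[population]
    return (BINOMIAL - exact) / exact


if __name__ == "__main__":
    print(f"{'N':>5} {'rel. error':>11} {'n/N':>8}")
    for population in sorted(HYPERGEOM):
        ratio = SAMPLE_SIZE / population
        print(f"{population:>5} {relative_error(population):11.4f} {ratio:8.4f}")
```

For each tabulated N the two columns agree to within roughly ten percent of each other, which is consistent with "on the order of" rather than an exact identity.
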
From: I.N. Galidakis on 17 Jul 2010 21:49

porky_pig_jr(a)my-deja.com wrote:

[snip]
> Well, scratch the rest out. I was too quick. And wrong.
[snip]
> Sorry about that.

But that's your specialty! Making stupid mistakes and then apologising.
Keep it up, "Porky"...

> PPJ.

--
I.

From: Tim Little on 17 Jul 2010 23:30

On 2010-07-16, gearhead <nospam(a)billburg.com> wrote:

> Back to our school of 1000 students, out of whom 450 would vote for
> "underdog." If only 100 students vote, what are his chances of
> winning? Simulation will send you on the wrong track here unless
> you're ready for some head scratching and a big grind on the computer,
> but I'm sure this problem has a pretty simple theoretical solution.

With some additional assumptions, yes. Most importantly, that the
sample of 100 voting students is random.

It's a simple problem of choosing 100 objects without replacement from
a population of 450 of one type and 550 of the other with a threshold
count. Doing that with pencil and paper would be somewhat laborious,
but a short program could deliver a perfectly accurate rational result
or decimal approximation almost instantly.

- Tim

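One possible shape for such a short program is sketched below. It is not Tim Little's code; the function name and defaults are hypothetical, and it assumes the underdog needs a strict majority of the votes cast (a 50-50 tie does not count as a win), matching the Pr{Votes <= 49} threshold used earlier in the thread.

```python
from fractions import Fraction
from math import comb


def prob_underdog_wins(
    favourite_supporters: int = 550,
    underdog_supporters: int = 450,
    voters: int = 100,
) -> Fraction:
    """Exact probability that the underdog gets a strict majority when the
    voters are a uniformly random sample drawn without replacement."""
    population = favourite_supporters + underdog_supporters
    majority = voters // 2 + 1  # assumed threshold: ties are not wins
    # Count the samples in which k >= majority voters support the underdog.
    winning_samples = sum(
        comb(underdog_supporters, k) * comb(favourite_supporters, voters - k)
        for k in range(majority, voters + 1)
    )
    return Fraction(winning_samples, comb(population, voters))


if __name__ == "__main__":
    p = prob_underdog_wins()
    print("exact  :", p)          # a ratio of two large integers
    print("decimal:", float(p))   # decimal approximation
```

Because `math.comb` and `Fraction` work with arbitrary-precision integers, the result is exact and only the final `float(p)` conversion rounds.
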