From: Rob Johnson on 19 Jul 2010 06:01 In article <Y5b0o.12935$OU6.12052(a)newsfe20.iad>, "Michael Robinson" <kellrobinson(a)yahoo.com> wrote: >"Ray Vickson" <RGVickson(a)shaw.ca> wrote in message >news:230f2988-b158-4cd4-8219-7be34e488f20(a)k1g2000prl.googlegroups.com... >>On Jul 16, 2:29 pm, Robert Israel >><isr...(a)math.MyUniversitysInitials.ca> wrote: >>> Ray Vickson <RGVick...(a)shaw.ca> writes: >>> > In this case you are told that exactly 550 students support candidate >>> > A. Now, if 100 of the 1000 show up, AND IF THE SELECTION OF THE 100 IS >>> > RANDOM, then the number (in 100) voting for A has the so-called >>> > *hypergeometric distribution*. In general, in a population of size N >>> > with N1 of type 1 and N2 of type 2 (N1 +N2 = N), for a random sample >>> > of size n the number X of type 1 in the sample is hypergeometric: Pr{X >>> > = k} = C(N1,k)*C(N2,n-k)/C(N,n), where C(a,b) = binomial >>> > coefficient >>> > "a choose b" = a!/[b!*(a-b)!]. For N1 = 550, N2 = 450 and n = >>> > 100 >>> > w>>> > e >>> > have P(k) = Pr{k suppport A} = C(550,k)*C(450,100-k)/C(1000,100), >>> > and >>> > you want to compute sum[P(k),k=0.. 49]. The book wants you to >>> > simulate, but direct computation is easier, especially if you use the >>> > binomial approximation to the hypergeometric (which should be OK >>> > because n = 100 is small compared with N = 1000 and the point of >>> > interest (k = 49) is near the middle of the range 0..100). The >>> > binomial would be exact for "sampling with replacement", where we >>> > select 100 students randomly, one-by-one, so the same student can, by >>> > chance, be selected more than once. Since there are 1000 students and >>> > we are just selecting 100 there is not much chance of having a >>> > "duplicate" in the sample, >>> >>> On the contrary, the probability of having at least one duplicate in the >>> sample is very high: 1 - (1000!/900!)/1000^100 = .9940410734 >>> approximately. >>> But there are probably not very many duplicates, so the binomial >>> approximation >>> is not >>> too bad (still, it's not very good, as noted in my previous posting). >>> -- >>> Robert Israel isr...(a)math.MyUniversitysInitials.ca >>> Department of Mathematics http://www.math.ubc.ca/~israel >>> University of British Columbia Vancouver, BC, Canada >> >>Well, here are results for various "large" N, showing Pr{most popular >>candidate loses} = Pr{Votes <= 49} for the hypergeometric and binomial >>cases (from Maple 9.5): >>N hypergeom binomial >>1000 1.220852e-01 1.345762e-01 >>1500 1.263773e-01 1.345762e-01 >>2000 1.284742e-01 1.345762e-01 >>2500 1.297170e-01 1.345762e-01 >>3000 1.305392e-01 1.345762e-01 >>3500 1.311235e-01 1.345762e-01 >>4000 1.315600e-01 1.345762e-01 >> >>When N is large enough that both .55*N and .45*N are, say, more than >>10 times as large as the sample size n = 100, the binomial and >>hypergeometric cases are the same to about two decimal places. >> >>R.G. Vickson > >The relative error is on the order of sample divided by population. >E.g., for population 1000: >0.1346-0.1221=0.125 >(10/1000)(0.1221) = .0122 >It gets more accurate with bigger numbers. Indeed. To make this precise, we can write the summand as C(9n/20,k) C(11n/20,100-k) -------------------------- C(n,100) = (9n/20)(9n/20 - 1)(9n/20 - 2)...(9n/20 - (k-1)) * (11n/20)(11n/20 - 1)(11n/20 - 2)...(11n/20 - (99-k)) / (n)(n-1)(n-2)...(n-99) * C(100,k) = (9/20)(9/20 - 1/n)(9/20 - 2/n)...(9/20 - (k-1)/n) * (11/20)(11/20 - 1/n)(11n/20 - 2/n)...(11n/20 - (99-k)/n) / (1)(1-1/n)(1-2/n)...(1-99/n) * C(100,k) [1] Examining [1], we can see why, in the limit as n tends to infinity, the hypergeometric term approaches the binomial term (9/20)^k (11/20)^{100-k} C(100,k) [2] We can use Newton's Identities <http://en.wikipedia.org/wiki/Newton%27s_identities> to get an asymptotic expansion for [1]. That is, writing the three extended product terms asymptotically in 1/n, we get (9/20)(9/20 - 1/n)(9/20 - 2/n)...(9/20 - (k-1)/n) = (9/20)^k ( 1 - (20/9) (k^2 - k)/2 1/n + (20/9)^2 (3k^4 - 10k^3 + 9k^2 - 2k)/24 1/n^2 - (20/9)^3 (k^6 - 7k^5 + 17k^4 - 17k^3 + 6k^2)/48 1/n^3 + O(1/n^4) ) [3] (11/20)(11/20 - 1/n)(11n/20 - 2/n)...(11n/20 - (99-k)/n) = (11/20)^{100-k} ( 1 - (20/11) (k^2 - 199k + 9900)/2 1/n + (20/11)^2 (3k^4 - 1190k^3 + 177009k^2 - 11701798k + 290089800)/24 1/n^2 - (20/11)^3 (k^6 - 593k^5 + 146517k^4 - 19306783k^3 + 1431014906k^2 - 56567491200k + 931683060000)/48 1/n^3 + O(1/n^4) ) [4] (1)(1-1/n)(1-2/n)...(1-99/n) = (1 - 4950 1/n + 12087075 1/n^2 - 19410063750 1/n^3 + O(1/n^4) ) [5] Multiplying C(100,k) by [3] x [4] / [5], and summing from k = 51 to k = 100, we get .13457621318805238678 - 11.928556575663902106 1/n - 539.65341919938817150 1/n^2 - 22350.705078834868693 1/n^3 + O(1/n^4) [6] Using [6] to estimate the same values of n as in Ray Vickson's table, we get n asymptotic actual 1000 .1220856524881102616 .12208522168818469976 1500 .1263773737423866988 .12637728404583215165 2000 .1284742277072857343 .12847419866997887870 2500 .1297170155655898784 .12971700351911424587 3000 .1305392383679171231 .13053923251100078796 3500 .1311234795268475006 .13112347634755554495 4000 .1315599964756695927 .13155999460430289938 Rob Johnson <rob(a)trash.whim.org> take out the trash before replying to view any ASCII art, display article in a monospaced font
First
|
Prev
|
Pages: 1 2 3 Prev: (1) + (1+1/4) + (1+1/4+1/9) + ...= gamma(-1) ? Next: Transform -- the game |