stats/probability question [Math]


From: Robert Israel on 16 Jul 2010 17:29

Ray Vickson <RGVickson(a)shaw.ca> writes:

> In this case you are told that exactly 550 students support candidate
> A. Now, if 100 of the 1000 show up, AND IF THE SELECTION OF THE 100 IS
> RANDOM, then the number (in 100) voting for A has the so-called
> *hypergeometric distribution*. In general, in a population of size N
> with N1 of type 1 and N2 of type 2 (N1 + N2 = N), for a random sample
> of size n the number X of type 1 in the sample is hypergeometric:
> Pr{X = k} = C(N1,k)*C(N2,n-k)/C(N,n), where C(a,b) = binomial coefficient
> "a choose b" = a!/[b!*(a-b)!]. For N1 = 550, N2 = 450 and n = 100 we
> have P(k) = Pr{k support A} = C(550,k)*C(450,100-k)/C(1000,100), and
> you want to compute sum[P(k),k=0..49]. The book wants you to
> simulate, but direct computation is easier, especially if you use the
> binomial approximation to the hypergeometric (which should be OK
> because n = 100 is small compared with N = 1000 and the point of
> interest (k = 49) is near the middle of the range 0..100). The
> binomial would be exact for "sampling with replacement", where we
> select 100 students randomly, one-by-one, so the same student can, by
> chance, be selected more than once. Since there are 1000 students and
> we are just selecting 100 there is not much chance of having a
> "duplicate" in the sample,

On the contrary, the probability of having at least one duplicate in the
sample is very high: 1 - (1000!/900!)/1000^100 = 0.9940410734, approximately.
But there are probably not very many duplicates, so the binomial
approximation is not too bad (still, it's not very good, as noted in my
previous posting).
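
The duplicate probability above can be checked with exact integer arithmetic. This is only a minimal sketch: the function name prob_duplicate is made up for illustration, and it assumes Python 3.8+ for math.perm.

```python
from fractions import Fraction
from math import perm


def prob_duplicate(population: int, draws: int) -> Fraction:
    """P(at least one repeat) when drawing `draws` times with replacement."""
    # perm(population, draws) == population! / (population - draws)!
    # counts the ordered samples that contain no repeated member.
    no_repeat = Fraction(perm(population, draws), population ** draws)
    return 1 - no_repeat


p = prob_duplicate(1000, 100)
print(f"{float(p):.10f}")  # compare with the figure quoted in the post
```

The Fraction keeps the result exact until the final conversion to float, so there is no rounding in the factorial ratio itself.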
--
Robert Israel israel(a)math.MyUniversitysInitials.ca
Department of Mathematics http://www.math.ubc.ca/~israel
University of British Columbia Vancouver, BC, Canada

From: Ray Vickson on 17 Jul 2010 00:36


Well, here are results for various "large" N, showing Pr{most popular
candidate loses} = Pr{Votes <= 49} for the hypergeometric and binomial
cases (from Maple 9.5):
   N     hypergeom      binomial
1000  1.220852e-01  1.345762e-01
1500  1.263773e-01  1.345762e-01
2000  1.284742e-01  1.345762e-01
2500  1.297170e-01  1.345762e-01
3000  1.305392e-01  1.345762e-01
3500  1.311235e-01  1.345762e-01
4000  1.315600e-01  1.345762e-01

When N is large enough that both .55*N and .45*N are, say, more than
10 times as large as the sample size n = 100, the binomial and
hypergeometric cases are the same to about two decimal places.
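
For readers without Maple, a table of this kind can be recomputed along the following lines. This is a sketch, not the Maple code used for the post: the function names are illustrative, and it assumes that exactly 55% of each population N supports the favourite, as in the original problem.

```python
from math import comb


def hypergeom_cdf(k_max: int, n_pop: int, n_good: int, n_sample: int) -> float:
    """Pr{X <= k_max} when sampling n_sample WITHOUT replacement."""
    favourable = sum(
        comb(n_good, k) * comb(n_pop - n_good, n_sample - k)
        for k in range(k_max + 1)
    )
    # Exact big-integer counts; only the final division is floating point.
    return favourable / comb(n_pop, n_sample)


def binomial_cdf(k_max: int, n_sample: int, p: float) -> float:
    """Pr{X <= k_max} when sampling n_sample WITH replacement."""
    return sum(
        comb(n_sample, k) * p**k * (1 - p) ** (n_sample - k)
        for k in range(k_max + 1)
    )


binom = binomial_cdf(49, 100, 0.55)
print(f"{'N':>4}  {'hypergeom':>12}  {'binomial':>12}")
for n_pop in range(1000, 4001, 500):
    n_good = 55 * n_pop // 100  # assumed: exactly 55% support the favourite
    hyper = hypergeom_cdf(49, n_pop, n_good, 100)
    print(f"{n_pop:>4}  {hyper:12.6e}  {binom:12.6e}")
```

The binomial column does not depend on N, which is why it is computed once outside the loop.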

R.G. Vickson

From: Michael Robinson on 17 Jul 2010 01:22


The relative error of the binomial approximation is on the order of the
sample size divided by the population size.
E.g., for population 1000 and sample 100:
0.1346 - 0.1221 = 0.0125
(100/1000)(0.1221) = 0.0122
The approximation gets more accurate as the population grows.
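
The same comparison can be run over every row of the table posted earlier in the thread. The sketch below only reuses the seven posted values; the function name and the layout are illustrative.

```python
SAMPLE = 100
BINOMIAL = 0.1345762  # binomial value from the posted table

# Hypergeometric values from the posted table, keyed by population size N.
HYPERGEOM = {
    1000: 0.1220852,
    1500: 0.1263773,
    2000: 0.1284742,
    2500: 0.1297170,
    3000: 0.1305392,
    3500: 0.1311235,
    4000: 0.1315600,
}


def relative_errors():
    """Return (N, relative error of binomial, sample/N) for each table row."""
    rows = []
    for n_pop, hyper in HYPERGEOM.items():
        rel_err = (BINOMIAL - hyper) / hyper
        rows.append((n_pop, rel_err, SAMPLE / n_pop))
    return rows


for n_pop, rel_err, ratio in relative_errors():
    print(f"N={n_pop}: relative error {rel_err:.4f}, sample/N {ratio:.4f}")
```

In each row the two printed numbers are of the same size, which is all the rule of thumb claims.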

From: I.N. Galidakis on 17 Jul 2010 21:49

porky_pig_jr(a)my-deja.com wrote:
[snip]

> Well, scratch the rest out. I was too quick. And wrong. [snip]

> Sorry about that.

But that's your specialty! Making stupid mistakes and then apologising.

Keep it up, "Porky"...

> PPJ.
--
I.

From: Tim Little on 17 Jul 2010 23:30

On 2010-07-16, gearhead <nospam(a)billburg.com> wrote:
> Back to our school of 1000 students, out of whom 450 would vote for
> "underdog." If only 100 students vote, what are his chances of
> winning? Simulation will send you on the wrong track here unless
> you're ready for some head scratching and a big grind on the computer,
> but I'm sure this problem has a pretty simple theoretical solution.

With some additional assumptions, yes. Most importantly, that the
sample of 100 voting students is random. It's a simple problem of
choosing 100 objects without replacement from a population of 450 of
one type and 550 of the other with a threshold count.

Doing that with pencil and paper would be somewhat laborious, but a
short program could deliver a perfectly accurate rational result or
decimal approximation almost instantly.
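
As a rough illustration of such a short program, here is a minimal Python sketch. It assumes the 100 voters are a uniform random sample and that "winning" means strictly more than half the votes (at least 51 of 100); the function name is made up for this example.

```python
from fractions import Fraction
from math import comb


def underdog_win_probability(pop=1000, underdog=450, voters=100) -> Fraction:
    """Exact Pr{underdog gets a strict majority of `voters` random ballots}."""
    need = voters // 2 + 1  # assumed: a tie does not count as a win
    favourable = sum(
        comb(underdog, j) * comb(pop - underdog, voters - j)
        for j in range(need, voters + 1)
    )
    return Fraction(favourable, comb(pop, voters))


p = underdog_win_probability()
print(p)                   # exact rational result
print(f"{float(p):.7f}")   # decimal approximation
```

If a tie should count differently, only the `need` line changes.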

- Tim
