statistics folly [Design]


From: Michael Robinson on 16 Jul 2010 17:54

"Tim Wescott" <tim(a)seemywebsite.com> wrote in message
news:YeOdnX2xNNIGXd3RnZ2dnUVZ_qednZ2d(a)web-ster.com...
> On 07/16/2010 12:13 PM, Michael Robinson wrote:
>>> They're really just wording the question kinda poorly (and they're also
>>> assuming the student population is very, very large -- as you point out,
>>> if there are only 100 kids at the school, you can come up with very
>>> definitive answers). What they really mean is something like:
>>>
>>> -- You're performing sampling where 45% of the time you get answer A
>>> (someone votes for the underdog), and 55% of the time you get answer B
>>> (a vote for the other guy). If you perform 100 random samples, what's
>>> the likelihood that you'll get more than 50 'A' answers?
>>>
>>> This is a standard statistics question, along the lines of, "If you roll
>>> a fair dice 100 times, what's the likelihood you'll get '3' 20 or more
>>> times?"
>>>
>>> Part of engineering is figuring out what your "customer" really wants
>>> when their own description is kinda flaky. :-)
>>>
>>> ---Joel
>>>
>> If the school population is many, many magnitudes larger than the number
>> of voters, the chance that underdog will win just reduces to 45% (the
>> same as the underdog's chance of winning if only one student votes).
>> And in the case where the school population is relatively small, the
>> simulation methodology suggested is so bad it's not even wrong. Sampling
>> will always return about 45%, and we have seen that the chances of the
>> underdog winning can range as low as zero. The exercise is meaningless.
>> I think I should go for a walk.
>
> Uh, no.
>
> The probability distribution of the resulting vote is a binomial
> distribution (http://en.wikipedia.org/wiki/Binomial_distribution), with a
> peak at 55 votes for the winner and 45 votes for the loser. It'll have a
> variance of 100 * 0.45 * 0.55 = 24.75. With that many votes it'll be
> pretty close to a normal distribution, so the probability that a vote will
> go the wrong way is about 16%.
>
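As a rough check on the 16% figure quoted above, here is a minimal Python
sketch that evaluates the upper tail of Binomial(100, 0.45) both exactly and
by normal approximation. It assumes "the vote goes the wrong way" means 51 or
more of the 100 votes go to the underdog; the function and constant names are
illustrative and do not come from the thread or the book.

```python
import math

N_VOTES = 100      # votes cast, as in the problem statement
P_UNDERDOG = 0.45  # chance that any single vote goes to the underdog


def binomial_upper_tail(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


def normal_upper_tail(x, mean, sd):
    """P(Y > x) for a normal variable Y with the given mean and std deviation."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


mean = N_VOTES * P_UNDERDOG                         # 45 votes
variance = N_VOTES * P_UNDERDOG * (1 - P_UNDERDOG)  # 24.75, as quoted
sd = math.sqrt(variance)

exact = binomial_upper_tail(N_VOTES, P_UNDERDOG, 51)
approx_plain = normal_upper_tail(50.0, mean, sd)      # no continuity correction
approx_corrected = normal_upper_tail(50.5, mean, sd)  # with continuity correction

print(f"variance                     = {variance:.2f}")
print(f"exact binomial tail          = {exact:.4f}")
print(f"normal approx, uncorrected   = {approx_plain:.4f}")
print(f"normal approx, corrected     = {approx_corrected:.4f}")
```

Under these assumptions the uncorrected normal approximation lands near the
quoted 16%, while the exact sum and the continuity-corrected value come out
somewhat lower.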
You can get an exact answer using the binomial distribution only in trials
with replacement. This problem describes trials without replacement.

If you use the binomial distribution here you will get an approximation.

If the school population is very large, then the approximation would be a
good one because the trial will be close to one with replacement -- in other
words, you'll seldom count a student twice when you do your sampling of 100
out of a much larger population.

The simple fact is that the book's naively constructed simulation solution
will give an answer that approaches validity only assuming very large school
population (and there's no point in doing a sim then because you already
know the answer).

For any school population where the outcome is worth calculating -- say, a
few hundred students -- the suggested sim is dead wrong. The "underdog's"
chance of winning varies. Always less than 45%, approaching zero as the
school pop approaches 109 or 108. While the sim always returns values
centering around 45%.

Now can you see why I said it's a dumb problem?
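The without-replacement reading argued for above corresponds to a
hypergeometric distribution, which can be evaluated directly. The sketch below
is only an illustration under stated assumptions: exactly 45% of the school
(rounded down to whole students) supports the underdog, 100 voters are drawn
without replacement, and the underdog wins with 51 or more votes. The function
name and the list of school sizes are hypothetical.

```python
import math


def underdog_win_probability(population, sample_size=100, share_percent=45):
    """P(more than half of a without-replacement sample backs the underdog)."""
    if population < sample_size:
        raise ValueError("population must be at least as large as the sample")
    supporters = population * share_percent // 100  # rounded down (assumption)
    others = population - supporters
    majority = sample_size // 2 + 1                 # 51 of 100
    # Count the samples that contain k supporters, for every winning k.
    # math.comb(supporters, k) is 0 when k exceeds the number of supporters.
    favourable = sum(
        math.comb(supporters, k) * math.comb(others, sample_size - k)
        for k in range(majority, sample_size + 1)
    )
    return favourable / math.comb(population, sample_size)


for school_size in (100, 120, 200, 400, 1600, 100_000):
    p_win = underdog_win_probability(school_size)
    print(f"school of {school_size:>7}: P(underdog wins) = {p_win:.4f}")
```

With a school of exactly 100 the result is zero, since all 45 supporters vote
and 51 are needed to win. The printed table shows how the figure changes as
the school grows relative to the 100 voters.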

> So when you get back from your walk, you probably want to brush up on your
> statistics.

>
> Doing this by simulation makes no sense unless the aim of the exercise is
> to teach the student how to do Monte Carlo simulation, or to help them get
> a feel for that 16% probability of a wrong vote.
>
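For readers who want to see what such a Monte Carlo run might look like, here
is a minimal sketch. It is not the book's simulation, which is not reproduced
in this thread. It assumes a 45% underdog share, 100 votes, and a win at 51 or
more votes, and it compares independent draws (with replacement) against
drawing voters without replacement from a school of a chosen size. All names,
the seed, and the trial counts are illustrative.

```python
import random

N_VOTES = 100
SHARE_PERCENT = 45


def estimate_with_replacement(trials, rng):
    """Each vote is an independent draw with a 45% chance for the underdog."""
    wins = 0
    for _ in range(trials):
        votes = sum(rng.random() < SHARE_PERCENT / 100 for _ in range(N_VOTES))
        wins += votes > N_VOTES // 2
    return wins / trials


def estimate_without_replacement(trials, population, rng):
    """Voters are drawn without replacement from a school of fixed size."""
    supporters = population * SHARE_PERCENT // 100  # rounded down (assumption)
    students = [1] * supporters + [0] * (population - supporters)
    wins = 0
    for _ in range(trials):
        votes = sum(rng.sample(students, N_VOTES))
        wins += votes > N_VOTES // 2
    return wins / trials


rng = random.Random(2010)  # fixed seed so repeated runs give the same numbers
print("with replacement:       ", estimate_with_replacement(20_000, rng))
print("school of 200, no repl.:", estimate_without_replacement(20_000, 200, rng))
print("school of 100, no repl.:", estimate_without_replacement(2_000, 100, rng))
```

The estimates are noisy; more trials tighten them at the cost of run time.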
> --
>
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com
>
> Do you need to implement control loops in software?
> "Applied Control Theory for Embedded Systems" was written for you.
> See details at http://www.wescottdesign.com/actfes/actfes.html

From: Tim Wescott on 16 Jul 2010 18:00

On 07/16/2010 02:54 PM, Michael Robinson wrote:
> "Tim Wescott"<tim(a)seemywebsite.com> wrote in message
> news:YeOdnX2xNNIGXd3RnZ2dnUVZ_qednZ2d(a)web-ster.com...
>> On 07/16/2010 12:13 PM, Michael Robinson wrote:
>>>> They're really just wording the question kinda poorly (and they're also
>>>> assuming the student population is very, very large -- as you point out,
>>>> if there are only 100 kids at the school, you can come up with very
>>>> definitive answers). What they really mean is something like:
>>>>
>>>> -- You're performing sampling where 45% of the time you get answer A
>>>> (someone votes for the underdog), and 55% of the time you get answer B
>>>> (a vote for the other guy). If you perform 100 random samples, what's
>>>> the likelihood that you'll get more than 50 'A' answers?
>>>>
>>>> This is a standard statistics question, along the lines of, "If you roll
>>>> a fair dice 100 times, what's the likelihood you'll get '3' 20 or more
>>>> times?"
>>>>
>>>> Part of engineering is figuring out what your "customer" really wants
>>>> when their own description is kinda flaky. :-)
>>>>
>>>> ---Joel
>>>>
>>> If the school population is many, many magnitudes larger than the number
>>> of voters, the chance that underdog will win just reduces to 45% (the
>>> same as the underdog's chance of winning if only one student votes).
>>> And in the case where the school population is relatively small, the
>>> simulation methodology suggested is so bad it's not even wrong. Sampling
>>> will always return about 45%, and we have seen that the chances of the
>>> underdog winning can range as low as zero. The exercise is meaningless.
>>> I think I should go for a walk.
>>
>> Uh, no.
>>
>> The probability distribution of the resulting vote is a binomial
>> distribution (http://en.wikipedia.org/wiki/Binomial_distribution), with a
>> peak at 55 votes for the winner and 45 votes for the loser. It'll have a
>> variance of 100 * 0.45 * 0.55 = 24.75. With that many votes it'll be
>> pretty close to a normal distribution, so the probability that a vote will
>> go the wrong way is about 16%.
>>
> You can get an exact answer using the binomial distribution only in trials
> with replacement. This problem describes trials without replacement.
>
> If you use the binomial distribution here you will get an approximation.
>
> If the school population is very large, then the approximation would be a
> good one because the trial will be close to one with replacement -- in other
> words, you'll seldom count a student twice when you do your sampling of 100
> out of a much larger population.
>
> The simple fact is that the book's naively constructed simulation solution
> will give an answer that approaches validity only assuming very large school
> population (and there's no point in doing a sim then because you already
> know the answer).
>
> For any school population where the outcome is worth calculating -- say, a
> few hundred students -- the suggested sim is dead wrong.

Yes, that's been discussed.

Are you saying that the 1600 student school that I attended makes
calculations about it somehow not worthwhile?

(And no, I didn't do the math, so I don't have a good grasp of how
closely the binomial distribution would approximate in this case -- but
it's probably close)
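One quick way to gauge "probably close" for a 1600-student school is the
finite population correction: sampling n of N without replacement scales the
binomial variance by (N - n) / (N - 1). The sketch below applies that factor
and compares continuity-corrected normal tails. It assumes 100 voters, a 45%
underdog share, and a win at 51 or more votes; the names are illustrative.

```python
import math

N_VOTES = 100
P_UNDERDOG = 0.45
SCHOOL_SIZE = 1600  # the school size mentioned above


def normal_upper_tail(x, mean, sd):
    """P(Y > x) for a normal variable Y with the given mean and std deviation."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


mean = N_VOTES * P_UNDERDOG
sd_binomial = math.sqrt(N_VOTES * P_UNDERDOG * (1 - P_UNDERDOG))

# Finite population correction applied to the standard deviation.
fpc = math.sqrt((SCHOOL_SIZE - N_VOTES) / (SCHOOL_SIZE - 1))
sd_finite = sd_binomial * fpc

tail_binomial = normal_upper_tail(50.5, mean, sd_binomial)
tail_finite = normal_upper_tail(50.5, mean, sd_finite)

print(f"correction factor on std deviation = {fpc:.4f}")
print(f"tail, with replacement (approx)    = {tail_binomial:.4f}")
print(f"tail, school of 1600 (approx)      = {tail_finite:.4f}")
```

The correction factor is about 0.97 here, so under these assumptions the two
tails differ by less than one percentage point.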

> The "underdog's"
> chance of winning varies. Always less than 45%, approaching zero as the
> school pop approaches 109 or 108. While the sim always returns values
> centering around 45%.
>
> Now can you see why I said it's a dumb problem?

Yes, but if you're upset at a problem that's not stated clearly, why
didn't you clearly state your objections?

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html

From: Joerg on 16 Jul 2010 18:08

Tim Wescott wrote:
> On 07/16/2010 01:39 PM, Joerg wrote:
>> Joel Koltner wrote:
>>> "Joerg"<invalid(a)invalid.invalid> wrote in message
>>> news:8abmkrFbojU1(a)mid.individual.net...
>>>> One question I always pondered is, why are they teaching this in
>>>> engineering school anyhow?
>>>
>>> Some EE ends up using it? :-)
>>>
>>> Stats show up an awful lot in...
>>>
>>> -- Communication texts, worrying about the effect of noise on signal
>>> intelligibility --> Those trying to cook up new modulation formats
>>> should worry about this
>>> -- Error-correcting codes --> Those worrying about choosing
>>> error-correction schemes should worry about it
>>> -- Phil Hobbs' book :-)
>>> -- Tim Wescott's book :-)
>>>
>>
>> Also Monte Carlo in SPICE, named after _the_ casino city. Actually,
>> formally it's a whole country unto itself.
>>
>>
>>> I think the real answer is that curriculums often have historical roots
>>> that are hard to change even when the material becomes of marginal use for
>>> most students. Many a practicing BSEE can do just fine recalling no
>>> more statistics than, e.g., how to calculate a mean...
>>>
>>
>> Ok, yes, I agree that we all need it. My point really was, isn't this
>> sort of stuff the job of a high school to teach? There has got to be a
>> reason why we all must go to high school before heading towards
>> engineering :-)
>
> College stats is well beyond high school stats. College stats (at least
> the one that I took) is a 4th year class from the mathematics department
> that leaves many of the math majors in the dust.
>

Ok, then I may have a stats deficiency in my brain cell portfolio :-)

--
Regards, Joerg

http://www.analogconsultants.com/

"gmail" domain blocked because of excessive spam.
Use another domain or send PM.

From: amdx on 16 Jul 2010 18:12

>
> Doing this by simulation makes no sense unless the aim of the exercise is
> to teach the student how to do Monte Carlo simulation, or to help them get
> a feel for that 16% probability of a wrong vote.
>
Oh, so Obama's election was just a statistical anomaly.
I feel so much better now.
MikeK

From: Bill Sloman on 16 Jul 2010 18:42

On Jul 17, 8:12 am, "amdx" <a...(a)knology.net> wrote:
> > Doing this by simulation makes no sense unless the aim of the exercise is
> > to teach the student how to do Monte Carlo simulation, or to help them get
> > a feel for that 16% probability of a wrong vote.
>
> Oh, so Obama's election was just a statistical anomaly.
> I feel so much better now.

Actually, the US electors who got out to vote for Obama represent a
very large "school", and the likely sampling error on the result was
about 0.12% of his winning margin - 9,522,083 - out of 131,257,328
votes cast. The square root of 131,257,328 is about 11,457.

http://en.wikipedia.org/wiki/United_States_presidential_election,_2008
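The arithmetic above can be reproduced in a few lines of Python. This sketch
only checks the two quoted figures against each other; it takes the square
root of the total vote as the error estimate because that is what the post
uses, and makes no claim that this is the right error model. The constant
names are illustrative.

```python
import math

TOTAL_VOTES = 131_257_328   # votes cast, as quoted above
WINNING_MARGIN = 9_522_083  # winning margin, as quoted above

root = math.sqrt(TOTAL_VOTES)
ratio = root / WINNING_MARGIN

print(f"sqrt(total votes)        = {root:,.0f}")
print(f"as a share of the margin = {ratio:.2%}")
```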

He won quite decisively - the biggest margin of any non-incumbent
candidate so far.

--
Bill Sloman, Nijmegen
