From: Datesfat Chicks on
Hi,

I'll be writing a program to simulate Blackjack play, and I have a question
or two about how long to run a simulation.

It is related to the following question ... assume that I have a fair coin
that I toss a large number of times, and assume that I calculate the number
of heads and tails and that I maintain the statistic (# of heads)/(# of
tosses). How many tosses would I need before I reach probability "p" that
the cumulative results are in the range [0.5-d, 0.5+d]?

Are there any easy to understand results out there that explain this or give
some rules of thumb about how the aggregation of random samples is likely to
converge?

Thanks for all info, Datesfat

----------

P.S.--Before I get flamed on this, I should explain that I understand what
Blackjack is, and I'm not working on some betting system that doesn't
withstand mathematical scrutiny (i.e. I'm not an idiot). However, after
playing blackjack on and off for about a year, I'm left with a few questions
that I want to resolve through simulation.

For example, once in a while I run into a shoe where I win a large number of
consecutive hands early in the shoe.  That raises the question of whether it
is statistically sound to then stop playing until the next shoe is shuffled.
If the cards were random on each hand, the answer would clearly be NO.  But
I'm wondering if a high number of early winning hands is correlated with
the card count and affects the distribution of cards likely to be remaining
in the shoe (i.e. whether the number of wins is correlated with the
statistical distribution of cards remaining).  The problem is just about
impossible to approach analytically, so I'd like to look at what happens in
simulation.

As a second example, I see players who decide whether to hit on a hard 16
based on the number of face cards they see other players holding at that
time.  Again, if the cards were truly random, it wouldn't make a
difference.  But because there are only so many face cards in the shoe,
there may be a kind of "depletion" effect where a large number of face
cards showing is correlated with the card count and the relevant
probabilities ... again, hard to approach analytically.

I can't dismiss certain approaches I've seen without looking at them in more
detail through simulation.

I'm also interested because the house edge is so slight ... if one can
employ enough simultaneous strategies (including card counting) to gain an
edge even when the casino uses countermeasures to make card counting
ineffective ... that is an interesting problem.

From: Ray Vickson on
On Jun 30, 11:45 am, "Datesfat Chicks" <datesfat.chi...(a)gmail.com>
wrote:
> Hi,
>
> I'll be writing a program to simulate Blackjack play, and I have a question
> or two about how long to run a simulation.
>
> It is related to the following question ... assume that I have a fair coin
> that I toss a large number of times, and assume that I calculate the number
> of heads and tails and that I maintain the statistic (# of heads)/(# of
> tosses).  How many tosses would I need before I reach probability "p" that
> the cumulative results are in the range [0.5-d, 0.5+d]?

See the last paragraph for a very simple formula. Details precede
that.

Say you toss n times and let X = number of 'heads'. If I interpret
your use of the word 'cumulative' as meaning 'total', then you want to
have Pr{n/2 - d*n <= X <= n/2 + d*n} >= p; quite possibly you might
not be able to reach Pr = p exactly, but having Pr >= p is always
possible and satisfies what you want. In principle, X has the Binomial
distribution Bi(n,1/2): Pr{X = k} = C(n,k)/2^n for k = 0, 1, ..., n,
where C(n,k) = binomial coefficient. You want the smallest n giving
sum{Pr{X=k}: n/2 - d*n <= k <= n/2 + d*n} >= p. This is doable using
modern computational techniques, but for reasonably small 'd' (hence
large n) we can use the normal approximation; that is, use X ~ N(mean
= n/2, variance = n/4). Thus, find the smallest n giving
Pr_normal{-d*n <= X - n/2 <= d*n} = p, hence Phi[d*n/sqrt(n/4)] -
Phi[-d*n/sqrt(n/4)] = p, where Phi(z) = standard normal cumulative
distribution. (Note the switch from ">= p" to "= p" as we go from the
exact discrete case to the approximate continuous case.) Decent
scientific hand-held calculators have Phi(z) buttons, and any
spreadsheet has this function built in. Using Phi(-z) = 1 - Phi(z) for
z > 0, we end up with 2*Phi[d*n/sqrt(n/4)] - 1 = p, hence Phi[...] =
(1+p)/2. Thus, if z_q denotes the cumulative probability = q point on
the standard normal (that is, Phi(z_q) = q), we need d*n/sqrt(n/4) = z,
hence n = [z/(2*d)]^2 = (1/4)*(z/d)^2, where z = z_{(1+p)/2}.

For example, here are the exact binomial and approximate normal for d
= 0.1 and p = 0.95, computed in Maple 9.5.

> p:=0.95; d:=0.1;

p := 0.95
d := 0.1

> B:=(n,k)->sum(binomial(n,j)/2^n, j=0..k);

B := (n, k) -> sum(binomial(n, j)/2^n, j = 0..k)

> B(n,k);

1 - binomial(n, k + 1) * 2^(-n) * hypergeom([1, k + 1 - n], [k + 2], -1)


Note: Maple evaluates the cumulative distribution of the binomial in
terms of general
functions, so its formula applies as well to non-integer values of n
and k. For that reason we can use "Pr = p" instead of "Pr >= p", then
round up the solution.
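
As a quick sanity check of that claim (the values n = 10, k = 4 below are
just an arbitrary test point, nothing more), the closed form Maple returns
agrees with the direct sum at integer arguments:

> evalf(eval(B(n,k), {n=10, k=4})), evalf(B(10,4));   # closed form vs. direct sum

0.3769531250, 0.3769531250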

> n1:=fsolve(B(n,n/2+n*d)-B(n,n/2-n*d)=p,n=80..120);

n1 := 96.55983909

Thus, n_exact = 97.

Now use the normal approximation:

> Phi:=z-> int(exp(-t^2/2)/sqrt(2*Pi),t=-infinity..z);

Phi := z -> int(exp(-t^2/2)/sqrt(2*Pi), t = -infinity..z)

> Phi(z);

1/2*erf(1/2*2^(1/2)*z) + 1/2

> zq:=fsolve(Phi(z)=(1+p)/2,z=0..2);

zq := 1.959963985

> n2:=(zq/2/d)^2;

n2 := 96.03647055

This gives n_approx = 97.
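
Just to illustrate the scaling (the d values below are arbitrary examples,
not recommendations), n grows like 1/d^2, so tightening the half-width by a
factor of 10 costs a factor of 100 in tosses:

> nd := dd -> ceil((zq/(2*dd))^2):   # same formula as n2 above, reusing zq; dd = half-width
> nd(0.01), nd(0.001);

9604, 960365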

If you have an unfair coin with Pr{heads} = h and Pr{tails} = 1-h,
then to find the smallest n giving Pr{h - d <= X/n <= h + d} >= p in the
normal approximation, just use n = [z*sqrt(h*(1-h))/d]^2 =
h*(1-h)*(z/d)^2, where z = z_{(1+p)/2} = the (1+p)/2 cumulative
probability point on the standard normal distribution.
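
For instance (h = 0.49 and d = 0.001 below are purely illustrative numbers,
not actual blackjack figures), with p = 0.95 and the same zq as above this
lands on the order of 10^6 trials:

> h := 0.49: d2 := 0.001:   # illustrative values only
> ceil(h*(1-h)*(zq/d2)^2);

959981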

R.G. Vickson



From: Datesfat Chicks on
"Ray Vickson" <RGVickson(a)shaw.ca> wrote in message
news:8d950cea-7e13-48b3-a87c-e372bf62902a(a)s9g2000yqd.googlegroups.com...
On Jun 30, 11:45 am, "Datesfat Chicks" <datesfat.chi...(a)gmail.com>
wrote:
> Hi,
>
> I'll be writing a program to simulate Blackjack play, and I have a
> question
> or two about how long to run a simulation.
>
> It is related to the following question ... assume that I have a fair coin
> that I toss a large number of times, and assume that I calculate the
> number
> of heads and tails and that I maintain the statistic (# of heads)/(# of
> tosses). How many tosses would I need before I reach probability "p" that
> the cumulative results are in the range [0.5-d, 0.5+d]?
>
>In principle, X has the Binomial
>distribution Bi(n,1/2): Pr{X = k} = C(n,k)/2^n for k = 0, 1, ..., n,
>where C(n,k) = binomial coefficient. You want the smallest n giving
>sum{Pr{X=k}: n/2 - d*n <= k <= n/2 + d*n} >= p. This is doable using
>modern computational techniques, but for reasonably small 'd' (hence
>large n) we can use the normal approximation; that is, use X ~ N(mean
>= n/2, variance = n/4).
>
><SNIP>

Got it all, thanks.

Your post was exactly what I was looking for.  I had forgotten (too long out
of school) that the binomial distribution governs the probabilities of each
outcome.

I'm not a Maple user, but Wolfram's new site:

http://www.wolframalpha.com/

is pretty cool because it essentially gives one access to Mathematica
without having to buy it.  So, I'm sure I can convince it to calculate
anything for which I don't have the right software.

I do have another question, though ...

When I thought about designing the simulation, my plan was to run it in
large blocks (maybe a million cards) and stop only when the result (the
expected value of a hand under a certain playing strategy, for example)
hasn't changed much from the last time it was calculated (i.e. it seems to
have settled).

So, my question might be something like the following:

a) Assume that after the first million cards simulated, the expected value
of a hand is 0.9940.

b) Assume that after the second million cards simulated, the expected value
of a hand is 0.9945.

c) Assume that after the third million cards simulated, the expected value
of a hand is 0.9945.

d) How much information is contained in the fact that the value of interest
seems to have converged?  In other words, can "apparent convergence" be used
as a substitute for an analysis of statistical significance, and if so, how?

I think this question is related to the original question. Another way to
put it is, "If the value has apparently converged, what is the probability
that the actual value is different than the value it seemed to converge
on?".

Hope this makes sense.

However, your analysis of the "unfair coin toss" is exactly what I was
looking for and gives me great guidance.  That model (h != 0.5) precisely
covers blackjack play.

Thanks, Datesfat

From: Ray Vickson on
On Jul 1, 7:34 am, "Datesfat Chicks" <datesfat.chi...(a)gmail.com>
wrote:
> "Ray Vickson" <RGVick...(a)shaw.ca> wrote in message
>
> news:8d950cea-7e13-48b3-a87c-e372bf62902a(a)s9g2000yqd.googlegroups.com...
> On Jun 30, 11:45 am, "Datesfat Chicks" <datesfat.chi...(a)gmail.com>
> wrote:
>
>
>
> > Hi,
>
> > I'll be writing a program to simulate Blackjack play, and I have a
> > question
> > or two about how long to run a simulation.
>
> > It is related to the following question ... assume that I have a fair coin
> > that I toss a large number of times, and assume that I calculate the
> > number
> > of heads and tails and that I maintain the statistic (# of heads)/(# of
> > tosses). How many tosses would I need before I reach probability "p" that
> > the cumulative results are in the range [0.5-d, 0.5+d]?
>
> >In principle, X has the Binomial
> >distribution Bi(n,1/2): Pr{X = k} = C(n,k)/2^n for k = 0, 1, ..., n,
> >where C(n,k) = binomial coefficient. You want the smallest n giving
> >sum{Pr{X=k}: n/2 - d*n <= k <= n/2 + d*n} >= p. This is doable using
> >modern computational techniques, but for reasonably small 'd' (hence
> >large n) we can use the normal approximation; that is, use X ~ N(mean
> >= n/2, variance = n/4).
>
> ><SNIP>
>
> Got it all, thanks.
>
> Your post was exactly what I was looking for.  I had forgotten (too long out
> of school) that the binomial distribution is involved in terms of the
> probabilities of each outcome.
>
> I'm not a Maple user, but Wolfram's new site:
>
> http://www.wolframalpha.com/
>
> is pretty cool because it essentially gives one access to Mathematica
> without having to buy it.  So, I'm sure I can convince it to calculate
> anything where I don't have the right software to do it.
>
> I do have another question, though ...
>
> When I thought about designing a simulation, I thought I would run the
> simulation in large blocks (maybe a million cards), and only stop when the
> result (the expected value of a hand using a certain playing strategy, for
> example) hasn't changed much from the last time it was calculated (i.e. it
> has seemed to settle).
>
> So, my question might be something like the following:
>
> a)Assume after the first million cards simulated, the expected value of a
> hand is 0.9940.
>
> b)Assume that after the second million cards simulated, the expected value
> of a hand is 0.9945.
>
> c)Assume that after the third million cards simulated, the expected value of
> a hand is 0.9945.
>
> d) How much information is contained in the fact that the value of interest
> seems to have converged?  In other words, can "apparent convergence" be used
> as a substitute for an analysis of statistical significance, and if so, how?

I don't have an answer for you. Perhaps posing this question on the
statistics newsgroup "sci.stat.math" will get you a better response.

R.G. Vickson
