From: Prophet Steering on 28 Mar 2010 01:23

In Section 3, this problem is generalized to the one where the gambler does not insist on the very largest, but rather feels satisfied provided that his fortune upon stopping is within $m$ units of the largest. More specifically, when the allowance is $m$ ($m$ a given positive integer), the gambler seeks an optimal stopping time $\sigma_m^*$ such that

$\Pr\{X_{\sigma_m^*} \geq \max_{0\leq n\leq T} X_n - (m-1) \mid X_0 = i\} = \max_{\sigma\in C} \Pr\{X_\sigma \geq \max_{0\leq n\leq T} X_n - (m-1) \mid X_0 = i\}$ . (1.3)

Obviously, when $m=1$ this problem reduces to the one considered in Section 2. It can be shown that there exists an optimal stopping time $\sigma_m^*$ of the following form:

$\sigma_m^* = \min\{n : X_n = \max_{0\leq j\leq n} X_j - (m-1),\ X_n < k_m^*\}$ ,

where $k_m^*$ is a positive integer defined as

$k_m^* = \min\{k : \frac{1-\theta^k}{1-\theta^{N-m+1}} \geq \frac{\theta^k(1-\theta^m)}{1-\theta^{k+m}}\}$ .

2 Stopping on the largest

Suppose that we enter a casino with an initial fortune of $x$ units and bet one unit each time. We either win one unit with probability $p$ or lose one unit with probability $q=1-p$, where $0<p<1$. The gambling process $\{X_n; n\geq 0\}$, a Markov chain, continues until it reaches one of its absorbing states $0$ or $N$, but we may stop the process before absorption if we like. If we decide to stop at time $n$, then we are said to succeed if $X_n = \max_{0\leq j\leq T} X_j$. The objective of the problem is to find a stopping policy that maximizes the probability of success.

Suppose that we have observed the values of $X_0, X_1, \ldots, X_n$. A serious decision of whether to stop or continue takes place at time $n$ only when $X_n$ is the maximum value observed so far, that is, $X_n = \max(X_0, X_1, \ldots, X_n)$. This decision epoch is referred to simply as state $x$ if $X_n = x$ because, as a little consideration shows, the decision depends neither on the values of $X_0, X_1, \ldots, X_{n-1}$ nor on the number of plays already made. Let $v(x)$ be the probability of success starting from state $x$, and let $s(x)$ and $c(x)$ be respectively the probability of success when we stop in state $x$ and when we continue gambling in an optimal manner in state $x$; then, from the principle of optimality,

$v(x) = \max\{s(x), c(x)\}$ , $0<x<N$ , (2.1)

where

$s(x) = 1 - P_x(1, x)$ , (2.2)

$c(x) = p v(x+1) + q(1-s(x-1)) v(x)$ , $0<x<N$ , (2.3)

with the boundary condition $v(0)=v(N)=1$. Equation (2.2) is immediate from (1.2). Equation (2.3) follows because the next possible state visited (after leaving state $x$) is either $x+1$ or $x$, depending on whether we win one unit at the next play, or lose one unit at the next play but attain state $x$ again before going broke.
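As a quick illustration (not part of the original paper), the optimality equation (2.1)-(2.3) can be solved numerically by successive approximation. The sketch below assumes the standard gambler's-ruin reading of (1.2), namely $P_x(a, b) = (1-\theta^b)/(1-\theta^{a+b})$ with $\theta = q/p$ (the probability of gaining $a$ units before losing $b$ units), and uses hypothetical parameters $p = 0.45$, $N = 20$.

    # Illustrative sketch only: solve (2.1)-(2.3) by successive approximation.
    # Assumption: (1.2) is the gambler's-ruin formula with theta = q/p.
    p, N = 0.45, 20                  # hypothetical example parameters
    q = 1.0 - p
    theta = q / p

    def P(a, b):
        """Probability of gaining a units before losing b units (assumed reading of (1.2))."""
        return (1 - theta**b) / (1 - theta**(a + b))

    def s(x):                        # eq. (2.2)
        return 1 - P(1, x)

    v = [0.0] * (N + 1)
    v[0] = v[N] = 1.0                # boundary condition
    while True:
        delta = 0.0
        for x in range(1, N):
            c = p * v[x + 1] + q * (1 - s(x - 1)) * v[x]   # eq. (2.3)
            new = max(s(x), c)                             # eq. (2.1)
            delta = max(delta, abs(new - v[x]))
            v[x] = new
        if delta < 1e-12:
            break

    # States where stopping attains the maximum in (2.1); by the threshold
    # result proved below this should be an initial segment {1, ..., x*-1}.
    print([x for x in range(1, N) if s(x) >= v[x] - 1e-9])

The printed stopping region can be compared with the threshold $x^*$ of Theorem 2.1 below.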
To solve the optimality equation (2.1)-(2.3), we invoke a result of positive dynamic programming (see, for detail, Chapter IV of Ross [2]). The main result (Theorem 1.2, p. 75 of Ross) is that if the problem fits the framework of positive dynamic programming, then a given stationary policy is optimal if its value function satisfies the optimality equation. This gives us a method of checking whether or not a guessed policy can be optimal, and it is particularly useful in cases in which there exists an obvious candidate for an optimal policy. Our problem in fact fits the framework of positive dynamic programming with the state being the gambler's fortune, since if we suppose that a reward of 1 is earned if we attain the overall maximum and all other rewards are zero, then the expected total reward equals the probability of success.

Theorem 2.1 Let $f$ be a stationary policy which, when the decision process is in state $x$, chooses to stop if and only if $x < x^*$, where

$x^* = \min\{x : \frac{1-\theta^x}{1-\theta^N} \geq \frac{\theta^x-\theta^{x+1}}{1-\theta^{x+1}}\}$ . (2.4)

Then $f$ is optimal.

Proof. Let $v_f(x)$ denote the probability of success starting from state $x$ when policy $f$ is employed. Once the decision process leaves state $x$ ($\geq x^*$), it never stops until absorption takes place under $f$. Thus we soon obtain

$v_f(x) = \begin{cases} s(x) , & \text{if } x < x^* \\ P_x(N-x, x) , & \text{if } x \geq x^* , \end{cases}$ (2.5)

and it is easy to check that $s(x)$ is decreasing in $x$ while $P_x(N-x, x)$ is increasing in $x$, and that $x^*$ can be described as $x^* = \min\{x : s(x) \leq P_x(N-x, x)\}$. Therefore, to prove that $f$ is optimal, it suffices to show that, from (2.1)-(2.3), $v_f(x)$ given in (2.5) satisfies

$v_f(x) = \max\{s(x),\ p v_f(x+1) + q(1-s(x-1)) v_f(x)\}$ , $0<x<N$ . (2.6)

We show this by distinguishing three cases.

Case (i): $x < x^*-1$. Since $v_f(x) = s(x)$, to show the validity of (2.6) it suffices to show that $v_f(x) \geq p v_f(x+1) + q(1-s(x-1)) v_f(x)$, or $s(x) \geq p s(x+1) + q(1-s(x-1)) s(x)$, or equivalently $\{s(x) - s(x+1)\} + \theta s(x-1) s(x) \geq 0$, which is immediate since $s(x)$ is non-negative and decreasing in $x$.

Case (ii): $x \geq x^*$. Since $v_f(x) = P_x(N-x, x) \geq s(x)$, it suffices to show that $v_f(x) \geq p v_f(x+1) + q(1-s(x-1)) v_f(x)$, or equivalently $P_x(N-x, x) \geq p P_{x+1}(N-x-1, x+1) + q P_{x-1}(1, x-1) P_x(N-x, x)$, which in fact holds with equality, because a straightforward calculation from (1.2) leads to the following identity:

$P_x(N-x, x) = p P_{x+1}(N-x-1, x+1) + q P_{x-1}(1, x-1) P_x(N-x, x)$ . (2.7)

It is easy to interpret this identity probabilistically by conditioning on the result of the first play after leaving state $x$.

Case (iii): $x = x^*-1$. Since $v_f(x^*-1) = s(x^*-1)$, it suffices to show that $v_f(x^*-1) \geq p v_f(x^*) + q(1-s(x^*-2)) v_f(x^*-1)$, or $s(x^*-1) \geq p P_{x^*}(N-x^*, x^*) + q P_{x^*-2}(1, x^*-2) s(x^*-1)$, or

$s(x^*-1) \geq \frac{p P_{x^*}(N-x^*, x^*)}{1 - q P_{x^*-2}(1, x^*-2)}$ ,

or equivalently, from (2.7), $s(x^*-1) \geq P_{x^*-1}(N-x^*+1, x^*-1)$, which is obvious from the definition of $x^*$. Thus the proof is complete.
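As a numerical sanity check (again not from the paper), the sketch below computes the threshold $x^*$ of (2.4) and estimates the success probability of the threshold policy $f$ by simulation, comparing it with $v_f(x)$ from (2.5). The same assumed reading of (1.2) and the same hypothetical parameters as in the previous sketch are used.

    import random

    # Hedged sketch: check Theorem 2.1 numerically for one hypothetical (p, N).
    p, N = 0.45, 20
    q = 1.0 - p
    theta = q / p

    def P(a, b):                        # assumed reading of (1.2)
        return (1 - theta**b) / (1 - theta**(a + b))

    def s(x):                           # eq. (2.2)
        return 1 - P(1, x)

    # Threshold (2.4): first x with P_x(N-x, x) >= s(x).
    x_star = min(x for x in range(1, N) if P(N - x, x) >= s(x))

    def simulate(x0, trials=100_000):
        wins = 0
        for _ in range(trials):
            x, best, stop_val = x0, x0, None
            while 0 < x < N:
                if stop_val is None and x == best and x < x_star:
                    stop_val = x        # stop under the threshold policy f
                x += 1 if random.random() < p else -1
                best = max(best, x)
            if stop_val is None:        # absorbed before ever stopping
                stop_val = x
            wins += stop_val == best    # success: stopped value equals overall max
        return wins / trials

    x0 = 5
    v_f = s(x0) if x0 < x_star else P(N - x0, x0)    # eq. (2.5)
    print(x_star, round(v_f, 4), round(simulate(x0), 4))

For a starting state $x_0 < x^*$ the policy stops immediately, so the estimate should be close to $s(x_0)$; for $x_0 \geq x^*$ it should be close to $P_{x_0}(N-x_0, x_0)$.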
3 Stopping on one of $m$ largest

In this section we are concerned with stopping on one of the $m$ largest values of the gambling process. That is, we wish to find an optimal stopping time $\sigma_m^*$ which satisfies (1.3). Suppose that we have observed the values of $X_0, X_1, \ldots, X_n$. A serious decision of whether to stop or continue takes place at time $n$ if $X_n \geq \max_{0\leq j\leq n} X_j - (m-1)$. We denote this state by $(x, i)$ if $X_n = x$ and $X_n = \max_{0\leq j\leq n} X_j - (i-1)$, $0 < x < N-m$, $i = 1, 2, \ldots, m$. Our trial is now regarded as successful if we stop at time $n$ and $X_n \geq \max_{0\leq j\leq T} X_j - (m-1)$. Let $v_i(x)$, $1 \leq i \leq m$, be the probability of success starting from state $(x, i)$, and let $s_i(x)$ and $c_i(x)$ be respectively the probability of success when we decide to stop and when we decide to continue in an optimal manner in state $(x, i)$; then, from the principle of optimality,

$v_i(x) = \max\{s_i(x), c_i(x)\}$ , $1\leq i\leq m$ , $0<x<N-m$ , (3.1)

where

$s_i(x) \equiv s(x) = 1 - P_x(m, x)$ , $1\leq i\leq m$ , (3.2)

and

$c_1(x) = p v_1(x+1) + q v_2(x-1)$ , (3.3)

$c_i(x) = p v_{i-1}(x+1) + q v_{i+1}(x-1)$ , $1<i<m$ , (3.4)

$c_m(x) = p v_{m-1}(x+1) + q P_{x-1}(1, x-1) v_m(x)$ . (3.5)

Assume that we choose to continue in state $(x, i)$, $1\leq i<m$, if $s_i(x) = c_i(x)$. Then, as the next lemma shows, we never stop in state $(x, i)$, $1\leq i<m$.

Lemma 3.1 For $1\leq i<m$, $c_i(x) \geq s_i(x)$.

Proof. Since, from the definition, $c_i(x) \geq p s(x+1) + q s(x-1)$, $1\leq i<m$, to show $c_i(x) \geq s_i(x)$ it suffices to show that $p s(x+1) + q s(x-1) \geq s(x)$, or

$\frac{1-\theta^x}{1-\theta^{x+m}} - \{p \frac{1-\theta^{x+1}}{1-\theta^{x+m+1}} + q \frac{1-\theta^{x-1}}{1-\theta^{x+m-1}}\} \geq 0$ . (3.6)

Because $p = 1/(1+\theta)$ and $q = \theta/(1+\theta)$, after a bit of calculation the left-hand side of (3.6) becomes

$\frac{\theta^{2x+m-1}\left(\frac{1-\theta^{m}}{1-\theta}\right)}{\left(\frac{1-\theta^{x+m-1}}{1-\theta}\right)\left(\frac{1-\theta^{x+m}}{1-\theta}\right)\left(\frac{1-\theta^{x+m+1}}{1-\theta}\right)}$ ,

which is obviously non-negative and proves (3.6).

From Lemma 3.1, we are only concerned with the optimal decision in state $(x, m)$. The following lemma expresses $c_m(x)$ in terms of $v_m(\cdot)$.

Lemma 3.2 Define $P = (1-\theta^{m-1})/(1-\theta^m)$. Then, for $x \leq N-2m$,

$c_m(x) = \{p\theta P + q P_{x-1}(1, x-1)\} v_m(x) + p(1-\theta P) \sum_{y=x+1}^{N-2m+1} P^{y-x-1}(1-P) v_m(y) + p(1-\theta P) P^{N-x-2m+1}$ .

Proof. Let $p(x, y)$ be the transition probability from state $(x, m)$ to state $(y, m)$, $y \geq x$. Then straightforward calculation yields

$p(x, y) = \begin{cases} q P_{x-1}(1, x-1) + p[1 - P_{x+1}(m-1, 1)] , & \text{if } y = x \\ p P_{x+1}(m-1, 1) \left[\prod_{i=0}^{y-x-2} P_{x+m+i}(1, m-1)\right] [1 - P_{y+m-1}(1, m-1)] , & \text{if } x < y \leq N-2m+1 \end{cases}$

$= \begin{cases} q P_{x-1}(1, x-1) + p\theta P , & \text{if } y = x \\ p(1-\theta P) P^{y-x-1}(1-P) , & \text{if } x < y \leq N-2m+1 . \end{cases}$

Of the remaining probability $1 - \sum_{y=x}^{N-2m+1} p(x, y)$, the part $p(1-\theta P) P^{N-x-2m+1}$ corresponds to the probability that the gambling process $\{X_n\}$ attains the level $N-m+1$, from which success is certain; the rest corresponds to ruin and contributes nothing. Thus the proof is complete.

We now have the following identity.

Lemma 3.3 For $m \geq 1$ and $x \leq N-2m$,

$P_x(N-m+1-x, x) = \{p\theta P + q P_{x-1}(1, x-1)\} P_x(N-m+1-x, x) + p(1-\theta P) \sum_{y=x+1}^{N-2m+1} P^{y-x-1}(1-P) P_y(N-m+1-y, y) + p(1-\theta P) P^{N-x-2m+1}$ .

Proof. We can prove this by straightforward calculation from (1.2), but the identity is also quite intuitive from Lemma 3.2.
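As a further illustration (again not part of the paper), the identity of Lemma 3.3 can be checked numerically under the same assumed reading of (1.2); the parameters $p = 0.45$, $N = 30$, $m = 3$ below are hypothetical.

    # Hedged numerical check of the Lemma 3.3 identity as reconstructed above.
    # Assumption: P_x(a, b) = (1 - theta**b) / (1 - theta**(a+b)), theta = q/p.
    p, q = 0.45, 0.55
    theta = q / p
    N, m = 30, 3
    Pm = (1 - theta**(m - 1)) / (1 - theta**m)       # the P of Lemma 3.2

    def gr(a, b):                                    # assumed reading of (1.2)
        return (1 - theta**b) / (1 - theta**(a + b))

    worst = 0.0
    for x in range(1, N - 2*m + 1):                  # x <= N - 2m
        lhs = gr(N - m + 1 - x, x)
        rhs = (p * theta * Pm + q * gr(1, x - 1)) * lhs
        rhs += p * (1 - theta * Pm) * sum(
            Pm**(y - x - 1) * (1 - Pm) * gr(N - m + 1 - y, y)
            for y in range(x + 1, N - 2*m + 2))
        rhs += p * (1 - theta * Pm) * Pm**(N - x - 2*m + 1)
        worst = max(worst, abs(lhs - rhs))
    # The largest deviation should be at floating-point rounding level.
    print("largest deviation:", worst)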
We are now ready to prove the main result.

Theorem 3.4 Let $f$ be a stationary policy which, when the decision process is in state $(x, m)$, chooses to stop if and only if $x < x^*$, where

$x^* = \min\{x : \frac{1-\theta^x}{1-\theta^{N-m+1}} \geq \frac{\theta^x-\theta^{x+m}}{1-\theta^{x+m}}\}$ .

Then $f$ is optimal.

Proof. Let $v_f(x)$ denote the probability of success starting from state $(x, m)$ when policy $f$ is employed. Once the decision process leaves state $(x, m)$ for $x \geq x^*$, it never stops until absorption takes place under $f$. Thus we have

$v_f(x) = \begin{cases} s(x) , & \text{if } x < x^* \\ P_x(N-m+1-x, x) , & \text{if } x \geq x^* , \end{cases}$ (3.7)

and it is easy to check that $s(x)$ is decreasing in $x$ while $P_x(N-m+1-x, x)$ is increasing in $x$, and that $x^*$ can be described as $x^* = \min\{x : s(x) \leq P_x(N-m+1-x, x)\}$. Therefore, to prove that $f$ is optimal, it suffices to show that $v_f(x)$ given in (3.7) satisfies

$v_f(x) = \max\{s(x), c_f(x)\}$ , $0<x<N-m+1$ ,

where

$c_f(x) = \{p\theta P + q P_{x-1}(1, x-1)\} v_f(x) + p(1-\theta P) \sum_{y=x+1}^{N-2m+1} P^{y-x-1}(1-P) v_f(y) + p(1-\theta P) P^{N-x-2m+1}$ .

We can show this in a way similar to that of Theorem 2.1, appealing to Lemma 3.3.

Finally, we give $V(x)$, the success probability starting from $x$ units of fortune.

Lemma 3.5 Let $P$ be as defined in Lemma 3.2. Then, for $m-1 \leq x \leq N-m$,

$V(x) = \sum_{y=x-m+1}^{N-2m+1} P^{y-x+m-1}(1-P) v_f(y) + P^{N-m-x+1}$ .

Proof. This can be derived easily by conditioning on the first state $(y, m)$ visited.
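To illustrate Theorem 3.4 and Lemma 3.5 numerically (this sketch is not from the paper), the threshold $x^*$, the values $v_f(y)$ of (3.7), and $V(x)$ can be evaluated for a hypothetical parameter choice, again assuming that (1.2) is the standard gambler's-ruin formula with $\theta = q/p$.

    # Hedged sketch: evaluate V(x) of Lemma 3.5 using the threshold of
    # Theorem 3.4 and v_f from (3.7). Parameters are hypothetical.
    p, q = 0.45, 0.55
    theta = q / p
    N, m = 30, 3
    Pm = (1 - theta**(m - 1)) / (1 - theta**m)           # the P of Lemma 3.2

    def gr(a, b):                                        # assumed reading of (1.2)
        return (1 - theta**b) / (1 - theta**(a + b))

    def s(x):                                            # eq. (3.2)
        return 1 - gr(m, x)

    # Threshold of Theorem 3.4.
    x_star = min(x for x in range(1, N - m + 1)
                 if (1 - theta**x) / (1 - theta**(N - m + 1))
                 >= theta**x * (1 - theta**m) / (1 - theta**(x + m)))

    def v_f(x):                                          # eq. (3.7)
        return s(x) if x < x_star else gr(N - m + 1 - x, x)

    def V(x):                                            # Lemma 3.5, m-1 <= x <= N-m
        # y = x-m+1 may equal 0; v_f(0) = s(0) = 1 here, consistent with a
        # fortune of 0 still lying within m-1 of a maximum of m-1.
        total = sum(Pm**(y - x + m - 1) * (1 - Pm) * v_f(y)
                    for y in range(x - m + 1, N - 2*m + 2))
        return total + Pm**(N - m - x + 1)

    for x in range(m - 1, N - m + 1):
        print(x, round(V(x), 4))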
References

[1] Ross, S. M., Dynamic programming and gambling models, Adv. Appl. Probab. 6, 593-606, 1974.

[2] Ross, S. M., Introduction to Stochastic Dynamic Programming, Academic Press, New York, 1983.