From: Ryan on 9 Jan 2010 21:23

On Jan 9, 7:25 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> --- On Sat, 1/9/10, Ryan <ryan.andrew.bl...(a)GMAIL.COM> wrote:
>
> > From: Ryan <ryan.andrew.bl...(a)GMAIL.COM>
> > Subject: Re: Fisher's exact test appropriate here?
> > To: SA...(a)LISTSERV.UGA.EDU
> > Date: Saturday, January 9, 2010, 5:29 AM
> >
> > On Jan 8, 7:51 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> > > Robert,
> > >
> > > Let me dispose of the statement about whether Fisher's exact
> > > test is using an approximation due to the large sample size.
> > > For a 2x2 table with row totals R1 and R2, column totals C1
> > > and C2, and cell frequencies f11, f12, f21, and f22, the
> > > Fisher exact test depends on the computation
> > >
> > >   P = [ ( R1! * R2! * C1! * C2!) / n! ] /
> > >       ( f11! * f12! * f21! * f22!)
> > >
> > > for different arrangements of the cell frequencies fij.
> > > Now, SAS certainly cannot compute all of these factorials
> > > for the size sample which you have. Note, that
> > >
> > >   X! = GAMMA(X+1)
> > >
> > > where GAMMA(u) is the gamma function and that
> > >
> > >   log(X!) = lGAMMA(X+1)
> > >
> > > where lGAMMA is the log gamma function. Now, taking
> > > logarithms, we have
> > >
> > >   log(P) = log( [ ( R1! * R2! * C1! * C2!) / n! ] /
> > >                 ( f11! * f12! * f21! * f22!) )
> > >
> > >          = log(R1!) + log(R2!) + log(C1!) + log(C2!) - log(n!) -
> > >            log(f11!) - log(f12!) - log(f21!) - log(f22!)
> > >
> > >          = lgamma(R1+1) + lgamma(R2+1) + lgamma(C1+1) + lgamma(C2+1) -
> > >            lgamma(n+1) -
> > >            lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1)
> > >
> > > Now, what is really important for Fisher's exact test is
> > > not the value of P (or log(P)), but the value of P (log(P))
> > > for the observed table compared to other possible tables
> > > which retain the same marginal frequencies.
> > > To the extent
> > > that the computation of log(P) using the lgamma function
> > > retains order, then the computation of Fisher's exact test
> > > is not at all affected by the sample size. I would really
> > > expect that log(P) would at least retain order across all
> > > possible tables which have the specified marginal values.
> > > Thus, the value of the Fisher exact test should not be
> > > compromised at all.
> > >
> > > Now, as to whether the Fisher exact p-value is better than
> > > p-values based on distributional assumptions (normal, Poisson),
> > > I would think that it wouldn't much matter for the sample size
> > > that you have here. Certainly, for the values which you
> > > present in your post, the Fisher exact test, chi-square test,
> > > and Poisson model all produce nonsignificant p-values. There
> > > is some discrepancy in p-values for the three methods.
> > > However, since all of the methods indicate that the p-value
> > > is greater than 0.50, any discrepancy is of trivial
> > > importance.
> > >
> > > You might have some other variables which you want to test
> > > (or you might not have revealed correct data). As you get
> > > closer to p=0.05, I would place money that the p-values
> > > will become more and more similar. If you are in the
> > > uncomfortable position of having one test where p<0.05
> > > and another test where p>0.05, the interpretation is not
> > > really any different. Using p<0.05 is a rather arbitrary
> > > choice.
> > >
> > > Dale
> > >
> > > ---------------------------------------
> > > Dale McLerran
> > > Fred Hutchinson Cancer Research Center
> > > mailto: dmclerra(a)NO_SPAMfhcrc.org
> > > Ph: (206) 667-2926
> > > Fax: (206) 667-5977
> > > ---------------------------------------
> >
> > Dale,
> >
> > I computed P*log(P) for a 2X2 table with the following cell
> > frequencies:
> >
> > f11=8
> > f12=14
> > f21=75
> > f22=32
> >
> > and obtained the value
> >
> > P*log(P)=-0.015585312549042
> >
> > Here are the formulas I used to compute P*log(P) in another stats
> > program:
> >
> > ---------------
> >
> > log_p = lngamma(22+1) + lngamma(107+1) + lngamma(83+1) +
> >         lngamma(46+1) - lngamma(129+1) -
> >         lngamma(8+1) - lngamma(14+1) - lngamma(75+1) -
> >         lngamma(32+1)
> >
> > p_log_p = exp(log_p)*log_p
> >
> > --------------
> >
> > If you have the time, could you please tell me what I did incorrectly
> > and exactly what p_log_p represents?
> >
> > Thanks,
> >
> > Ryan
>
> Ryan,
>
> When I wrote "but the value of P (log(P))", you apparently
> interpreted that to mean that we would compute P*log(P).
> Previously in the same sentence, I had written "what is
> really important for Fisher's exact test is not the value
> of P (or log(P))". So, when I wrote "P (log(P))" in the
> same sentence, I meant for that to be interpreted as
> "P (or log(P))". But I was not really clear on that.
>
> Now, this P that we compute according to the formula specified
> above is NOT the Fisher's exact test p-value. Rather, it
> is a probability under multinomial sampling of the
> particular table with observed f11, f12, f21, and f22
> among all tables which have the observed marginal frequencies
> R1, R2, C1, and C2. Given those same fixed marginals, there
> could be other values f11~, f12~, f21~, and f22~ which
> could have been observed.
>
> The Fisher's exact test p-value is obtained by computing P
> for the observed table as well as P~ (where P~ is the value
> of P computed for f11~, f12~, f21~, and f22~) for all possible
> tables having the observed marginal frequencies. We then
> compare P against the distribution of P~.
>
> Note, though, that P and log(P) have a monotonic relationship
> so that we could also compare log(P) against the distribution
> of log(P~). For that matter, P*log(P) and P have a
> monotonic relationship. So, the statistic which you have
> computed above could be employed to construct the Fisher's
> exact test p-value if you compare the observed table value
> of P*log(P) against the distribution of P~*log(P~).
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra(a)NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------

Dale,

Thank you for the clarification. I found a simple example online
demonstrating how to calculate Fisher's exact one-tailed p-value using
the factorial formula. Out of interest I decided to solve for Fisher's
exact one-tailed p-value using the formula with the lgamma function you
presented...

The observed frequencies table is:

-----
7 2
5 6
-----

The two more extreme tables with the same marginal frequencies are:

-----
8 1
4 7
-----

and

-----
9 0
3 8
-----

I solved for Fisher's exact one-tailed p-value using the following
equations:

-----
log_p_obs = lngamma(9+1) + lngamma(11+1) + lngamma(12+1) + lngamma(8+1) -
            lngamma(20+1) -
            lngamma(7+1) - lngamma(2+1) - lngamma(5+1) - lngamma(6+1)

p_obs = exp(log_p_obs)
-----
log_p_alt1 = lngamma(9+1) + lngamma(11+1) + lngamma(12+1) + lngamma(8+1) -
             lngamma(20+1) -
             lngamma(8+1) - lngamma(1+1) - lngamma(4+1) - lngamma(7+1)

p_alt1 = exp(log_p_alt1)
-----
log_p_alt2 = lngamma(9+1) + lngamma(11+1) + lngamma(12+1) + lngamma(8+1) -
             lngamma(20+1) -
             lngamma(9+1) - lngamma(0+1) - lngamma(3+1) - lngamma(8+1)

p_alt2 = exp(log_p_alt2)
-----
Fisher_onetailed_p = p_obs + p_alt1 + p_alt2 = 0.157
-----

I am fairly certain my calculations are correct. This was an
interesting exercise. Thanks again, Dale.

Ryan
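For anyone who wants to check Ryan's arithmetic outside a stats package, here is a minimal Python sketch of the same one-tailed sum. It assumes only the standard library; the helper name log_table_prob is made up for this illustration and does not come from the thread.

```python
from math import exp, lgamma


def log_table_prob(f11, f12, f21, f22):
    """Log probability of one 2x2 table given its margins.

    Hypothetical helper: uses log(k!) = lgamma(k + 1), as in the posts above.
    """
    r1, r2 = f11 + f12, f21 + f22      # row totals
    c1, c2 = f11 + f21, f12 + f22      # column totals
    n = r1 + r2
    return (lgamma(r1 + 1) + lgamma(r2 + 1) + lgamma(c1 + 1) + lgamma(c2 + 1)
            - lgamma(n + 1)
            - lgamma(f11 + 1) - lgamma(f12 + 1)
            - lgamma(f21 + 1) - lgamma(f22 + 1))


# Observed table plus the two more extreme tables with the same margins.
tables = [(7, 2, 5, 6), (8, 1, 4, 7), (9, 0, 3, 8)]
one_tailed_p = sum(exp(log_table_prob(*t)) for t in tables)
print(round(one_tailed_p, 3))  # 0.157
```

The sum is taken on the probability scale, after exponentiating each log probability, which is the same order of operations Ryan used.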
From: xlr82sas on 9 Jan 2010 21:27
Hi Dale,

see http://en.wikipedia.org/wiki/Fisher%27s_exact_test

If I use your formula I get a p-value

0.0026221258207619

If I use proc freq for the two-tailed p-value of the Fisher's Exact
Test I get

0.0026221258207619

I think the exact p-value for a 2x2 table is just a statistic of the
hypergeometric distribution. Your P is just a result of evaluating the
hypergeometric distribution.

Consider the 2x2 table

a b
c d

p = (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!)

The lgamma function makes it easy to evaluate the factorials. Since
a! = gamma(a+1), we have all the plus ones.
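The equivalence between the factorial formula and the hypergeometric probability can be checked numerically. The following is a rough Python sketch, assuming Python 3.8 or later for math.comb; the names log_factorial, p_lgamma and p_exact are illustrative only. It evaluates the table 8 14 / 75 32 both ways.

```python
from math import comb, exp, lgamma


def log_factorial(k):
    # log(k!) = lgamma(k + 1); this is where "all the plus ones" come from.
    return lgamma(k + 1)


a, b, c, d = 8, 14, 75, 32
n = a + b + c + d

# Factorial formula evaluated on the log scale.
log_p = (log_factorial(a + b) + log_factorial(c + d)
         + log_factorial(a + c) + log_factorial(b + d)
         - log_factorial(n)
         - log_factorial(a) - log_factorial(b)
         - log_factorial(c) - log_factorial(d))
p_lgamma = exp(log_p)

# The same quantity written as a hypergeometric probability,
# using exact integer binomial coefficients.
p_exact = comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

print(p_lgamma, p_exact)
```

Both numbers should agree with the 0.0026... value quoted above to within floating-point error. This is the probability of the single observed table, not a tail sum.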
From: Dale McLerran on 11 Jan 2010 01:37
--- On Sat, 1/9/10, xlr82sas <xlr82sas(a)AOL.COM> wrote:

> From: xlr82sas <xlr82sas(a)AOL.COM>
> Subject: Re: Fisher's exact test appropriate here?
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Saturday, January 9, 2010, 6:27 PM
>
> On Jan 9, 4:25 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> > --- On Sat, 1/9/10, Ryan <ryan.andrew.bl...(a)GMAIL.COM> wrote:
>
> Hi Dale,
>
> see http://en.wikipedia.org/wiki/Fisher%27s_exact_test
>
> If I use your formula I get a p-value
>
> 0.0026221258207619
>
> If I use proc freq for the two tailed p-value of the Fishers Exact
> Test I get
>
> 0.0026221258207619
>
> I think the Exact P-value for a 2x2 table is just a statistic of the
> hypergeometric distribution. Your P is just a result of evaluating the
> hypergeometric distribution.
>
> Consider 2x2 table
>
> a b
> c d
>
> p = (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!)
>
> The lgamma makes it easy to evaluate the factorials. Since
> a! = gamma(a+1) we have all the plus ones.
>

Just to be clear, the value 0.0026... is the probability of the
observed table under the hypergeometric distribution. However, the
(two-tailed) p-value for Fisher's exact test for the data which were
given as

    8   14
   75   32

is p=0.0060...

The following code uses PROC FREQ to evaluate table probabilities for
all 2x2 tables which have the following structure:

   f11   f12 |  22
   f21   f22 | 107
   ----------+----
    83    46 | 129

If we add the table probabilities for all tables which have f11<=8,
then we get the "Left-sided Pr <= F" value which is presented for the
table with f11=8. Adding up the probabilities of all tables whose
probability is no larger than the observed table probability, we
obtain the value shown in the row "Two-sided Pr <= P".

/* Build every 2x2 table with row totals 22 and 107 and column   */
/* totals 83 and 46. Each table is indexed by i = f11 and stored  */
/* as four weighted cell records.                                 */
data test;
  do i=0 to 22;
    f11 = i;
    f12 = 22-i;
    f21 = 83-i;
    f22 = 129-f11-f12-f21;
    x=0; y=0; freq=f11; output;
    x=0; y=1; freq=f12; output;
    x=1; y=0; freq=f21; output;
    x=1; y=1; freq=f22; output;
  end;
  keep i x y freq;
run;

/* Fisher's exact test for the observed table (f11=8). */
proc freq data=test(where=(i=8));
  weight freq;
  tables x*y / chisq;
run;

/* Capture the table probability (P_TABLE) for each of the 23 tables. */
ods listing close;
ods output FishersExact=Fisher(where=(name1="P_TABLE"));
proc freq data=test;
  by i;
  weight freq;
  tables x*y / chisq;
run;
ods listing;

/* Accumulate the left-sided and two-sided p-values by hand. */
data _null_;
  set Fisher(where=(i=8) rename=(nvalue1=P_observed));
  do j=0 to 22;
    pointer = j+1;
    set Fisher point=pointer;
    if j<=8 then left+nvalue1;
    if nvalue1<=P_observed then pval_2tailed+nvalue1;
  end;
  put left= pval_2tailed=;
run;

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
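
The enumeration Dale describes can also be sketched without PROC FREQ. The Python below is an illustration under stated assumptions, not a reproduction of what SAS does internally. The names table_prob, probs, left and two_sided are invented here, and the small relative tolerance in the two-sided comparison is an assumption added to guard against floating-point ties (Dale's data step compares the values directly).

```python
from math import exp, lgamma

# Margins of the observed table 8 14 / 75 32.
R1, R2, C1 = 22, 107, 83
N = R1 + R2
C2 = N - C1
F11_OBS = 8


def table_prob(f11):
    """Probability of the table with upper-left cell f11, margins held fixed."""
    f12, f21 = R1 - f11, C1 - f11
    f22 = R2 - f21
    log_p = (lgamma(R1 + 1) + lgamma(R2 + 1) + lgamma(C1 + 1) + lgamma(C2 + 1)
             - lgamma(N + 1)
             - lgamma(f11 + 1) - lgamma(f12 + 1)
             - lgamma(f21 + 1) - lgamma(f22 + 1))
    return exp(log_p)


# f11 ranges over every value that keeps all four cells non-negative (0..22 here).
lo, hi = max(0, C1 - R2), min(R1, C1)
probs = {f11: table_prob(f11) for f11 in range(lo, hi + 1)}

p_obs = probs[F11_OBS]
# Left-sided: tables with f11 no larger than the observed f11.
left = sum(p for f11, p in probs.items() if f11 <= F11_OBS)
# Two-sided: tables no more probable than the observed one
# (1e-7 relative slack is an assumed tie tolerance).
two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))

print(round(left, 4), round(two_sided, 4))
```

With these margins the two-sided sum picks up the tables with f11 of 20, 21 and 22 from the right tail in addition to the left tail, which is why it comes out near 0.0060 rather than matching the left-sided value of roughly 0.0033.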