From: Eli Y. Kling on 8 Jan 2010 18:38

I feel Fisher's exact test is appropriate but probably not powerful enough. How about a logistic regression with extra explanatory variables such as economic-demographic group? If you can lay your hands on detailed records you might use the data-mining technique of balanced sampling (50% event and 50% non-event) to deal with the rare-event modelling.

But that aside, I wonder whether, even if the difference is statistically significant, it is practically significant. You can turn the question on its head: out of the (17+26)=43 complications, 17/43=39.5% have Medicaid. For H0 (complications explain Medicaid), the one-tail P value is 0.1110 and the two-tail P value is 0.2221. In the social sciences that might be considered significant, but you have to decide.

With regards,
Eli

On 8 Jan, 20:57, robe...(a)HEALTH.OK.GOV (Robert Feyerharm) wrote:
> I'm comparing various pregnancy & delivery complication rates between the
> Medicaid and Non-Medicaid populations in my State. These rates are often
> quite small (for example, 17/20,833 vs 26/26,602).
>
> There are a number of options available to test for a statistically
> significant difference between two rates. I'm inclined to use Fisher's
> exact test in this situation since it makes no assumptions about how the
> data is distributed (normal, Poisson, etc.).
>
> Most epidemiologists use a Poisson approximation to compare rates where
> the numerator is less than 100. Is Fisher's exact test a better method?
>
> Note that for cases like mine where the denominator is large, SAS probably
> resorts to a numerical method to approximate the Fisher's exact test
> p-value (hence the Fisher's exact test p-value may not exactly be "exact").
>
> Thanks,
>
> Robert Feyerharm
> Oklahoma State Department of Health
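A minimal sketch of the logistic-regression approach Eli describes, hedged because the record layout is not shown in the thread: the dataset name (deliveries) and the variables (complication, medicaid, age_group, race) are hypothetical stand-ins for whatever the detailed records actually contain.

* Sketch only -- all dataset and variable names here are hypothetical placeholders. ;
proc logistic data=deliveries;
  class medicaid (ref='0') age_group race / param=ref;
  model complication(event='1') = medicaid age_group race;
run;

If the balanced-sampling route were taken (50% events, 50% non-events), the same model would be fit on the balanced sample, with the slope estimates kept and the intercept adjusted afterwards for the true event rate.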
From: Dale McLerran on 8 Jan 2010 19:51

--- On Fri, 1/8/10, Robert Feyerharm <robertf(a)HEALTH.OK.GOV> wrote:
> [Robert's original question quoted in full; see the first post above.]

Robert,

Let me dispose of the statement about whether Fisher's exact test is using an approximation due to the large sample size. For a 2x2 table with row totals R1 and R2, column totals C1 and C2, and cell frequencies f11, f12, f21, and f22, the Fisher exact test depends on the computation

  P = [ ( R1! * R2! * C1! * C2! ) / n! ] / ( f11! * f12! * f21! * f22! )

for different arrangements of the cell frequencies fij. Now, SAS certainly cannot compute all of these factorials directly for the sample size which you have. Note that

  X! = GAMMA(X+1)

where GAMMA(u) is the gamma function, and that

  log(X!) = LGAMMA(X+1)

where LGAMMA is the log gamma function. Taking logarithms, we have

  log(P) = log( [ ( R1! * R2! * C1! * C2! ) / n! ] / ( f11! * f12! * f21! * f22! ) )

         = log(R1!) + log(R2!) + log(C1!) + log(C2!) - log(n!)
           - log(f11!) - log(f12!) - log(f21!) - log(f22!)

         = lgamma(R1+1) + lgamma(R2+1) + lgamma(C1+1) + lgamma(C2+1) - lgamma(n+1)
           - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1)

Now, what is really important for Fisher's exact test is not the value of P (or log(P)) by itself, but the value of P (or log(P)) for the observed table compared to the other possible tables which retain the same marginal frequencies. To the extent that the computation of log(P) using the LGAMMA function retains order, the computation of Fisher's exact test is not at all affected by the sample size. I would really expect that log(P) would at least retain order across all possible tables which have the specified marginal values. Thus, the value of the Fisher exact test should not be compromised at all.

Now, as to whether the Fisher exact p-value is better than p-values based on distributional assumptions (normal, Poisson), I would think that it wouldn't much matter for the sample size that you have here. Certainly, for the values which you present in your post, the Fisher exact test, chi-square test, and Poisson model all produce nonsignificant p-values. There is some discrepancy in p-values for the three methods. However, since all of the methods indicate that the p-value is greater than 0.50, any discrepancy is of trivial importance.

You might have some other variables which you want to test (or you might not have revealed correct data). As you get closer to p=0.05, I would place money that the p-values will become more and more similar.
If you are in the uncomfortable position of having one test where p<0.05 and another test where p>0.05, the interpretation is not really any different. Using p<0.05 is a rather arbitrary choice.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
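For the counts Robert quoted (17 of 20,833 Medicaid vs 26 of 26,602 non-Medicaid), the agreement Dale describes is easy to check by running the three tests side by side. A minimal sketch, with made-up dataset and variable names:

/* 2x2 table: Fisher's exact test and the chi-square test */
data rates;
  input group $ complication count;
  datalines;
medicaid 1 17
medicaid 0 20816
nonmed   1 26
nonmed   0 26576
;
run;

proc freq data=rates;
  tables group*complication / chisq;
  exact fisher;
  weight count;
run;

/* Poisson model for the event counts, with the log of the denominator as offset */
data events;
  input group $ events denom;
  logdenom = log(denom);
  datalines;
medicaid 17 20833
nonmed   26 26602
;
run;

proc genmod data=events;
  class group;
  model events = group / dist=poisson link=log offset=logdenom;
run;

All three p-values should come out greater than 0.5 for these counts, which is the point Dale makes above.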
From: Ryan on 9 Jan 2010 08:29

On Jan 8, 7:51 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> [Dale's reply of 8 Jan quoted in full; see above.]

Dale,

I computed P*log(P) for a 2X2 table with the following cell frequencies:

f11=8
f12=14
f21=75
f22=32

and obtained the value

P*log(P) = -0.015585312549042

Here are the formulas I used to compute P*log(P) in another stats program:

---------------

log_p = lngamma(22+1) + lngamma(107+1) + lngamma(83+1) + lngamma(46+1)
      - lngamma(129+1)
      - lngamma(8+1) - lngamma(14+1) - lngamma(75+1) - lngamma(32+1)

p_log_p = exp(log_p)*log_p

---------------

If you have the time, could you please tell me what I did incorrectly and exactly what p_log_p represents?

Thanks,

Ryan
From: xlr82sas on 9 Jan 2010 18:51

On Jan 9, 5:29 am, Ryan <ryan.andrew.bl...(a)gmail.com> wrote:
> [Ryan's question quoted in full; see above.]

Hi,

Dale's formula agrees exactly with PROC FREQ and does represent the two-tail Fisher exact test.

data _null_;
  f11=8;
  f12=14;
  f21=75;
  f22=32;
  n=f11+f12+f21+f22;
  r1=f11+f12;
  r2=f21+f22;
  c1=f11+f21;
  c2=f12+f22;
  logP = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
       - lgamma(n+1)
       - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1);
  p=exp(logP);
  put p=;
  prd=p*logP;
  put prd=;
run;

P=0.0026221258
PRD=-0.015585313

%macro sigcid(pegevn=4,pegtot=467,pboevn=0,pbotot=461);

data sigcid;
  trt='pbo'; evn=1; tot=&pbotot - &pboevn; output;
  trt='pbo'; evn=0; tot=&pboevn;           output;
  trt='peg'; evn=1; tot=&pegtot - &pegevn; output;
  trt='peg'; evn=0; tot=&pegevn;           output;
run;

ods output FishersExact=pvalues;
proc freq data=sigcid;
  tables trt*evn / list chisq riskdiffc exact relrisk;
  weight tot;
  output out=rsk (keep=_rdif2_ l_rdif2 u_rdif2) chisq riskdiffc exact relrisk;
run;

data rsk_ci(keep=dif);
  merge rsk
        pvalues(firstobs=2 obs=2)
        pvalues(firstobs=5 obs=5 rename=nvalue1=Two);
  dif='Approx CI '!!put(_rdif2_*100, 7.3) || " (" || put(l_rdif2*100, 7.3) || ", " ||
      put(u_rdif2*100, 7.3) || ") Peg>Pbo Exact Pvalue="!!put(nvalue1,9.5) !!
      ' Two Tail Pvalue=' !! put(Two,9.5);
run;

proc print data=rsk_ci;
run;

%mend sigcid;

%sigcid(pegevn=75,pegtot=107,pboevn=8,pbotot=22);

DIF

Approx CI  33.730 (  9.096,  58.363) Peg>Pbo Exact Pvalue=  0.00333 Two Tail Pvalue=  0.00262

The two-tail value is exactly what Dale was computing. The 2 x 2 test is a combinatorial problem; higher-dimension contingency tables are more problematic. By the way, you need SAS 9.2 to get the exact confidence interval of the risk difference.
There is a lot more on this on my site:

http://homepage.mac.com/magdelina/.Public/utl.html

/* T000102 FISHER EXACT TESTS FOR CONTINGENCY TABLES
/* T000103 DATASTEP INTERACTIVE METHOD FOR BINOMIAL CONFIDENCE INTERVAL WORKS WITH 0 RESPONDERS MAYO CLINIC
/* T000111 APPROX CONFIDENCE INTERVAL ON RISK DIFFERENCES USING PROC FREQ - NEED 9.2 FOR EXACT CONFIDENCE INTERVALS
/* T000112 APPROX CONFIDENCE INTERVAL AND EXACT P-VALUE RISK DIFFERENCES USING PROC FREQ - NEED 9.2 FOR EXACT CONFIDENCE INTERVALS
/* T000113 APPROX CONFIDENCE INTERVAL AND EXACT P-VALUE RISK DIFFERENCES USING PROC FREQ - AND DALES SAS-L ANALYSIS
/* T000117 EXACT CONFIDENCE INTERVALS USING THE EXACT OPTION IN SAS PROC FREQ DOES NOT WORK WITH 0 RESPONDERS
/* T000119 CALCULATE CONFIDENCE INTERVALS FOR THE BINOMIAL PROPORTION WITH 0 RESPONDERS
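On the SAS 9.2 point: if I remember the 9.2 syntax correctly, the exact confidence limits for the risk difference come from adding RISKDIFF to the EXACT statement. A sketch against the sigcid dataset that the macro above builds (treat this as the shape of the call, not verified output):

proc freq data=sigcid;
  tables trt*evn / riskdiff;   /* approximate risk-difference limits */
  exact riskdiff;              /* exact limits, available in 9.2 */
  weight tot;
run;

The exact limits can be slow to compute as the cell counts grow, so the approximate interval produced by the macro remains useful as a first pass.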
From: Dale McLerran on 9 Jan 2010 19:25
--- On Sat, 1/9/10, Ryan <ryan.andrew.black(a)GMAIL.COM> wrote:
> [Ryan's question quoted in full; see above.]

Ryan,

When I wrote "but the value of P (log(P))", you apparently interpreted that to mean that we would compute P*log(P). Previously in the same sentence, I had written "what is really important for Fisher's exact test is not the value of P (or log(P))". So, when I wrote "P (log(P))" in the same sentence, I meant for that to be interpreted as "P (or log(P))". But I was not really clear on that.

Now, this P that we compute according to the formula specified above is NOT the Fisher's exact test p-value. Rather, it is the probability of the particular table with observed f11, f12, f21, and f22 among all tables which have the observed marginal frequencies R1, R2, C1, and C2 (the hypergeometric probability conditional on the margins). Given those same fixed marginals, there could be other values f11~, f12~, f21~, and f22~ which could have been observed. The Fisher's exact test p-value is obtained by computing P for the observed table as well as P~ (where P~ is the value of P computed for f11~, f12~, f21~, and f22~) for all possible tables having the observed marginal frequencies. We then compare P against the distribution of P~.

Note, though, that P and log(P) have a monotonic relationship, so we could also compare log(P) against the distribution of log(P~). For that matter, P*log(P) and P have a monotonic relationship (monotone decreasing for probabilities below 1/e, which covers every table here). So, the statistic which you have computed above could be employed to construct the Fisher's exact test p-value if you compare the observed table value of P*log(P) against the distribution of P~*log(P~).

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
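Dale's description translates directly into a small data step: enumerate every 2x2 table with the margins of Ryan's example, compute each table's probability through LGAMMA, and accumulate the probabilities of all tables no more probable than the observed one. This is only a sketch (the 8/14/75/32 counts are taken from Ryan's post); consistent with Dale's point that P itself is not the p-value, the two-sided result will be larger than the single-table probability P=0.00262 computed earlier in the thread.

data _null_;
  /* observed table from Ryan's example */
  f11 = 8;  f12 = 14;  f21 = 75;  f22 = 32;
  r1 = f11 + f12;  r2 = f21 + f22;
  c1 = f11 + f21;  c2 = f12 + f22;
  n  = r1 + r2;

  /* log hypergeometric probability of the observed table */
  logp_obs = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
           - lgamma(n+1)
           - lgamma(f11+1) - lgamma(f12+1) - lgamma(f21+1) - lgamma(f22+1);

  /* enumerate all tables with the same margins; sum P~ over tables with P~ <= P(observed) */
  p_two_sided = 0;
  do a = max(0, c1 - r2) to min(r1, c1);
     b = r1 - a;  c = c1 - a;  d = r2 - c;
     logp = lgamma(r1+1) + lgamma(r2+1) + lgamma(c1+1) + lgamma(c2+1)
          - lgamma(n+1)
          - lgamma(a+1) - lgamma(b+1) - lgamma(c+1) - lgamma(d+1);
     if logp <= logp_obs + 1e-8 then p_two_sided = p_two_sided + exp(logp);
  end;
  put p_two_sided=;   /* two-sided Fisher exact p-value for this table */
run;

A one-sided p-value would come from summing only the tables with a <= f11 (or a >= f11) rather than using the probability comparison.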