From: Robin R High on
Robert,

Correlation depends on the discordant proportions of the _outcomes_ (not
the matching) in the off-diagonal cell percents, p10 and p01, of the
following table.

                     Mother 1 (control)
                     Yes=1   No=0   (mortality)
                    ---------------
Mother 2      Yes=1 | p11  | p10  |  .010 = pt
(treatment)         |------+------|
              No=0  | p01  | p00  |  .990
                    ---------------
                      .015   .985    1.00
                      = pc

pt: 1% of mothers in the treatment group experience mortality (99% do not)
pc: 1.5% of mothers in the control group experience mortality (98.5% do not)


The following formula connects the cell probabilities with the marginal
values, pt and pc:

corr = (p11*p00 - p10*p01)/SQRT((pc*(1-pc)*pt*(1-pt)));

Also note that these two differences are the same:

pt-pc = p10-p01

Through some manipulation of these equations, with PROC MODEL to solve them,
one can find the cell values for various scenarios. For example, assuming
10000 pairs, here are the resulting cell counts and percents for
correlations of .05, .25, and .5, with the marginal values held the same
(rows = treatment mother, columns = control mother; 1 = death, 2 = no death):

correlation=.05

Frequency|
Percent  |       1|       2|  Total
---------+--------+--------+
       1 |      8 |     92 |    100
         |   0.08 |   0.92 |   1.00
---------+--------+--------+
       2 |    142 |   9758 |   9900
         |   1.42 |  97.58 |  99.00
---------+--------+--------+
Total         150     9850    10000
             1.50    98.50   100.00

correlation=.25

Frequency|
Percent  |       1|       2|  Total
---------+--------+--------+
       1 |     32 |     68 |    100
         |   0.32 |   0.68 |   1.00
---------+--------+--------+
       2 |    118 |   9782 |   9900
         |   1.18 |  97.82 |  99.00
---------+--------+--------+
Total         150     9850    10000
             1.50    98.50   100.00

correlation=.5

Frequency|
Percent  |       1|       2|  Total
---------+--------+--------+
       1 |     62 |     38 |    100
         |   0.62 |   0.38 |   1.00
---------+--------+--------+
       2 |     88 |   9812 |   9900
         |   0.88 |  98.12 |  99.00
---------+--------+--------+
Total         150     9850    10000
             1.50    98.50   100.00
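
A minimal sketch of the same calculation in closed form, without PROC MODEL.
It assumes the identity p11*p00 - p10*p01 = p11 - pt*pc (which follows from
the margins), so that p11 = pt*pc + corr*SQRT(pc*(1-pc)*pt*(1-pt)). The data
set name "cells" and the variables n11-n00 are illustrative, not from the
original post:

```sas
/* Sketch: cell probabilities and counts from fixed margins and an assumed corr */
data cells;
   pt     = 0.010;   /* treatment-group mortality (row margin)    */
   pc     = 0.015;   /* control-group mortality (column margin)   */
   npairs = 10000;   /* number of matched pairs, as in the tables */
   do corr = 0.05, 0.25, 0.50;
      /* solve the correlation formula for p11, then fill in from the margins */
      p11 = pt*pc + corr*sqrt(pc*(1-pc)*pt*(1-pt));
      p10 = pt - p11;
      p01 = pc - p11;
      p00 = 1 - p11 - p10 - p01;
      /* expected counts, rounded to whole pairs */
      n11 = round(npairs*p11);
      n10 = round(npairs*p10);
      n01 = round(npairs*p01);
      n00 = round(npairs*p00);
      output;
   end;
run;

proc print data=cells noobs;
   var corr p11 p10 p01 p00 n11 n10 n01 n00;
run;
```

With these margins the rounded counts should agree with the three tables
above; a correlation that pushes p10 or p01 below zero is not attainable for
the given margins.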


To determine a correlation, make a 2x2 table of counts (like the one above
for corr=.50) that is your best guess of mortality and then run the counts
through proc freq:

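/* Hypothesized counts for corr=.50:                 */
/* i = treatment outcome, j = control outcome        */
/* (1 = death, 2 = no death)                         */
data one;
   input i j count;
   datalines;
1 1 62
1 2 38
2 1 88
2 2 9812
;
run;

/* MEASURES prints the Pearson and Spearman correlations */
proc freq data=one;
   tables i*j / measures;
   weight count;
run;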

produces:
Pearson Correlation = 0.5002
Spearman Correlation = 0.5002
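
One way to use such a value, as a sketch only: feed a few candidate
correlations into the PAIREDFREQ statement from the original question and
compare the resulting numbers of pairs. The list of correlations is just the
illustrative set used above, not a recommendation, and relativerisk=.67
implies a treatment proportion of about .01005 rather than exactly .010:

```sas
/* Sketch: required number of pairs for several assumed correlations */
proc power;
   pairedfreq dist=normal method=connor
      test          = mcnemar
      corr          = 0.05 0.25 0.5   /* assumed outcome correlations   */
      alpha         = 0.05
      relativerisk  = 0.67            /* treatment vs. control mortality */
      refproportion = 0.015           /* control-group mortality        */
      npairs        = .               /* solve for the number of pairs  */
      power         = 0.8;
run;
```

PROC POWER should report one line per correlation, which makes it easy to
see how sensitive the sample size is to the guess.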


I hesitate to say "details are left to the reader", but I recently worked
through this interesting problem in a similar project. To make further
sense of it, it helps to assume larger proportions (e.g., 35/100 vs 30/100)
than the ones you are looking at. There are also some interesting
connections between matched pairs (depending on the correlation) and
independent samples. Also, read chapter 3 of Paul Allison's SAS book on
"Fixed Effects" for other approaches to the McNemar test, especially when
you have explanatory variables.
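
As a rough illustration of the larger-proportions suggestion, here is a
sketch with pc=.35 and pt=.30 (the data set name "larger" and the variable
"discordant" are made up for this example). At corr=0 the cells are just the
products of the margins, i.e., the independent-samples case, and the
discordant proportion p10+p01 shrinks as the correlation grows:

```sas
/* Sketch: how the 2x2 cells change with corr when the proportions are larger */
data larger;
   pc = 0.35;   /* control proportion   */
   pt = 0.30;   /* treatment proportion */
   do corr = 0, 0.1, 0.25, 0.5;
      p11 = pt*pc + corr*sqrt(pc*(1-pc)*pt*(1-pt));
      p10 = pt - p11;
      p01 = pc - p11;
      p00 = 1 - p11 - p10 - p01;
      discordant = p10 + p01;   /* proportion of pairs that drive McNemar's test */
      output;
   end;
run;

proc print data=larger noobs;
   var corr p11 p10 p01 p00 discordant;
   format p11 p10 p01 p00 discordant 6.4;
run;
```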

Robin High
UNMC






From: Robert Feyerharm <robertf(a)HEALTH.OK.GOV>
To: SAS-L(a)LISTSERV.UGA.EDU
Date: 12/18/2009 03:57 PM
Subject: proc power question for McNemar test
Sent by: "SAS(r) Discussion" <SAS-L(a)LISTSERV.UGA.EDU>



I have a question regarding the power procedure for a paired case-control
design using the McNemar test.

I'm using proc power to estimate the necessary sample size for a proposed
public health study that will compare infant mortality rates between two
groups, a control group of mothers who received no public health
intervention & a treatment group who participated in the Children First or
Healthy Start programs. Mothers will be matched based on similar
demographic variables (race, age, education, etc.). We want to detect a
reduction in infant mortality from say 15 deaths per 1,000 live births
(p0=.015) to 10 deaths per 1,000 live births (p1=.010), with power=.80 and
alpha=.05.

Here's my code:

proc power;
   pairedfreq dist=normal method=connor
      test=mcnemar
      corr=???
      alpha=.05
      relativerisk = .67
      refproportion = 0.015
      npairs = .
      power = .8;
run;

My question: What is the correct value to use for the correlation
coefficient for exposure between cases and their matched controls? Since
every matched pair will be discordant (the case mother participates in the
health program & her control doesn't), is corr=0 appropriate?

Thanks in advance!

Robert Feyerharm
Oklahoma State Department of Health
From: Jeff on
On Dec 21, 2:18 pm, rh...(a)UNMC.EDU (Robin R High) wrote:
> The following formula connects the cell probabilities with the marginal
> values, pt and pc:
>
> corr = (p11*p00 - p10*p01)/SQRT((pc*(1-pc)*pt*(1-pt)));

FYI: Page 29 of Stokes et al., Categorical Data Analysis Using the SAS
System, has this and a couple of other equivalent formulas.
From: "Feyerharm, Robert W." on
Thanks Robin, this is very helpful!

Robert

-----Original Message-----
From: Robin R High [mailto:rhigh(a)unmc.edu]
Sent: Monday, December 21, 2009 1:19 PM
To: Feyerharm, Robert W.
Cc: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: proc power question for McNemar test
