df for confidence interval with a random effect (maybe Satterthwaite) [SAS]


From: John Uebersax on 10 Mar 2010 17:45

Hello Group,

Suppose I want to construct a confidence interval for a variable's
value (not mean) based on 8 samples, each with 200 observations.
There is a random effect across samples, such that:

observed value = true score + sample effect (r) + error

or simply

y = x + r + e

So r is like a per-sample bias, normally distributed across samples,
and independent of x. Simply treating the data as one large sample
(N=1600) and computing a classic confidence interval:

LL = grand mean + t_inv(alpha/2, 1599) * std dev
UL = grand mean - t_inv(alpha/2, 1599) * std dev

where std dev is computed from total variance and t_inv is the inverse
t function, won't work, because this doesn't fully recognize
uncertainty associated with r, which has only 8 - 1 = 7 df. So the
true df for t_inv is between 7 and 1599. But what is the value?
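
To make the setup concrete, here is a minimal simulation sketch of the model y = x + r + e. It is only an illustration: the parameter values (mu, sd_r, sd_e) and the function name `simulate` are made up, and the true score x and error e are lumped into a single within-sample normal term, which is an assumption not stated in the post.

```python
import random
import statistics

def simulate(k=8, n=200, mu=50.0, sd_r=2.0, sd_e=5.0, seed=1):
    """Draw k samples of n observations each from y = mu + r + e.

    r is drawn once per sample (the per-sample bias); e is drawn per observation.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(k):
        r = rng.gauss(0.0, sd_r)  # one shared offset for the whole sample
        samples.append([mu + r + rng.gauss(0.0, sd_e) for _ in range(n)])
    return samples

samples = simulate()
pooled = [y for s in samples for y in s]
sample_means = [statistics.mean(s) for s in samples]

# The pooled SD rests on 1600 values, but the spread of the sample means
# (which carries the information about r) rests on only 8 values.
print("pooled SD (N=1600):   ", statistics.stdev(pooled))
print("SD of the 8 sample means:", statistics.stdev(sample_means))
```

The second number is what limits the precision of any estimate of the between-sample variance, which is why 1599 df overstates the information available.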

I seem to recall that in this case one may apply a simple formula
(Satterthwaite?) to get the correct df.

Would that work here, and if so could anyone suggest an informative
source that gives the correct formula and maybe an example?

(p.s. I suppose this can be approached in a more complex way, e.g.,
hierarchical modeling, but want to pursue the simpler method before
considering any alternatives.)

Thanks in advance.

John Uebersax PhD

From: Ray Koopman on 11 Mar 2010 01:48


If all the usual random-model assumptions hold then the estimate of
the overall variance is

v = MSB/n + MSW*(1-1/n)

and the Welch-Satterthwaite df approximation is

f' = v^2 / [ (MSB/n)^2/(k-1) + (MSW*(1-1/n))^2/(k(n-1)) ],

where MSB and MSW are the between-group and within-group mean squares
from a one-way anova, k is the number of groups (in your case, 8), and
n is the number of observations in each sample (in your case, 200).

This is discussed in many texts in the sections on random models
and variance component estimation, such as sections 12.5-12.7 in
Mendenhall's (1968) Introduction to Linear Models and The Design
and Analysis of Experiments.
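
As a rough sketch of how the two formulas above could be computed from raw data, assuming a balanced design (every sample has the same n) and using only the Python standard library. The function name `total_variance_and_df` is hypothetical, and the tiny data set at the end is invented purely to exercise the code.

```python
import statistics

def total_variance_and_df(samples):
    """Return (v, f) for k equal-sized samples.

    v = MSB/n + MSW*(1 - 1/n)
    f = v^2 / [ (MSB/n)^2/(k-1) + (MSW*(1-1/n))^2/(k(n-1)) ]
    """
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("balanced design assumed: all samples need n observations")

    means = [statistics.fmean(s) for s in samples]
    grand = statistics.fmean(means)  # equals the grand mean when the design is balanced

    # Between-group mean square, on k-1 df
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    # Within-group mean square, on k(n-1) df
    msw = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s) / (k * (n - 1))

    a = msb / n              # between-group contribution to v
    b = msw * (1 - 1 / n)    # within-group contribution to v
    v = a + b
    f = v ** 2 / (a ** 2 / (k - 1) + b ** 2 / (k * (n - 1)))
    return v, f

# Invented toy data: k = 3 groups, n = 3 observations each
v, f = total_variance_and_df([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(v, f)
```

The resulting f always lies between the smaller of the two component df and their sum, so for k = 8 and n = 200 it falls between 7 and 1599, as the question anticipated.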

From: John Uebersax on 22 Mar 2010 19:41

Thanks Ray,

Okay, so if one uses the variance formula (= v) and the
Welch-Satterthwaite degrees-of-freedom formula (= W-Sdf) you gave, then
a 95% interval for future observations, each being a random
observation from a random batch, would be:

LL = mean - t_critical * sqrt(v)
UL = mean + t_critical * sqrt(v)

where t_critical = t_inv(1 - .05/2, W-Sdf)

and t_inv is the inverse t statistic cdf with W-Sdf degrees of
freedom?

(and, if so, could anyone suggest an example of estimating a CI this
way in the 'quality' literature?)

John


From: Ray Koopman on 23 Mar 2010 01:20

On Mar 22, 4:41 pm, John Uebersax <jsueber...(a)gmail.com> wrote:
> Thanks Ray,
>
> Okay, so if one uses the variance formula (= v) and the
> Welch-Satterthwaite degrees-of-freedom formula (= W-Sdf) you gave,
> then a 95% interval for future observations, each being a random
> observation from a random batch, would be:
>
> LL = mean - t_critical * sqrt(v)
> UL = mean + t_critical * sqrt(v)

Use sqrt( v*(1 + 1/(k*n)) ) in place of sqrt(v), to adjust for
the fact that you're using a sample mean.
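
A minimal sketch of the resulting interval, under the assumption that v and the Welch-Satterthwaite df have already been computed. To stay within the standard library, the t quantile is obtained numerically (Simpson's rule plus bisection), which is adequate for ordinary confidence levels but is not a substitute for a proper statistical library. All function names and the numbers in the final call are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1: bracket by doubling, then bisect."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def future_obs_interval(mean, v, f, k, n, level=0.95):
    """mean -/+ t_critical * sqrt(v * (1 + 1/(k*n))), with t on f df."""
    t_crit = t_quantile(1 - (1 - level) / 2, f)
    half_width = t_crit * math.sqrt(v * (1 + 1 / (k * n)))
    return mean - half_width, mean + half_width

# Hypothetical inputs: grand mean 100, v = 25, f = 7, with k = 8 and n = 200
print(future_obs_interval(100.0, 25.0, 7.0, 8, 200))
```

With k*n = 1600 the factor (1 + 1/(k*n)) changes the half-width very little; the choice of df has a much larger effect on the interval.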

>
> where t_critical = t_inv(1 - .05/2, W-Sdf)
>
> and t_inv is the inverse t statistic cdf with W-Sdf degrees of
> freedom?
>
> (and, if so, could anyone suggest an example of estimating a CI
> this way in the 'quality' literature?)

Sorry, I don't know that literature.

