Latent Class Analysis via NLMIXED


From: Ryan on 4 Dec 2009 02:57

Hi Dale,

When I tried to run the model late last night, I realized immediately
that the REs component was misspecified. Thanks for the correct
(assuming independent correlations) specification. I have a follow-up
question, which I promise won't be more than a few lines of code. :)

When I think about the LCA, I think about wanting at least two
outputs:

(1) probability that a (+) response on an item is associated with each
latent class
(2) probability that each observation is associated with each latent
class

I believe you solved problem (2) with the ESTIMATE statements in the
previous thread, which was based on an example of 3 manifest
dichotomous variables and 2 latent classes. I'd like to try to
solve for at least one combination of responses to all items
for my model (assuming 19 dichotomous manifest variables and 6
latent classes) to see if I'm doing this correctly.

Here goes nothing...

/*---------------------------------------------------------------*/
/* Calculate the probability that the observation belongs to     */
/* latent class 1, given a (-) response on all items except the  */
/* final 19th item                                               */
/*---------------------------------------------------------------*/
estimate "P(LC=1|x1=0,x2=0,x3=0,...,x19=1)"
(eta1*(1-pi11)*(1-pi21)*(1-pi31)*...*(pi191)) /
(eta1*(1-pi11)*(1-pi21)*(1-pi31)*...*(pi191) +
eta2*(1-pi12)*(1-pi22)*(1-pi32)*...*(pi192) +
eta3*(1-pi13)*(1-pi23)*(1-pi33)*...*(pi193) +
eta4*(1-pi14)*(1-pi24)*(1-pi34)*...*(pi194) +
eta5*(1-pi15)*(1-pi25)*(1-pi35)*...*(pi195) +
eta6*(1-pi16)*(1-pi26)*(1-pi36)*...*(pi196));

/*---------------------------------------------------------------*/

Does that seem correct to you?

If I'm even remotely correct, this could take me a long, long time to
write all possible combinations! I assume a macro might work here.
I'll have to look into using one--new territory for me.

Best,

Ryan

p.s. I plan on taking your advice and using underscores in the actual
code.

From: Dale McLerran on 4 Dec 2009 02:01

Ryan,

I looked further, too, at your RANDOM statement. That is
also not properly specified. You had:

random u1 u2 u3 u4 u5 ~ normal([0], [exp(2*log_Valpha1)],
[0], [exp(2*log_Valpha2)],
[0], [exp(2*log_Valpha3)],
[0], [exp(2*log_Valpha4)],
[0], [exp(2*log_Valpha5)] )
subject=person;

What you were apparently trying to specify is that each
random effect has mean zero and specified variance. Further,
it appears that you want to assume that the random effects
are uncorrelated. It is probably reasonable to assume that
the latent class random effects are uncorrelated (although
this is something that you might want to check through a
likelihood ratio test).

You'll note that I stated what you were "apparently" trying
to specify. The construction of your random statement is
correct only through "normal(". After that, you must
specify all of the expected values for u1 through u5 in one
bracketed list, followed by the covariance matrix of the
random effects in a second bracketed list. Actually, we only
need to specify the lower triangular part of the covariance
matrix, row by row. So, assuming that the random effects are
independent, proper specification of your random statement
would be

random u1 u2 u3 u4 u5 ~
   normal([0,0,0,0,0],
          [exp(2*log_Valpha1),
           0, exp(2*log_Valpha2),
           0, 0, exp(2*log_Valpha3),
           0, 0, 0, exp(2*log_Valpha4),
           0, 0, 0, 0, exp(2*log_Valpha5)])
   subject=person;

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

From: Dale McLerran on 4 Dec 2009 13:37

--- On Thu, 12/3/09, Ryan <ryan.andrew.black(a)GMAIL.COM> wrote:

> From: Ryan <ryan.andrew.black(a)GMAIL.COM>
> Subject: Re: Latent Class Analysis via NLMIXED - UPDATE
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, December 3, 2009, 11:57 PM
> Hi Dale,
>
> When I tried to run the model late last night, I realized immediately
> that the REs component was misspecified. Thanks for the correct
> (assuming independent correlations) specification. I have a follow-up
> question, which I promise won't be more than a few lines of code. :)
>
> When I think about the LCA, I think about wanting at least two
> outputs:
>
> (1) probability that a (+) response on an item is associated with
> each latent class
> (2) probability that each observation is associated with each
> latent class
>
> I believe you solved problem (2) with the Estimate statements in
> the previous thread, which was based on an example of 3 manifest
> dichotomous variables and 2 latent classes. I'd like to try to
> solve for at least one combination of responses to all items
> for my model (assuming 19 dichotomous manifest variables and 6
> latent classes) to see if I'm doing this correctly.
>
> Here goes nothing...
>
> /*---------------------------------------------------------------*/
> /* Calculate the probability that the observation belongs to */
> /* latent class 1, given a (-) response on all items except the */
> /* final 19th item */
> /*---------------------------------------------------------------*/
> estimate "P(LC=1|x1=0,x2=0,x3=0,...,x19=1)"
> (eta1*(1-pi11)*(1-pi21)*(1-pi31)*...*(pi191)) /
> (eta1*(1-pi11)*(1-pi21)*(1-pi31)*...*(pi191) +
> eta2*(1-pi12)*(1-pi22)*(1-pi32)*...*(pi192) +
> eta3*(1-pi13)*(1-pi23)*(1-pi33)*...*(pi193) +
> eta4*(1-pi14)*(1-pi24)*(1-pi34)*...*(pi194) +
> eta5*(1-pi15)*(1-pi25)*(1-pi35)*...*(pi195) +
> eta6*(1-pi16)*(1-pi26)*(1-pi36)*...*(pi196));
>
>
> Does that seem correct to you?
>

This seems to be a solution to problem #2 at the top of your
post: Given a particular manifest variable combination, what
is the probability of the observation belonging to latent
class 1? For problem #2, your code is correct. However, that
is, as you noted at the top, something which I had already
addressed in a post yesterday.

My understanding of what you want as a solution to problem
#1 is as follows: Suppose we only observed X19=1. What is
the probability that an observation with X19=1 belongs to
latent class 1? For the solution to that problem, see further
down in this post.

But first, just a brief comment on the label part of your
ESTIMATE statement. With an X vector of length 19, I would
write the label part of the ESTIMATE statement (the quoted
part) as:

estimate "P(LC=1|x=0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1)"

You can clearly see which manifest variables are turned on
or off. You do have to count across to determine which
variable is being referenced. But I think it is preferable
to naming the manifest variables and losing track of which
variables are turned on or off in such a long string
of variable names.

> If I'm even remotely correct, this could take me a long, long time to
> write all possible combinations! I assume a macro might work here.
> I'll have to look into using one--new territory for me.
>
> Best,
>
> Ryan
>
> p.s. I plan on taking your advice and using underscores in
> the actual code.
>
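
A minimal sketch of the kind of macro mentioned above, for the
problem #2 posterior only. It assumes the underscore naming scheme
discussed earlier, i.e. parameters called eta<class> and
pi<item>_<class> (for example pi19_1); the macro names lca_joint and
lca_posterior are hypothetical, and the code has not been run against
the actual model.

```sas
/* Hypothetical helper: emits eta<c> times the product of item terms */
/* for class c, given a response pattern such as 001.                */
/* Assumes parameters are named eta<c> and pi<item>_<class>.         */
%macro lca_joint(pattern, c);
  %local j m;
  %let m = %length(&pattern);
  eta&c
  %do j = 1 %to &m;
    %if %substr(&pattern,&j,1) = 1 %then * pi&j._&c;
    %else * (1 - pi&j._&c);
  %end;
%mend lca_joint;

/* Hypothetical wrapper: emits one complete ESTIMATE statement for   */
/* P(LC = target | x = pattern) in a model with nclass classes.      */
%macro lca_posterior(pattern, target, nclass);
  %local k;
  estimate "P(LC=&target|x=&pattern)"
    ( %lca_joint(&pattern,&target) )
    /
    ( %do k = 1 %to &nclass;
        %if &k > 1 %then +;
        %lca_joint(&pattern,&k)
      %end;
    );
%mend lca_posterior;

/* Intended use, inside the PROC NLMIXED step after the MODEL        */
/* statement (19 items, 6 classes, class 1):                         */
/*   %lca_posterior(0000000000000000001, 1, 6)                       */
```

Running with "options mprint;" writes the generated ESTIMATE
statement to the log, so it can be compared against a hand-written
one before trusting it. The pattern is treated as text, so leading
zeros are preserved.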

Here is my take on what you really want as a solution to problem
#1. Let's take a step back to the model with two latent classes
and three manifest variables. The joint probabilities of each
latent class and manifest variable combination are computed as
follows:

                                Latent Class
Xvec    1                                  2
000     eta1*(1-pi11)*(1-pi21)*(1-pi31)    eta2*(1-pi12)*(1-pi22)*(1-pi32)
001     eta1*(1-pi11)*(1-pi21)*(pi31)      eta2*(1-pi12)*(1-pi22)*(pi32)
010     eta1*(1-pi11)*(pi21)*(1-pi31)      eta2*(1-pi12)*(pi22)*(1-pi32)
011     eta1*(1-pi11)*(pi21)*(pi31)        eta2*(1-pi12)*(pi22)*(pi32)
      +--------------------------------------------------------------------+
100   | eta1*(pi11)*(1-pi21)*(1-pi31)      eta2*(pi12)*(1-pi22)*(1-pi32)   |
101   | eta1*(pi11)*(1-pi21)*(pi31)        eta2*(pi12)*(1-pi22)*(pi32)     |
110   | eta1*(pi11)*(pi21)*(1-pi31)        eta2*(pi12)*(pi22)*(1-pi32)     |
111   | eta1*(pi11)*(pi21)*(pi31)          eta2*(pi12)*(pi22)*(pi32)       |
      +--------------------------------------------------------------------+

So, if we are interested in the probability that LC=1 when
X1=1, we are interested in the ratio of the sum of all LC=1
probabilities in the boxed area above to the sum of all
probabilities in the boxed area. Thus, we would have

P(LC=1 | x1=1) =
(eta1*(pi11)*(1-pi21)*(1-pi31) + eta1*(pi11)*(1-pi21)*(pi31) +
eta1*(pi11)*(pi21)*(1-pi31) + eta1*(pi11)*(pi21)*(pi31) )

/

(eta1*(pi11)*(1-pi21)*(1-pi31) + eta1*(pi11)*(1-pi21)*(pi31) +
eta1*(pi11)*(pi21)*(1-pi31) + eta1*(pi11)*(pi21)*(pi31) +

eta2*(pi12)*(1-pi22)*(1-pi32) + eta2*(pi12)*(1-pi22)*(pi32) +
eta2*(pi12)*(pi22)*(1-pi32) + eta2*(pi12)*(pi22)*(pi32) )

Guess what? Your problem only got bigger! With m manifest
variables and C latent classes, there are 2^m combinations of
manifest variables for each latent class. You need half of those
(2^(m-1)) in the numerator and C*(2^(m-1)) in the denominator, so
you really need that macro code now to loop over all of the
different probabilities. (You are probably ready now to shoot the
messenger! But really, I am just trying to help!)
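
A side note on the size of that enumeration. If the
class-conditional probabilities really are products of independent
item terms, as in the table above, the sums over the unobserved
items factor, and each factor equals 1. The derivation below is a
sketch for the three-item case in the thread's notation (eta_c for
the class prevalence, pi_jc for P(x_j = 1 | LC = c)); whether the
result is the quantity of interest in the random-effects version of
the model is an assumption that should be checked.

```latex
\begin{align*}
P(LC = 1 \mid x_1 = 1)
  &= \frac{\displaystyle\sum_{x_2, x_3 \in \{0,1\}}
       \eta_1\, \pi_{11}\,
       \pi_{21}^{x_2} (1 - \pi_{21})^{1 - x_2}\,
       \pi_{31}^{x_3} (1 - \pi_{31})^{1 - x_3}}
      {\displaystyle\sum_{c = 1}^{C} \sum_{x_2, x_3 \in \{0,1\}}
       \eta_c\, \pi_{1c}\,
       \pi_{2c}^{x_2} (1 - \pi_{2c})^{1 - x_2}\,
       \pi_{3c}^{x_3} (1 - \pi_{3c})^{1 - x_3}} \\[6pt]
  &= \frac{\eta_1\, \pi_{11}}
          {\displaystyle\sum_{c = 1}^{C} \eta_c\, \pi_{1c}},
\qquad \text{because} \quad
\sum_{x_j \in \{0,1\}} \pi_{jc}^{x_j} (1 - \pi_{jc})^{1 - x_j}
  = (1 - \pi_{jc}) + \pi_{jc} = 1 .
\end{align*}
```

For the two-class example this reduces to
eta1*pi11 / (eta1*pi11 + eta2*pi12), which is short enough to type
directly into an ESTIMATE statement. The same cancellation applies
to any single item j and any number of items m.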

I would note that there is yet another item which you probably
would like to have. It would be desirable to know the estimated
probability for a particular manifest variable combination given
the number of latent classes in your model. This would allow
you to assess whether your latent class model is providing a
satisfactory fit to the observed data. (For more on this, see
John Uebersax's web page on latent class analysis.)

For our small problem with only two latent classes and three
manifest variables, the probability of each manifest variable
combination is obtained by summing across the rows of the
table shown above.
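
The row sum described above can be generated rather than typed. The
macro below is a sketch only: lca_pattern_prob is a hypothetical
name, and it assumes parameters called eta<class> and
pi<item>_<class> (for example pi3_2), which may not match the names
in the actual model.

```sas
/* Hypothetical macro: emits an ESTIMATE statement for the model-    */
/* implied probability of one response pattern, i.e. the sum over    */
/* latent classes of eta<c> times the product of item terms.         */
/* Assumes parameters are named eta<c> and pi<item>_<class>.         */
%macro lca_pattern_prob(pattern, nclass);
  %local c j m;
  %let m = %length(&pattern);
  estimate "P(x=&pattern)"
    %do c = 1 %to &nclass;
      %if &c > 1 %then +;
      eta&c
      %do j = 1 %to &m;
        %if %substr(&pattern,&j,1) = 1 %then * pi&j._&c;
        %else * (1 - pi&j._&c);
      %end;
    %end;
  ;
%mend lca_pattern_prob;

/* Intended use inside PROC NLMIXED, for the small example with      */
/* three manifest variables and two latent classes:                  */
/*   %lca_pattern_prob(101, 2)                                       */
```

Multiplying the estimated pattern probability by the number of
observations gives an expected frequency that can be set beside the
observed count for that pattern.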

Dale


From: Ryan on 4 Dec 2009 20:08

Dale,

By the time I write out all of the ESTIMATE statements in their full
glory, say 20 years or so from now, the statistical programmers
at SAS will have enhanced the GLIMMIX procedure to handle a random
effects LCA model with just a click of a button!

Seriously, it was very kind of you to answer all of my questions.
You've given me much to consider, and as always, I've learned a
tremendous amount. If I decide to go with the NLMIXED procedure for
this model, which is possible now that you've provided me with all
this info, I will write back with an update.

Take care,

Ryan
