3 for 6? [SAS]

From: J Xin on 23 May 2010 06:41

I heard a discussion which I think is a little interesting.

A multinomial logistic regression has values of 1, 2, and 3 in its
dependent target. The business users initially requested that value 3
be the default, and they wanted to see the 'incremental' effect of 1
and 2 on top of 3. The modeler, fully aware of the pros and cons of
each link function, decided to build a cumulative logit model, that
is, an ordinal logistic regression. The model validated OK: the 3
target values vs. the 3 predicted values looked balanced, and the
correct classification rate was about 65%. The usual model concordance
statistics and everything else looked OK.

Later new management came in; the initial business users that
requested the project were gone.

The new management, known to be technically savvy, found that while
the model does a good job assigning records to 1, 2 and 3 based on the
highest probability score among the three (in other words, a good job
deciding the 'first digit'), it only produces 4 combinations when the
three classes are ranked by probability score:

123
213
312
321

In other words, the combinations 132 and 231 are NOT present (in the
background). When building the model, the modeler had also tested the
glogit link function (the 'pure multinomial model') and found that the
glogit model was not well balanced: one assigned class was much larger
than its counterpart in the initial target. The glogit model was not
stable, although its overall correct classification rate, that is, on
the first digit, was slightly better.

The modeler never knew about the disappearance of the 132 and 231
combinations in the ordinal model. Upon the new management's input,
she found that the unstable glogit model produces the full 6
combinations.
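
To make the contrast concrete, here is a small Python sketch (not SAS, and not the modeler's actual model). It assumes a cumulative logit model with one shared linear predictor, written as P(Y <= j) = logistic(cut_j - eta), and a generalized logit model with class 3 as the reference and two free linear predictors. The cutpoints, the grids, and all function names are made up for illustration. It sweeps the predictors and records which probability rankings show up.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(eta, cut1, cut2):
    """Class probabilities under a cumulative logit model (assumed form).

    One linear predictor eta is shared by both cutpoints.
    """
    c1 = logistic(cut1 - eta)  # P(Y <= 1)
    c2 = logistic(cut2 - eta)  # P(Y <= 2)
    return (c1, c2 - c1, 1.0 - c2)


def glogit_probs(eta1, eta2):
    """Class probabilities under a generalized logit model, class 3 as reference."""
    e1, e2 = math.exp(eta1), math.exp(eta2)
    total = e1 + e2 + 1.0
    return (e1 / total, e2 / total, 1.0 / total)


def ranking(probs):
    """Classes sorted from most to least probable, e.g. '213'."""
    order = sorted(range(3), key=lambda k: -probs[k])
    return "".join(str(k + 1) for k in order)


def cumulative_rankings(cut1, cut2, lo=-8.0, hi=8.0, steps=1601):
    """Rankings reachable by sweeping the single predictor over a grid."""
    seen = set()
    for i in range(steps):
        eta = lo + (hi - lo) * i / (steps - 1)
        seen.add(ranking(cumulative_probs(eta, cut1, cut2)))
    return seen


def glogit_rankings(lo=-3.0, hi=3.0, steps=13):
    """Rankings reachable by sweeping both predictors over a 2-D grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return {ranking(glogit_probs(a, b)) for a in grid for b in grid}


# Hypothetical cutpoints: one widely spaced pair, one narrowly spaced pair.
wide = cumulative_rankings(-1.0, 1.0)
narrow = cumulative_rankings(-0.5, 0.5)
full = glogit_rankings()

print("cumulative, wide cutpoints:  ", sorted(wide))
print("cumulative, narrow cutpoints:", sorted(narrow))
print("generalized logit:           ", sorted(full))
```

In this toy setup the cumulative model reaches only four rankings for either cutpoint spacing, while the generalized logit sweep reaches all six. Which four appear depends on the gap between the cutpoints, so the sets printed here need not match the four listed above.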

The expectation of the full 6 values is a very legitimate business
question. But the modeler's technical responsibility is not to secure
all 6 subclasses, as long as she does a good job classifying 1, 2 and
3, the first digit. To secure (ideally and optimally) all 6 values in
the prediction would require all 6 classes to be available in the
dependent target variable in the first place. The data only support a
definition of 3 values, not 6. We all know that if the modeler had
been given 6 values and produced a model predicting only 4 or 3 of
them, she ought to be 'spanked' or even fired. That certainly is not
the case here.

Well, the new management cares more about 'now and future'. They
acknowledged the initial request, but insisted upon having 6 values in
the prediction. In other words, if the ordinal logistic regression had
produced 6 values, they would have signed off on the project. They
don't care about the difference between the link functions or any of
the other technical details, although the modeler knows that whichever
link function is used, producing 6 values is less than optimal (even
if it may look stable). The appearance of 6 values satisfies the
business request cosmetically, but not on substance or technical
merit.

My opinion is that ordinal logistic regression tends to be more rigid,
but more stable, than GLOGIT. GLOGIT is more flexible, but I cannot
make a general observation about its stability. I sided with the
modeler: she should not be asked to stretch 3 to 6, given the
sub-optimality of doing that. I know very well she would have built a
6-value target variable had she had the data to do it. But we are not
building in academia. We have business clients to serve. The customer
is 'emperor'.

What do you think? Thanks.
