From: J Xin on 23 May 2010 06:41

I heard a discussion that I think is a little interesting. A multinomial logistic regression has values 1, 2, and 3 in its dependent target. The business users initially requested that value 3 be the default, and they wanted to see the 'incremental' effect of 1 and 2 on top of 3. The modeler, fully aware of the pros and cons of each link function, decided to build a cumulative model, that is, an ordinal logistic regression.

The model validated OK: the 3 target values vs. the 3 predicted values looked balanced, and the correct classification rate was about 65%. The usual model concordance and everything else looked OK.

Later, new management came in; the business users who had requested the project were gone. The new management, known to be technically savvy, found that the model does a good job assigning records to 1, 2, and 3 based on the highest probability score among the three, in other words, a good job deciding the 'first digit'. However, the model only produces 4 combinations:

123 213 312 321

In other words, the combinations 132 and 231 are NOT present (in the background).

When building the model, the modeler had also tested the glogit link function (the 'pure multinomial model') and found that the glogit model was not well balanced: one assigned class was much larger than its counterpart in the initial target. The glogit model was not stable, although its overall correct classification (that is, the first digit) was slightly better. The modeler never knew that the 132 and 231 combinations had disappeared in the ordinal model. Upon the new management's input, the modeler found that the unstable glogit produces the full 6 combinations. The expectation of the full 6 values is a very legitimate business question.
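To make the difference between the two link functions concrete, here is a minimal sketch in Python (not SAS, and not the modeler's actual model). It assumes a cumulative logit model with one shared linear predictor and two cut points, and a generalized logit model with class 3 as the reference and one linear predictor per non-reference class. The cut points (-1 and 1), the grids, and all function names are made up for illustration. It simply enumerates which probability rankings of the three classes each model form can reach.

```python
import math
from itertools import product

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ranking(probs):
    """Class labels (1-based) sorted from most to least probable."""
    return tuple(sorted(range(1, len(probs) + 1), key=lambda k: -probs[k - 1]))

def cumulative_logit_probs(eta, cut1, cut2):
    """Cumulative logit: P(Y<=1) = F(cut1 - eta), P(Y<=2) = F(cut2 - eta)."""
    c1 = logistic(cut1 - eta)
    c2 = logistic(cut2 - eta)
    return (c1, c2 - c1, 1.0 - c2)

def generalized_logit_probs(eta1, eta2):
    """Generalized logit, class 3 as reference (its linear predictor is 0)."""
    e1, e2 = math.exp(eta1), math.exp(eta2)
    total = e1 + e2 + 1.0
    return (e1 / total, e2 / total, 1.0 / total)

# Hypothetical cut points and grids, chosen only for illustration.
grid = [i / 100.0 for i in range(-1000, 1001)]
ordinal_rankings = {ranking(cumulative_logit_probs(eta, -1.0, 1.0)) for eta in grid}

coarse = [i / 2.0 for i in range(-6, 7)]
glogit_rankings = {
    ranking(generalized_logit_probs(a, b)) for a, b in product(coarse, coarse)
}

print("cumulative logit rankings: ", sorted(ordinal_rankings))
print("generalized logit rankings:", sorted(glogit_rankings))
```

With these assumed cut points, the single shared linear predictor reaches only four of the six rankings, while the two free linear predictors of the generalized logit reach all six. Which four the cumulative model reaches depends on the spacing of the cut points, so this sketch does not by itself reproduce the exact four combinations listed above.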
But the modeler's technical responsibility is not to secure all 6 subclasses, as long as she does a good job classifying 1, 2, and 3, the first digit. To secure all 6 values in the prediction (ideally and optimally) would require all 6 classes to be available in the dependent target variable in the first place. The data only support the definition of 3 values, not 6. We all know that if the modeler had been given 6 values and produced a model predicting only 4 or 3 values, she ought to be 'spanked' or even fired. That certainly is not the case here.

Well, the new management cares more about 'now and future'. They acknowledged the initial request but insisted on having 6 values in the prediction. In other words, if the ordinal logistic regression had produced 6 values, they would have signed off on the project. They don't care about the difference between the link functions or the other technical details, although the modeler knows that whichever link function is used, producing 6 values is less than optimal (even if it may look stable). The appearance of 6 values satisfies the business request cosmetically, but not on substance or technical merit.

My opinion is that ordinal logistic regression tends to be more rigid but more stable than glogit. Glogit is more flexible, but I cannot make a general observation about its stability. I sided with the modeler: she should not be asked to stretch 3 to 6 and accept the sub-optimality of doing that. I know very well she would have built a 6-value target variable had she had the data to do it. But we are not building in academia. We have business clients to serve. The customer is 'emperor'.

What do you think? Thanks.