From: Steven Raimi on
I have developed 600+ potential predictors for use in a logistic
regression model I'm working on. I want to screen each one as efficiently
as possible for predictive power (using the c-statistic). We have a
brute-force method to generate the c-statistics (PROC LOGISTIC on
yvar=xvar_in_question, then numerically integrate the ROC curve to
estimate the area under it), but there has to be a more straightforward
(and efficient) way to perform this task, right?
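
For reference, the quantity being screened can be sketched without fitting a model at all. The area under the ROC curve equals the Mann-Whitney rank-sum statistic divided by (targets * non-targets), and a one-predictor logistic fit is monotone in that predictor, so ranking the raw variable gives the same concordance up to direction. The sketch below is in Python rather than SAS, and the function name c_statistic and the 0/1 coding of the outcome are assumptions made for illustration, not anything from the post.

```python
def c_statistic(x, y):
    """Concordance (area under the ROC curve) of predictor x for a binary
    outcome y, where y == 1 marks a target record (assumed coding)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied x values.
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n1 = sum(1 for v in y if v == 1)   # number of target records
    n0 = len(y) - n1                   # number of non-target records
    r1 = sum(r for r, v in zip(ranks, y) if v == 1)
    # Mann-Whitney U for the target group, scaled to the 0..1 range.
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def screening_strength(x, y):
    """Direction-free version: a predictor with c = 0.2 is as useful as
    one with c = 0.8, because the fitted slope simply flips sign."""
    c = c_statistic(x, y)
    return max(c, 1 - c)
```

This costs one sort per variable, so 600 predictors means 600 sorts rather than 600 model fits. Tied values receive average ranks, which matches counting a tied pair as half concordant.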

Also, I want to identify variables/groups of variables that are collinear,
so I can leave out all but the most sensible one(s) (per subject matter
knowledge). I could use PROC CORR, but that will be overwhelmed trying to
do 600*600 combinations. Again, isn't there a better way to attack this?
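
To put the size of the problem in perspective, 600 variables give 600*599/2 = 179,700 distinct pairs. A minimal sketch of flagging the highly correlated ones follows, again in Python rather than SAS. The function names, the dict-of-columns input, and the 0.9 cutoff are all assumptions for illustration. Pairwise correlation also only catches two-variable collinearity, not a variable that is a combination of several others.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        # Assumption: a constant column is treated as uncorrelated.
        return 0.0
    return sab / sqrt(saa * sbb)


def correlated_pairs(columns, threshold=0.9):
    """Return (name1, name2, r) for every pair with |r| >= threshold.
    `columns` maps a variable name to its list of values (hypothetical
    layout); the threshold is an arbitrary illustrative cutoff."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, r))
    return flagged
```

Only the flagged pairs need to be reviewed by hand, so the output is a short list of candidates rather than a 600-by-600 table.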

FYI - I have both SAS and JMP available. Only about 5% of the dataset can
fit in JMP - but we'll be developing the regression there (using all
target outcomes, and a few percent of the other records so there's a
minimum of two non-target records per target one).

Thanks for the guidance!
Steve