From: Anneley on 4 Aug 2010 07:21

I am using a generalised linear model for a binomial distribution. I have many potential independent variables (up to 20) to describe the dependent variable, but I assume most of them will not be useful to the model. I would like to understand how best to go about choosing the best inputs.

My first test was to run glmfit with all the parameters and then remove the parameters whose coefficients are nearest 0. Is this a good way to start? Also, what are the best ways to compare models? I am getting sfit values of around 0.44 and a deviance of fit of around 500.
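[For reference, a minimal sketch of the kind of call being described. X, y and n are placeholder names, not from the post: X is the N-by-20 predictor matrix, y the number of successes and n the number of trials per observation.]

    % Fit a binomial GLM with a logit link (the glmfit default for 'binomial').
    [b, dev, stats] = glmfit(X, [y n], 'binomial', 'link', 'logit');

    % Coefficients (first entry is the intercept), standard errors and p-values.
    disp([b stats.se stats.p]);

    % Dispersion estimate returned by glmfit (the "sfit" value mentioned above).
    disp(stats.sfit);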
From: Greg Heath on 5 Aug 2010 11:37

On Aug 4, 7:21 am, "Anneley " <anneley.mcmillanremove.t...(a)yahoo.com> wrote:
> I am using a generalised linear model for a binomial distribution. [...]

I'm not a statistician, so take these comments with a grain of salt.

For selecting a good subset of input variables for a logistic model, I have had good luck with the backward-search subsets obtained from a linear proxy model fitted via STEPWISEFIT.

One nice feature is that you can constrain the model to keep your favorite inputs.

For context I also

1. Check the forward-search results.
2. Include squares and cross-products, provided the original number of inputs is not too high.

If you are going to compare regression coefficients, standardize your input variables to zero mean and unit variance. That includes the square and cross-product terms, if you use them.

Quite often there are several subsets that yield equivalent performance. I use a priori knowledge (aka common sense) to choose among them.

Hope this helps. (Also hope no one can prove this is bad advice!)

Greg
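[A rough sketch of that backward-search idea, assuming X is the predictor matrix and y the response (both placeholder names); this is only one way to set it up, not the poster's exact procedure.]

    % Standardize the inputs so the coefficients are directly comparable.
    Z = zscore(X);

    % Add squared and cross-product terms. x2fx puts the constant column first;
    % drop it, because stepwisefit adds its own intercept.
    D = x2fx(Z, 'quadratic');
    D = D(:, 2:end);

    % Backward search: start with every term in the model and let
    % stepwisefit remove the ones that are not significant.
    nterms = size(D, 2);
    [b, se, pval, inmodel] = stepwisefit(D, y, ...
        'inmodel', true(1, nterms), 'display', 'off');

    selected = find(inmodel);   % columns of D retained by the backward search

[Leaving 'inmodel' at its default, with all terms out, gives the forward search mentioned in item 1.]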
From: Rogelio on 5 Aug 2010 11:50

Greg Heath <heath(a)alumni.brown.edu> wrote in message <fc7634bc-5734-4c27-8a2e-bd7ae1d41182(a)q35g2000yqn.googlegroups.com>...

What you are asking does not have a single answer. This is a matter of modeling, and there are many aspects to consider.

If some coefficients are close to zero, that alone is not a good argument for excluding them, since they still "drive" the dependent variable. What you should check is whether they are statistically significant, e.g. with a t-test.

Choosing among models is another big topic. I would suggest the most common information criteria, namely AIC and BIC. You should read up on what they do and how to interpret them.

Good luck
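[A small sketch of both checks, reusing the placeholder names from above (X predictors, [y n] binomial response). The AIC/BIC here are computed from the deviance, which is fine for comparing models fitted to the same data because the saturated-model term is a shared constant that cancels.]

    % Fit and inspect per-coefficient t-statistics and p-values.
    [b, dev, stats] = glmfit(X, [y n], 'binomial');
    disp([b stats.t stats.p]);   % terms with large p-values are candidates to drop

    % Information criteria from the deviance, up to a constant common to all
    % models on the same data: AIC = dev + 2k, BIC = dev + k*log(N).
    k   = numel(b);              % number of estimated coefficients, incl. intercept
    N   = size(X, 1);            % number of observations
    AIC = dev + 2*k;
    BIC = dev + k*log(N);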