From: condor on
I have an easy question...

n = number of observations
p = number of terms included in the final model

I performed two sets of stepwisefit regressions (40,000 regressions in total):
a) between 20,000 Y variables and 10,000 possible regressors with n = 5 (5 years); the candidate regressors are always the same for every Y
b) between 20,000 Y variables and 10,000 possible regressors with n = 30 (30 years); the candidate regressors are always the same for every Y

In a) the number of terms included is never greater than 3; in b) it is never greater than 24. The R-squared is always very high and there is no multicollinearity.
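For reference, a minimal sketch of the call pattern being described, on simulated data (the sizes are scaled down and the variable names are illustrative, not the poster's actual code):

n = 5;                         % observations per series (use 30 for case b)
X = randn(n, 200);             % candidate regressors shared by every Y (10,000 in the thread)
Y = randn(n, 50);              % response series (20,000 in the thread)

nTerms = zeros(1, size(Y, 2));
for j = 1:size(Y, 2)
    % stepwisefit adds the constant term automatically
    [b, se, pval, inmodel] = stepwisefit(X, Y(:, j), 'display', 'off');
    nTerms(j) = sum(inmodel);  % regressors entered (constant not counted)
end
max(nTerms)                    % with n = 5 this never exceeds 3 (see the replies below)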

Since the vector of included coefficients is p-by-1 (so in principle I could have 5 terms included in a) and 30 terms included in b)), I have two questions:

1) Is there something strange here?
2) Is there a relation (in MATLAB) between the number of terms included and the number of observations, or is it just a coincidence?
THANKS
From: Tom Lane on
> I performed two sets of stepwisefit regressions:
> a) between 20,000 Y variables and 10,000 possible regressors with n = 5 (5 years);
> the candidate regressors are always the same for every Y
> b) between 20,000 Y variables and 10,000 possible regressors with n = 30 (30 years);
> the candidate regressors are always the same for every Y
> In a) the number of terms included is never greater than 3; in b) it is never
> greater than 24. The R-squared is always very high and there is no multicollinearity.
>
> Since the vector of included coefficients is p-by-1 (so in principle I could have
> 5 terms included in a) and 30 terms included in b)), I have two questions:

If I understand you correctly, you have 5 observations on many variables.
You also have an implied constant term. So 4 predictors would lead to a
saturated model with no ability to estimate error. When stepwisefit has 3
predictors, it's not possible to compute the significance of a 4th one, so
it would always stop short of that.
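To make that counting explicit, a minimal sketch of the degrees-of-freedom arithmetic (the variable names are illustrative, not stepwisefit internals):

n = 5;                    % observations
k = 0:4;                  % predictors in the model (constant always included)
df_resid = n - 1 - k      % residual degrees of freedom: [4 3 2 1 0]
% At k = 4 the residual df is 0: the model is saturated, so there is no
% error estimate left to test a 4th predictor against.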

I can't think of any reason why 24 ought to be a strict upper limit on the
number of predictors when you have 30 observations.
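In case it helps to see where the stopping point actually comes from, a sketch on simulated data (0.05 and 0.10 are, as far as I know, stepwisefit's documented default entry/removal p-value thresholds; 1,000 candidates here just to keep the example quick):

n = 30;
X = randn(n, 1000);       % candidate regressors
y = randn(n, 1);
[b, se, pval, inmodel] = stepwisefit(X, y, ...
    'penter', 0.05, 'premove', 0.10, 'display', 'off');
sum(inmodel)              % how many candidates were entered for this y
% Terms stop entering when no remaining candidate meets 'penter',
% not because of any hard cap at 24.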

-- Tom


From: condor on
"Tom Lane" <tlaneATmathworksDOTcom(a)nospam.com> wrote in message <hu34or$8h9$1(a)fred.mathworks.com>...

Yes, I have 5 observations on many variables, and stepwisefit in MATLAB automatically includes a constant term. When you say:
> When stepwisefit has 3 predictors, it's not possible to compute the significance of a 4th one
I don't understand why. With 5 observations I should have a maximum of 4 predictors plus the constant term, shouldn't I?
As written above, the R-squared is almost always close to 1 (from 0.995 to 1). The number of predictors used varies from 1 to 3 (plus the constant term); what is strange is that in the 20,000 regressions I ran, I did not find a single one with 4 predictors (plus the constant term).
From: Tom Lane on
>> When stepwisefit has 3 predictors, it's not possible to compute the
>> significance of a 4th one
> I don't understand why. With 5 observations I should have a maximum of
> 4 predictors plus the constant term, shouldn't I?
> As written above, the R-squared is almost always close to 1 (from 0.995 to 1).
> The number of predictors used varies from 1 to 3 (plus the constant term); what
> is strange is that in the 20,000 regressions I ran, I did not find a single one
> with 4 predictors (plus the constant term).

In the process of deciding whether to add a predictor, we (the stepwisefit
function) compute an F statistic that compares:

1. The reduction in error sum of squares that results from adding the
predictor.

2. The remaining error sum of squares after adding the predictor.

If adding a new predictor would saturate the model, we're guaranteed that #1
is 100% of the remaining error sum of squares, and #2 is zero. So we can't
meaningfully test one against the other.
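As a rough illustration of that comparison, a sketch with hand-computed sums of squares on simulated data (this mirrors the description above, not stepwisefit's internal code):

n = 5;
X = randn(n, 4);                 % 4 candidate predictors
y = randn(n, 1);

% Error sum of squares of a model with the constant plus the first k predictors
sse = @(k) sum((y - [ones(n,1) X(:,1:k)] * ([ones(n,1) X(:,1:k)] \ y)).^2);

% Entering predictor k+1 when k are already in the model:
%   F = (sse(k) - sse(k+1)) / ( sse(k+1) / (n - (k+1) - 1) )
k = 2;                           % going from 2 to 3 predictors: test is well defined
F_enter = (sse(k) - sse(k+1)) / (sse(k+1) / (n - (k+1) - 1))

k = 3;                           % going from 3 to 4 predictors: model saturates
sse(k+1)                         % essentially zero: all remaining error "explained"
n - (k+1) - 1                    % zero residual degrees of freedom
% The remaining error sum of squares and its degrees of freedom are both zero,
% so the F ratio for entering a 4th predictor is undefined.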

-- Tom