From: Greg Heath on
On Mar 29, 4:19 pm, rams <lrams...(a)gmail.com> wrote:
> Hi Walter Roberson,
>
> I don't have the model, but when I plot the dependent variable individually against each of the independent variables, I can fit them with a 3rd-order polynomial. Will that be helpful in finding the multiple regression equation? Thanks in advance.

There is no "THE" model.
1. You propose a model based on prior information or plain ignorance.
2. Quantify the goodness of fit, e.g., mean-square-error.
3. Either accept the result, or propose another model and go to 1.
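That loop, as a sketch in NumPy (the same idea works in MATLAB with polyfit/polyval; the data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 1 + 2*x - 3*x**3 + 0.05*rng.standard_normal(x.size)  # noisy cubic truth

mses = {}
for order in (1, 2, 3):                      # 1. propose a model
    coef = np.polyfit(x, y, order)           #    and fit it
    mses[order] = np.mean((np.polyval(coef, x) - y)**2)  # 2. quantify the fit
print(mses)                                  # 3. accept, or propose another model
```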

Since you know that good single variable third order fits
are reasonable, you could try linear, quadratic and
third order models using all of the variables.

However, the number of coefficients for each model is
Linear: 1 + 5 = 6
Quadratic: 6 + 5^2 = 31
Cubic: 31 + 5^3 = 156

As a rule of thumb, you would like at least 10 times as
many data points as coefficients to estimate. Therefore if the
quadratic fit (See my thread "Vectorization for Quadratic
Polynomial Regression") is not satisfactory, you might
consider a neural network.
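Those counts and the 10x rule of thumb, as plain arithmetic (note that the quadratic and cubic counts here include redundant ordered cross-products; the number of distinct monomials would be smaller):

```python
n = 5                                  # number of independent variables
linear    = 1 + n                      # constant + one term per variable
quadratic = linear + n**2              # add all ordered products x_i*x_j
cubic     = quadratic + n**3           # add all ordered triple products
for nw in (linear, quadratic, cubic):
    print(nw, "coefficients ->", 10*nw, "data points wanted")
```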

Hope this helps.

Greg

From: the cyclist on
Greg Heath <heath(a)alumni.brown.edu> wrote in message <c5b110b4-776a-4c56-94ae-93b38e9e1b4a(a)k24g2000pro.googlegroups.com>...
> As a rule of thumb, you would like at least 10 times as
> many data points as coefficients to estimate.

rams,

Floating around this discussion, but not stated outright, is that you should be aware of the "parsimony" of your model and the dangers of overfitting. Even if you have the ~1560 data points that Greg suggests you would need to fit third-order polynomials in all combinations of your 5 independent variables, that does NOT mean the result is a good model. In general, more and more parameters lead to a better fit, but at some point you are fitting the random noise, which does you no good. (That is overfitting.)
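A small demonstration of that point (NumPy, synthetic data): training error keeps falling as the polynomial order grows, but the error on fresh data need not improve and may get worse:

```python
import numpy as np

rng = np.random.default_rng(1)
def sample(n):
    x = np.sort(rng.uniform(-1, 1, n))
    return x, x**3 + 0.2*rng.standard_normal(n)   # cubic truth + noise

x_train, y_train = sample(30)
x_test,  y_test  = sample(200)

train_mse, test_mse = {}, {}
for order in (1, 3, 9):
    c = np.polyfit(x_train, y_train, order)
    train_mse[order] = np.mean((np.polyval(c, x_train) - y_train)**2)
    test_mse[order]  = np.mean((np.polyval(c, x_test)  - y_test)**2)
print(train_mse)   # always improves as parameters are added
print(test_mse)    # need not improve -- that gap is the overfitting
```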

I'm not an expert on these things, but I know there are tests that one can apply to models to assess them. Probably searching on some of the keywords in this thread will help you. You might also try telling us a bit more about what you are trying to do conceptually with your data, not just the math.

the cyclist
From: rams on
I have reflectance data modeled using 5 independent variables. I also have measured reflectance data that I collected in the field. Now I want to fit the modeled reflectance to the measured reflectance by adjusting those 5 independent variables, so that I might estimate those 5 parameters for the measured data.
From: Greg Heath on
On Mar 30, 9:29 am, "the cyclist" <thecycl...(a)gmail.com> wrote:
> Greg Heath <he...(a)alumni.brown.edu> wrote in message <c5b110b4-776a-4c56-94ae-93b38e9e1...(a)k24g2000pro.googlegroups.com>...
> > On Mar 29, 4:19 pm, rams <lrams...(a)gmail.com> wrote:
> > > Hi Walter roberson,
>
> > > I don't have the model, but when I plot the dependent variable
> > > individually against each of the independent variables, I can fit
> > > them with a 3rd-order polynomial. Will that be helpful in finding
> > > the multiple regression equation? Thanks in advance.
>
> > There is no "THE" model.
> > 1.You propose a model based on prior information or plain ignorance
> > 2. Quantify the goodness of fit e.g., mean-square-error
> > 3. Either accept the result, or propose another model and go to 1.

Correction: go to 2

> > Since you know that good single variable third order fits
> > are reasonable, you could try linear, quadratic and
> > third order models using all of the variables.
>
> > However, the number of coefficients for each model is
> > Linear: 1 + 5 = 6
> > Quadratic: 6 + 5^2 = 31
>
> > Cubic: 31 + 5^3 = 156
>
> > As a rule of thumb, you would like at least 10 times as
> > many data points as coefficients to estimate. Therefore if the
> > quadratic fit (See my thread "Vectorization for Quadratic
> > Polynomial Regression") is not satisfactory, you might
> > consider a neural network.
>
> Floating around this discussion, but not being stated outright,
> is that you should be aware of the "parsimony" of your model,
> and the dangers of overfitting. Even if you have the ~1560 data
> points that Greg suggests you would need to fit third-order
> polynomials in all combinations of your 5 independent variables,

No, you don't have to do this. Very often you can use a
stagewise search and obtain a nonoptimal model that is
insignificantly worse than an optimal model.
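One hypothetical version of such a stagewise search, sketched in NumPy: greedy forward selection from a small pool of candidate terms, at each stage adding whichever term most reduces the mean-square error. The data and candidate pool are made up for illustration; a real search would use a larger pool and a stopping test.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 5))
y = 1 + 2*X[:, 0] - X[:, 2]**2 + 0.1*rng.standard_normal(300)

# candidate pool: each variable and its square
cands = {f"x{i}": X[:, i] for i in range(5)}
cands.update({f"x{i}^2": X[:, i]**2 for i in range(5)})

chosen, cols = [], [np.ones(len(y))]          # start from the constant term

def mse_with(col):
    A = np.column_stack(cols + [col])
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    return np.mean((A @ w - y)**2)

for _ in range(3):                             # greedily add three terms
    best = min(cands, key=lambda k: mse_with(cands[k]))
    chosen.append(best)
    cols.append(cands.pop(best))
print(chosen)
```

With this synthetic data the strong terms (x0 and x2^2) are picked up in the first two stages; the third pick is fitting noise.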

> that does NOT mean that that is a good model.

Very true.

Nor does it mean that a poor model will automatically
result if you only have 1.5*156 = 234 data points.

If Nw is the number of weights and thresholds to be
estimated and Ntrn is the number of training vectors,
the influence of measurement errors and inadequate
data sampling causes the estimates to be less useful
as the ratio r = Ntrn/Nw decreases.

Typically, if r <= 1 (Ntrn <= Nw), exact (zero-error)
solutions exist. However, those solutions will include
the effects of design-data measurement errors and
inadequate sampling. Therefore the resulting weights
may be useless for nondesign data.

Typically, if r > 1 (Ntrn > Nw), zero-error solutions
don't exist. However, least-square-error and other
approximate solutions are available. Moreover,
their accuracy tends to increase as r increases,
because the Ntrn - Nw extra degrees of freedom
allow training algorithms to average out
weight-estimate errors caused by the sparseness
of random sampling and the existence of random
measurement error.
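The two regimes can be seen directly in a linear least-squares solve (NumPy's lstsq here; MATLAB's backslash behaves comparably). With Ntrn < Nw the noisy design data are fit exactly, so the noise goes straight into the weights; with Ntrn > Nw a residual remains, but the extra equations average the noise out of the weight estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
Nw = 10
w_true = rng.standard_normal(Nw)

results = {}
for Ntrn in (5, 100):                        # r = 0.5 and r = 10
    A = rng.standard_normal((Ntrn, Nw))
    b = A @ w_true + 0.1*rng.standard_normal(Ntrn)   # noisy measurements
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    fit_err = np.linalg.norm(A @ w - b)      # residual on the design data
    w_err = np.linalg.norm(w - w_true)       # error in the estimated weights
    results[Ntrn] = (fit_err, w_err)
print(results)
```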

Often a search is made to determine a lower bound
for a good choice of r.

When good results are obtained, I generally find that
~2 <= r <= ~32. Therefore, I tend to begin my
search with r ~ sqrt(2*32) ~ 8 or 10.
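That starting point is just the geometric mean of the observed range:

```python
import math
r0 = math.sqrt(2*32)       # geometric mean of r over [2, 32]
print(r0)                  # 8.0
```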

>In general, more and more parameters lead to a better fit,
>but at some point you are fitting the random noise, which
>does you no good. (That is overfitting.)

Overfitting means using more parameters than are
necessary. However, there are various ways to mitigate
overfitting (e.g., regularization and stopped training).
If iterative training is not stopped just past the point
where a validation (nontraining) design set achieves a
minimum in mean-squared-error or another stopping criterion,
the model is said to be overtrained.

Bottom Line:
Overfitting can be mitigated.
Overtraining an overfit model should be avoided.
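A minimal sketch of stopped training (Python/NumPy, made-up data): gradient descent on a deliberately over-parameterized polynomial model, halting shortly after the validation MSE stops improving and keeping the weights from the validation minimum:

```python
import numpy as np

rng = np.random.default_rng(3)
def make(n):
    x = rng.uniform(-1, 1, n)
    return np.vander(x, 10), np.sin(3*x) + 0.3*rng.standard_normal(n)

A_trn, y_trn = make(40)        # degree-9 model: overfit-prone for 40 points
A_val, y_val = make(40)        # validation set, never used for the gradient

w = np.zeros(A_trn.shape[1])
best_mse, best_w, since_best = np.inf, w.copy(), 0
for epoch in range(5000):
    w -= 0.01 * A_trn.T @ (A_trn @ w - y_trn) / len(y_trn)  # one training step
    val_mse = np.mean((A_val @ w - y_val)**2)
    if val_mse < best_mse:
        best_mse, best_w, since_best = val_mse, w.copy(), 0
    else:
        since_best += 1
        if since_best >= 200:   # stop just past the validation minimum
            break
w = best_w                       # keep the best-validation weights
print(best_mse)
```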

Hope this helps.

Greg
From: Greg Heath on
On Mar 30, 3:15 pm, rams <lrams...(a)gmail.com> wrote:
> I have reflectance data modeled using 5 independent variables. I also have measured reflectance data that I collected in the field. Now I want to fit the modeled reflectance to the measured reflectance by adjusting those 5 independent variables, so that I might estimate those 5 parameters for the measured data.

In general, this appears to be dang near impossible.

How do you propose to do this?

Greg