From: gpanterov Panterov on 10 Apr 2010 14:15

Hi All,

I would be grateful if someone could address the problem I've been struggling with. So far neither Python nor Matlab has been able to help and I am beginning to lose hope.

I am trying to maximize a log-likelihood for a bivariate Poisson distribution at a few hundred points in time and obtain some out-of-sample predictions at each point. The problem has quite a few variables (which I suspect could be part of the problem?).

The log-likelihood is:

L=-sum(-lambda + y1.*log(lambda) - log(factorial(y1)) - mu + y2.*log(mu) - log(factorial(y2)));

where lambda and mu are the means of the two variables and depend on the parameters of interest.

I have attempted this with fminunc and fminsearch, but unsuccessfully so far:

[x,fval,exitflag]=fminunc(@(k) likelihood(k,y1,y2,X1,X2),k0,options);

k is a vector of 72 parameters whose estimates I need to find. y1 and y2 are the two Poisson variables with 3000 elements each.

What is the problem? Whatever estimates I obtain seem to be extremely fragile. For example, a simple sorting of the dataset causes a complete reversal of the results. The estimates are extremely sensitive to the sample size as well as to the initial guess. Including the gradient of the likelihood only seems to burden the algorithm, and the estimates are often identical to the initial guess. I tried resetting TolX, TolFun, MaxFunEvals, etc., but to no avail. I often get the messages "Optimization terminated at the initial point" or "Line search cannot find an acceptable point..."

Am I wrong in my approach? Is Matlab ill-equipped to handle such a problem? Perhaps I should try a different solver? Or maybe I could improve the likelihood? Absolutely any input would be greatly appreciated!

Thanks,
George
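For reference, a minimal sketch of how an objective like this might be wired to fminunc. The file layout, the helper name means_from_params, and the option values are assumptions, not the poster's actual code; how lambda and mu depend on k is only described later in the thread.

    % likelihood.m -- negative log-likelihood, minimized so that the
    % log-likelihood itself is maximized
    function L = likelihood(k, y1, y2, X1, X2)
        [lambda, mu] = means_from_params(k, X1, X2);   % hypothetical helper
        L = -sum(-lambda + y1.*log(lambda) - log(factorial(y1)) ...
                 - mu    + y2.*log(mu)     - log(factorial(y2)));
    end

    % driver (separate script), with loosened stopping tolerances as an example
    options = optimset('Display','iter', 'TolFun',1e-8, 'TolX',1e-8, 'MaxFunEvals',1e5);
    [x, fval, exitflag] = fminunc(@(k) likelihood(k, y1, y2, X1, X2), k0, options);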
From: John D'Errico on 10 Apr 2010 15:22

"gpanterov Panterov" <gpanterov(a)gmail.com> wrote in message <hpqf7s$qiv$1(a)fred.mathworks.com>...
> Hi All,
> I would be grateful if someone could address the problem I've been struggling with. So far neither Python nor Matlab has been able to help and I am beginning to lose hope.
>
> I am trying to maximize a log-likelihood for a bivariate Poisson distribution at a few hundred points in time and obtain some out-of-sample predictions at each point. The problem has quite a few variables (which I suspect could be part of the problem?).
>
> The log-likelihood is:
>
> L=-sum(-lambda + y1.*log(lambda) - log(factorial(y1)) - mu + y2.*log(mu) - log(factorial(y2)));
>
> where lambda and mu are the means of the two variables and depend on the parameters of interest.
>
> I have attempted this with fminunc and fminsearch, but unsuccessfully so far:
> [x,fval,exitflag]=fminunc(@(k) likelihood(k,y1,y2,X1,X2),k0,options);
>
> k is a vector of 72 parameters whose estimates I need to find. y1 and y2 are the two Poisson variables with 3000 elements each.
>
> What is the problem? Whatever estimates I obtain seem to be extremely fragile. For example, a simple sorting of the dataset causes a complete reversal of the results. The estimates are extremely sensitive to the sample size as well as to the initial guess. Including the gradient of the likelihood only seems to burden the algorithm, and the estimates are often identical to the initial guess. I tried resetting TolX, TolFun, MaxFunEvals, etc., but to no avail. I often get the messages "Optimization terminated at the initial point" or "Line search cannot find an acceptable point..."
>
> Am I wrong in my approach? Is Matlab ill-equipped to handle such a problem? Perhaps I should try a different solver? Or maybe I could improve the likelihood?

I might point out that repeatedly computing the log of the factorial of the same vectors, each 3000 elements long, will make for very slow-running code.

Is Matlab ill-equipped? No more so than any other tool on a nasty problem. More importantly, you show no place where the likelihood function depends on k, i.e., the parameters that you are optimizing over.

John
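A side note on the factorial point: the log-factorial terms can be precomputed once outside the objective, and gammaln(y+1) gives the same value as log(factorial(y)) without overflowing (factorial returns Inf in double precision once its argument exceeds 170). A minimal sketch, assuming y1 and y2 are the fixed count vectors:

    % Precompute the constant log-factorial terms once, before optimizing.
    logfact_y1 = gammaln(y1 + 1);   % equals log(factorial(y1)), but numerically stable
    logfact_y2 = gammaln(y2 + 1);
    % These terms do not depend on k, so they can also be dropped from the objective entirely.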
From: gpanterov Panterov on 10 Apr 2010 16:00

> > The log-likelihood is:
> >
> > L=-sum(-lambda + y1.*log(lambda) - log(factorial(y1)) - mu + y2.*log(mu) - log(factorial(y2)));
> >
> > where lambda and mu are the means of the two variables and depend on the parameters of interest.
> >
> > I have attempted this with fminunc and fminsearch, but unsuccessfully so far:
> > [x,fval,exitflag]=fminunc(@(k) likelihood(k,y1,y2,X1,X2),k0,options);
> >
> > k is a vector of 72 parameters whose estimates I need to find. y1 and y2 are the two Poisson variables with 3000 elements each.
>
> I might point out that repeatedly computing the log of the factorial of the same vectors, each 3000 elements long, will make for very slow-running code.
>
> Is Matlab ill-equipped? No more so than any other tool on a nasty problem. More importantly, you show no place where the likelihood function depends on k, i.e., the parameters that you are optimizing over.
>
> John

I could think about optimizing the log of the factorials. Thanks for the suggestion.

Here is how the likelihood depends on the parameters k. The parameters of interest are 30 alphas, 30 betas and one gamma. They define the means of the Poisson variables (lambda=alpha*beta*gamma and mu=beta*alpha):

gamma=k(1);
alpha=k(2:n+1);
beta=k(n+2:end);

so that:

lambda=(X1*alpha).*(X2*beta)*gamma;
mu=(X1*beta).*(X2*alpha);

where X1 and X2 are simply Boolean matrices selecting elements from the vectors alpha and beta. The likelihood is then:

L=-sum(-lambda + y1.*log(lambda) - log(factorial(y1)) - mu + y2.*log(mu) - log(factorial(y2)));

(I will fix the factorials, thanks.)

I guess this specification poses a possible over-parameterization, so I have played with imposing the constraint sum(alpha)=30. However, using fmincon with constraints on this dataset and this setup was so disappointing that I quickly gave up.

Any ideas on other improvements to this optimization problem?

Thanks!
George
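Put together, the objective described in this post might look roughly like the sketch below. This is a sketch only: gamma and beta are renamed with trailing underscores to avoid shadowing MATLAB's built-in gamma and beta functions, and gammaln replaces log(factorial(...)).

    function L = likelihood(k, y1, y2, X1, X2)
    % Negative bivariate-Poisson log-likelihood with the parameterization above.
    n      = (numel(k) - 1) / 2;   % 30 alphas and 30 betas, plus one gamma
    gamma_ = k(1);
    alpha  = k(2:n+1);
    beta_  = k(n+2:end);

    lambda = (X1*alpha) .* (X2*beta_) * gamma_;
    mu     = (X1*beta_) .* (X2*alpha);

    L = -sum(-lambda + y1.*log(lambda) - gammaln(y1+1) ...
             - mu    + y2.*log(mu)     - gammaln(y2+1));
    end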
From: Matt J on 10 Apr 2010 16:22

"gpanterov Panterov" <gpanterov(a)gmail.com> wrote in message <hpqlcl$hd4$1(a)fred.mathworks.com>...
> I could think about optimizing the log of the factorials. Thanks for the suggestion.
============
You should omit them altogether. The log-factorial terms are just constants that don't affect the location of the minima.

> The parameters of interest are 30 alphas, 30 betas and one gamma. They define the means of the Poisson variables (lambda=alpha*beta*gamma and mu=beta*alpha)
> so that:
> lambda=(X1*alpha).*(X2*beta)*gamma;
> mu=(X1*beta).*(X2*alpha);
>
> where X1 and X2 are simply Boolean matrices selecting elements from the vectors alpha and beta.
==============
You've given two different definitions of lambda and mu above.

> I guess this specification poses a possible over-parameterization, so I have played with imposing the constraint sum(alpha)=30.

Yes, that would be wise. Without this constraint, it's clear that if alpha and beta are solutions, then c*alpha and beta/c are also solutions for any c > 0.
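The sum(alpha)=30 constraint is linear, so it can be passed to fmincon as an equality constraint. A minimal sketch, assuming k = [gamma; alpha; beta] with n = 30; the lower bounds are an added assumption, there only because lambda and mu must stay positive inside the logs:

    n   = 30;
    Aeq = [0, ones(1,n), zeros(1,n)];    % selects the alpha block of k
    beq = n;                             % sum(alpha) = 30
    lb  = 1e-6 * ones(2*n + 1, 1);       % keep all parameters strictly positive

    opts = optimset('Algorithm', 'interior-point', 'Display', 'iter');
    [k_hat, fval, exitflag] = fmincon(@(k) likelihood(k, y1, y2, X1, X2), ...
                                      k0, [], [], Aeq, beq, lb, [], [], opts);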
From: gpanterov Panterov on 10 Apr 2010 16:44

> You should omit them altogether. The log-factorial terms are just constants that don't affect the location of the minima.

Thanks. I will drop them!

> > lambda=(X1*alpha).*(X2*beta)*gamma;
> > mu=(X1*beta).*(X2*alpha);
>
> You've given two different definitions of lambda and mu above.

This is on purpose. The means of the two variables differ by the gamma parameter for the same alpha(i) and beta(j): lambda is formed by alpha(i)*beta(j)*gamma, while simultaneously mu is formed by beta(i)*alpha(j).

Do you have any idea why the estimates are so fragile? Literally adding 20 observations to a sample of 1000 sometimes forces the algorithm to break down, and if it does return results they are obviously wrong (often it simply returns the initial guess, which is drawn from a normal distribution). Is this a problem of dimensionality? Is a vector of 72 parameters too difficult to optimize? Or maybe it is the discrete character of the data (although alpha, beta and gamma are continuous...)?

Thanks for all the suggestions!
George
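One cheap diagnostic for this sensitivity to the starting point, sketched here as an assumption rather than anything proposed in the thread, is to rerun the solver from several random positive starts and compare the final objective values; wildly different fval values suggest multiple local minima or premature termination rather than a single stable optimum.

    nStarts  = 10;
    bestFval = Inf;  bestK = [];
    for s = 1:nStarts
        k0 = 0.5 + abs(randn(2*n + 1, 1));    % positive random start (assumption)
        [k_hat, fval, flag] = fminunc(@(k) likelihood(k, y1, y2, X1, X2), k0, options);
        fprintf('start %d: fval = %.4f, exitflag = %d\n', s, fval, flag);
        if flag > 0 && fval < bestFval        % keep the best converged run
            bestFval = fval;  bestK = k_hat;
        end
    end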