From: Lacoume Arnaud on
Hello,

I have generated pseudo-random correlated data with the following procedure:

Uni=copularnd('t',A,4,100);

I have generated 100 random numbers in [0,1], correlated through a Student's t copula whose parameters are a correlation matrix A (12*12, see the end of this message) and 4 degrees of freedom. After that, I fit a copula to these data:

[theta nu]=copulafit('t',Uni);

I recover nu = 4.0512 (that's OK), but theta differs substantially from my initial matrix A (see the end of this message).

Does anybody have an idea where this difference comes from? Maybe the optimisation of the log-likelihood in dimension 12 is very sensitive?
Thank you for your help.

Arnaud Lacoume

Correlation matrix A:
1 0.5 0.5 0.25 0.5 0.25 0.5 0.25 0.5 0.25 0.25 0.25
0.5 1 0.25 0.25 0.25 0.25 0.5 0.5 0.5 0.25 0.25 0.25
0.5 0.25 1 0.25 0.25 0.25 0.25 0.5 0.5 0.25 0.25 0.5
0.25 0.25 0.25 1 0.25 0.25 0.25 0.5 0.5 0.25 0.25 0.5
0.5 0.25 0.25 0.25 1 0.5 0.5 0.25 0.5 0.5 0.5 0.25
0.25 0.25 0.25 0.25 0.5 1 0.5 0.25 0.5 0.5 0.5 0.25
0.5 0.5 0.25 0.25 0.5 0.5 1 0.25 0.5 0.25 0.5 0.25
0.25 0.5 0.5 0.5 0.25 0.25 0.25 1 0.5 0.5 0.25 0.25
0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 0.25 0.5 0.5
0.25 0.25 0.25 0.25 0.5 0.5 0.25 0.5 0.25 1 0.25 0.25
0.25 0.25 0.25 0.25 0.5 0.5 0.5 0.25 0.5 0.25 1 0.25
0.25 0.25 0.5 0.5 0.25 0.25 0.25 0.25 0.5 0.25 0.25 1

Fitted theta:
1.00 0.56 0.56 0.42 0.69 0.36 0.60 0.33 0.59 0.42 0.38 0.41
0.56 1.00 0.20 0.27 0.42 0.28 0.59 0.57 0.53 0.37 0.36 0.19
0.56 0.20 1.00 0.38 0.31 0.22 0.29 0.41 0.51 0.21 0.23 0.63
0.42 0.27 0.38 1.00 0.20 0.25 0.28 0.56 0.51 0.30 0.09 0.45
0.69 0.42 0.31 0.20 1.00 0.54 0.59 0.28 0.60 0.50 0.48 0.27
0.36 0.28 0.22 0.25 0.54 1.00 0.52 0.29 0.60 0.54 0.53 0.19
0.60 0.59 0.29 0.28 0.59 0.52 1.00 0.33 0.61 0.29 0.62 0.25
0.33 0.57 0.41 0.56 0.28 0.29 0.33 1.00 0.50 0.54 0.26 0.19
0.59 0.53 0.51 0.51 0.60 0.60 0.61 0.50 1.00 0.29 0.49 0.50
0.42 0.37 0.21 0.30 0.50 0.54 0.29 0.54 0.29 1.00 0.26 0.20
0.38 0.36 0.23 0.09 0.48 0.53 0.62 0.26 0.49 0.26 1.00 0.07
0.41 0.19 0.63 0.45 0.27 0.19 0.25 0.19 0.50 0.20 0.07 1.00
From: Peter Perkins on
On 2/23/2010 11:50 AM, Lacoume Arnaud wrote:
> I have generated pseudo-random correlated data with the following procedure:
> Uni=copularnd('t',A,4,100);
>
> I have generated 100 random numbers in [0,1], correlated through a Student's
> t copula whose parameters are a correlation matrix A (12*12, see the end of
> this message) and 4 degrees of freedom. After that, I fit a copula to these data:
> [theta nu]=copulafit('t',Uni);
>
> I recover nu = 4.0512 (that's OK), but theta differs substantially from
> my initial matrix A (see the end of this message).

Lacoume, I suspect it's simply that 100 observations is not enough to get a good estimate of the correlation matrix. You're estimating 66 correlation coefficients, after all. If you try this repeatedly with more data, you get estimates that are closer to the known true value (even when using the approximate maximum likelihood method, which is much faster, by the way).
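
A minimal sketch of that experiment, assuming the Statistics Toolbox is available. It reuses the matrix A and the 4 degrees of freedom from the original post; the sample sizes, the seed, and the variable names (nObs, maxErr, thetaHat) are arbitrary choices for illustration. For each sample size it refits the copula with the approximate maximum likelihood method and reports the largest absolute deviation between the fitted matrix and A:

```matlab
% Requires the Statistics Toolbox (copularnd, copulafit).
rng(0);  % fixed seed so the run is repeatable (arbitrary choice)

% Correlation matrix A from the original post
A = [1    0.5  0.5  0.25 0.5  0.25 0.5  0.25 0.5  0.25 0.25 0.25
     0.5  1    0.25 0.25 0.25 0.25 0.5  0.5  0.5  0.25 0.25 0.25
     0.5  0.25 1    0.25 0.25 0.25 0.25 0.5  0.5  0.25 0.25 0.5
     0.25 0.25 0.25 1    0.25 0.25 0.25 0.5  0.5  0.25 0.25 0.5
     0.5  0.25 0.25 0.25 1    0.5  0.5  0.25 0.5  0.5  0.5  0.25
     0.25 0.25 0.25 0.25 0.5  1    0.5  0.25 0.5  0.5  0.5  0.25
     0.5  0.5  0.25 0.25 0.5  0.5  1    0.25 0.5  0.25 0.5  0.25
     0.25 0.5  0.5  0.5  0.25 0.25 0.25 1    0.5  0.5  0.25 0.25
     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1    0.25 0.5  0.5
     0.25 0.25 0.25 0.25 0.5  0.5  0.25 0.5  0.25 1    0.25 0.25
     0.25 0.25 0.25 0.25 0.5  0.5  0.5  0.25 0.5  0.25 1    0.25
     0.25 0.25 0.5  0.5  0.25 0.25 0.25 0.25 0.5  0.25 0.25 1];

nObs   = [100 1000 10000];      % sample sizes to compare
maxErr = zeros(size(nObs));     % worst-case error per sample size

for k = 1:numel(nObs)
    % Simulate from the t copula with correlation A and 4 degrees of freedom
    U = copularnd('t', A, 4, nObs(k));
    % Refit with the faster approximate maximum likelihood method
    [thetaHat, nuHat] = copulafit('t', U, 'Method', 'ApproximateML');
    % Largest absolute deviation over all entries of the matrix
    maxErr(k) = max(abs(thetaHat(:) - A(:)));
    fprintf('n = %6d   nu = %5.2f   max|theta - A| = %.3f\n', ...
            nObs(k), nuHat, maxErr(k));
end
```

The printed deviation should shrink as n grows; the exact figures depend on the seed.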

As a simple analogy, try estimating the variance of a univariate normal distribution with 2 observations, then 5, then 10, then 100. This is a statistical problem, not a software problem.
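
The analogy can be sketched as follows, using base MATLAB only. The true variance of 1, the number of repetitions, and the seed are assumptions made for illustration. For each sample size the code draws many independent normal samples, computes the sample variance of each, and reports how much those estimates scatter:

```matlab
rng(1);                      % fixed seed so the run is repeatable (arbitrary choice)
nObs   = [2 5 10 100];       % sample sizes from the analogy
nRep   = 2000;               % repetitions per sample size (arbitrary choice)
spread = zeros(size(nObs));  % scatter of the variance estimates

for k = 1:numel(nObs)
    x = randn(nObs(k), nRep);   % each column is one sample; true variance is 1
    v = var(x, 0, 1);           % sample variance of every column
    spread(k) = std(v);         % how much the estimates vary between samples
    fprintf('n = %3d   mean estimate = %.3f   std of estimate = %.3f\n', ...
            nObs(k), mean(v), spread(k));
end
```

For normal data the standard deviation of the sample variance is sigma^2*sqrt(2/(n-1)), so the scatter falls only slowly as n increases.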

Hope this helps.
From: Lacoume Arnaud on
Yes, you are right. I have tested it!
I had not thought of that, although it is the first thing to check, simply because I had it in my head that 100 observations would be plenty... This means that the fit should be guided by expert judgment...
Thank you.