From: Farzin Zareian on 11 May 2010 13:09

Hello MATLAB users,

I am puzzled to know how MATLAB generates random vectors chosen from a multivariate normal distribution with a mean vector MU and a singular covariance matrix SIGMA.

I am trying to generate a large number (say 2000) of random vectors with a target MU and SIGMA. I develop the MU and SIGMA from available data. Let's assume there are 7 variables, and I only have 3 realizations from measured data. This means that the covariance matrix is not full rank (it is singular). I am puzzled to see that mvnrnd(MU,SIGMA,2000) still works and generates the random vectors.

I guess there are routines embedded in mvnrnd.m that take care of the singularity issue.

Thanks a bunch for your kind help,
farzin
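To make the setup concrete, here is roughly what I am doing (the randn call below only stands in for my measured data, and the variable names are just illustrative):

    data  = randn(3, 7);               % stand-in for the 3 measured realizations of 7 variables
    MU    = mean(data, 1);             % 1-by-7 sample mean vector
    SIGMA = cov(data);                 % 7-by-7 sample covariance, rank <= 2, hence singular
    rank(SIGMA)                        % confirms the rank deficiency
    R = mvnrnd(MU, SIGMA, 2000);       % still runs and returns 2000-by-7 random vectors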
From: Peter Perkins on 11 May 2010 13:49

On 5/11/2010 1:09 PM, Farzin Zareian wrote:
> I am trying to generate a large number (say 2000) of random vectors with a
> target MU and SIGMA. I develop the MU and SIGMA from available data.
> Let's assume there are 7 variables, and I only have 3 realizations from
> measured data. This means that the covariance matrix is not full rank
> (it is singular). I am puzzled to see that mvnrnd(MU,SIGMA,2000) still works
> and generates the random vectors.
>
> I guess there are routines embedded in mvnrnd.m that take care of the
> singularity issue.

Farzin, there are, and you can look at them simply by opening the mvnrnd function in the editor: "edit mvnrnd"

The short answer is that all MVNRND needs to do is find a matrix T such that T*T' = S, where S is your desired covariance matrix. T need not be a Cholesky factor. Consider this simple example that starts from the opposite direction (in a kind of mixed math/MATLAB notation):

x ~ N(0,1)
[y1; y2] = [1; 1]*x
=> [y1; y2] ~ MVN([0; 0], [1 1; 1 1])

and certainly that covariance matrix [1 1; 1 1] is singular.

However, the question of whether what you generate using your (probably very) poor estimate of the covariance is useful is debatable.

Hope this helps.
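If you want to see the idea without reading mvnrnd.m itself, here is one way to build such a T for a singular covariance (not necessarily the exact factorization mvnrnd uses) and check that the generated pairs obey y1 = y2:

    SIGMA = [1 1; 1 1];               % the singular covariance from the example above
    [V, D] = eig(SIGMA);              % SIGMA = V*D*V', eigenvalues 0 and 2
    T = V * sqrt(max(D, 0));          % any T with T*T' = SIGMA will do; clip tiny negative eigenvalues
    Y = randn(2000, 2) * T';          % each row ~ MVN([0 0], SIGMA)
    max(abs(Y(:,1) - Y(:,2)))         % essentially zero: every sample satisfies y1 = y2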
From: Roger Stafford on 11 May 2010 16:45

"Farzin Zareian" <zareian(a)uci.edu> wrote in message <hsc2vg$lj8$1(a)fred.mathworks.com>...
> Hello MATLAB users,
>
> I am puzzled to know how MATLAB generates random vectors chosen from a multivariate normal distribution with a mean vector MU and a singular covariance matrix SIGMA.
>
> I am trying to generate a large number (say 2000) of random vectors with a target MU and SIGMA. I develop the MU and SIGMA from available data. Let's assume there are 7 variables, and I only have 3 realizations from measured data. This means that the covariance matrix is not full rank (it is singular). I am puzzled to see that mvnrnd(MU,SIGMA,2000) still works and generates the random vectors.
>
> I guess there are routines embedded in mvnrnd.m that take care of the singularity issue.
>
> Thanks a bunch for your kind help,
> farzin

There is nothing inherently wrong with having singular covariance matrices, provided they are based on adequately large quantities of data. Such singularities simply mean that the jointly normal random variables involved are constrained by one or more linear equalities, as in Peter's example where y1 - y2 = 0. Any 'mvnrnd' function worth its salt should be able to handle such covariance matrices, regardless of any singularities.

However, from your description it sounds as though your data was very limited and therefore, as Peter indicates, the results are questionable. With three "realizations" I would go further and say the results would be almost meaningless. With seven variables you need a very large number of statistically valid observations in order to determine a reliable covariance matrix.

Roger Stafford
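If you want to see those linear constraints explicitly, one quick way (illustrative code only, with randn standing in for your measured data) is to look at the null space of the sample covariance; every vector mvnrnd produces stays on the affine subspace those constraints define:

    data  = randn(3, 7);                    % stand-in for 3 realizations of 7 variables
    MU    = mean(data, 1);
    SIGMA = cov(data);                      % rank <= 2, so at least 5 linear constraints
    N     = null(SIGMA);                    % one column per constraint direction
    R     = mvnrnd(MU, SIGMA, 2000);
    dev   = R - repmat(MU, size(R, 1), 1);  % deviations from the mean
    max(max(abs(dev * N)))                  % essentially zero: all samples obey the constraints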
From: Farzin Zareian on 11 May 2010 17:52

Peter and Roger,

Many thanks for your instructive responses. After reading your comments and a few other sources I can see your points, and now I have a better handle on the situation.

As my last question, what is the measure of reliability of the distribution fitted to the data? As you mentioned, with three realizations it is meaningless to fit a seven-dimensional joint normal distribution. What is the minimum number of realizations needed to get meaningful results when the number of variables is N?

Many thanks,
farzin
From: Roger Stafford on 11 May 2010 22:05
"Farzin Zareian" <zareian(a)uci.edu> wrote in message <hscji4$s04$1(a)fred.mathworks.com>... > Peter and Roger > > Many thanks for your instructive responses. After reading your comments and the few other sources I can see your points and now I have better handle on the situation. > > As my last question, what is the measure of reliability of the fitted distribution to data. As you mentioned, with three realizations it is meaningless to fit a seven dimensional joint normal distribution. How many realizations is the minimum for getting meaningful results for number of variables equal to N? > > many thanks > farzin I was afraid you might ask that question. I can only answer it in a general way. To begin with, if you have a single random variable, consider the effect of using a number of independent observations to estimate its mean value, and then ask what one can expect the standard deviation to be between this estimate and the true mean. You will find that it is the original random variable's standard deviation divided by the square root of the number of observations. In other words if you make a hundred independent observations, your accuracy of mean estimate only gets better from that of a single observation by a factor of ten. Something similar pertains to sample estimates made of variance values for the random variable. Also with more than one random variable one gets a similar situation in estimating covariance values. It takes a lot of observations to make them accurate. A second consideration is that as the number of variables increases, the assumption of joint normality becomes increasingly critical. Forget that assumption for a moment and imagine you have a standard two-dimensional chess board representing eight discrete values for each of two random variables. Clearly you will have to have enough samples to put a fairly large number of pairs in each square to make a good estimate of joint probability - a good-sized multiple of 64 at the very least. Now change this to a seven-dimensional chess board corresponding to your seven variables, if you can imagine such a thing. If you want to have several samples in each - what shall we call them - each seven-dimensional "square", you would need a healthy multiple of eight to the power seven, which would be some large multiple of two million samples. It is precisely to avoid such horrific requirements that statisticians are so eager to assume joint normality in so many situations. It greatly reduces the number of observations that would be necessary in their work. However, this reasoning should at least strike a note of caution to you that in a seven-variable situation, either you had better be awfully sure of joint normality, or the number of samples should be a good deal more than the above square-root-of-the-number-of-observations criterion would indicate. Now as you see I have done a lot of hand-waving over this question, but perhaps it will help you a little, even if it is unwelcome information. (By the way, I too was a graduate student at UCI in years past - in fact at the time they first opened.) Roger Stafford |