From: Peter Perkins on
On 6/29/2010 7:25 AM, Christoph wrote:

> "area"    "volume"    "color"
> 10        NaN         red
> 15        NaN         blue
> NaN       100         yellow
> 12        NaN         red
> NaN       140         blue

> What I would like to do is run a regression that uses the information in "area" to estimate the beta for "area", ignoring the NaN in "volume" rather than assigning a zero to it. And the same for the "volume" coefficient as well.

I'm going to assume that there are a whole bunch of other vars that you
haven't mentioned, and for which you want to make common estimates
across 2D and 3D. Otherwise I presume you would simply do two regressions.

In theory, this is what you're doing:

Combine area and volume into one variable, call it size. It looks like
[10 15 100 12 140]'. You want a stratified estimate of the size coef,
depending on whether the observation is 2D or 3D. Create a discrete
variable, called dims. It looks like [2 2 3 2 3]'. Create dummy vars
from dims; they look like

[1 0
1 0
0 1
1 0
0 1]

Now fit a regression using the following design matrix (I'm just
guessing you want an intercept)

[ones(5,1) dummy(:,1).*size dummy(:,2).*size]

Of course, this ends up as

>> X = [ones(5,1) dummy(:,1).*size dummy(:,2).*size]
X =
     1    10     0
     1    15     0
     1     0   100
     1    12     0
     1     0   140
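
For reference, here is a minimal sketch of building that matrix directly from the NaN-coded columns in the question. The names sz and is3d are my own (sz replaces "size" so that MATLAB's built-in size function is not shadowed), and it assumes every row has exactly one of area or volume.

```matlab
% Raw columns from the question; NaN means "not measured for this row".
area   = [10 15 NaN 12 NaN]';
volume = [NaN NaN 100 NaN 140]';

% Assumption: each row has exactly one of area/volume, so a NaN area
% identifies a 3D observation.
is3d = isnan(area);

% Combine the two into one variable (called "size" in the text above).
sz = area;
sz(is3d) = volume(is3d);          % [10 15 100 12 140]'

% Discrete stratum variable and its indicator (dummy) columns.
dims  = 2 + is3d;                 % [2 2 3 2 3]'
dummy = [dims == 2, dims == 3];   % 5-by-2 indicator matrix

% Design matrix: intercept, size for 2D rows, size for 3D rows.
X = [ones(5,1), dummy(:,1).*sz, dummy(:,2).*sz];
```

The zeros in columns 2 and 3 come from the indicator columns, not from replacing NaN by zero in the data.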

which is the long way to explain what you already had. The reason for
thinking of it this way is to notice that those zeros are not "made up
areas and volumes", but rather are from the dummy indicator variables.

Now use REGRESS to estimate

[b0 b1_2d b1_3d]'

The model is

y = b0 + b1_2d*area*I{2D} + b1_3d*volume*I{3D}
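
A minimal sketch of the fit, assuming a response vector y. The thread never shows y, so the values below are made up purely for illustration. REGRESS needs the Statistics Toolbox.

```matlab
% Design matrix from the example above.
X = [1 10 0; 1 15 0; 1 0 100; 1 12 0; 1 0 140];

% HYPOTHETICAL responses, invented for this sketch (not from the thread).
y = [3.1 4.0 12.2 3.4 16.1]';

% Least-squares fit; b is ordered [b0; b1_2d; b1_3d].
b = regress(y, X);

% Without the toolbox, X \ y returns the same least-squares solution
% for a full-rank X.
b_check = X \ y;
```

Any other predictors with common coefficients across 2D and 3D would go in as additional columns of X.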

That's what you're looking for, I think. I guess you're thinking that
you're "making up data by putting in zeros when they aren't really
there", but you aren't. As an analogy, what would you do if you had
measured age for males and females, and wanted separate age coefs,
stratified by gender?

There are other ways to parameterize this, but this one seems the most
straightforward for what you want -- separate estimates of that coef for
2D and 3D.

> But then: even when this is possible, wouldn't the coeffs still be biased, because the variable "color" could have a different impact on Y depending on whether "area" or "volume" is included in the regression for each row of Xi?

It sounds like you are asking about some kind of interaction term
between color and other things.
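
If that is what you mean, here is a rough sketch of what one such interaction column looks like. It assumes red is the reference color and that the "other thing" is the 2D/3D indicator; both are my guesses, as are the variable names.

```matlab
% Color and stratum for the five rows in the question.
color = {'red','blue','yellow','red','blue'}';
is3d  = [0 0 1 0 1]';

% Main-effect indicator for blue (red is taken as the reference level,
% which is an assumption made for this sketch).
isBlue = strcmp(color, 'blue');   % [0 1 0 0 1]'

% Interaction column: lets the blue effect differ for 3D observations.
blueX3d = isBlue .* is3d;         % [0 0 0 0 1]'
```

With only five rows this just shows how the column is built. In this sample yellow occurs only in a 3D row, so a yellow-by-3D column would duplicate the yellow indicator and could not be estimated separately.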
From: Christoph on
Thanks Peter! I guess stratified estimates are just what I was looking for!

Only one question left: how do you get from here:

>
> [ones(5,1) dummy(:,1).*size dummy(:,2).*size]
>
> Of course, this ends up as

to here:

> >> X = [ones(5,1) dummy(:,1).*size dummy(:,2).*size]
> X =
>      1    10     0
>      1    15     0
>      1     0   100
>      1    12     0
>      1     0   140

Do you mean the same matrix by "dummy" and "size dummy"? Because then I don't get how you create a (5,2) matrix from array multiplication of three (5,1) matrices. Wouldn't that be [0 0 0 0 0]' then?

Hope that's not too stupid...
From: Peter Perkins on
On 6/29/2010 11:04 AM, Christoph wrote:

>> >> X = [ones(5,1) dummy(:,1).*size dummy(:,2).*size]
>> X =
>>      1    10     0
>>      1    15     0
>>      1     0   100
>>      1    12     0
>>      1     0   140
>
> Do you mean the same matrix by "dummy" and "size dummy"? Because then I don't get how you create a (5,2) matrix from array multiplication of three (5,1) matrices. Wouldn't that be [0 0 0 0 0]' then?

Look more carefully at that line of code. It concatenates three column
vectors into a matrix, and two of those columns are themselves created
by elementwise multiplication of two column vectors. All the columns
were previously defined.
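
Spelled out one column at a time, it might look like the sketch below. The names c1, c2, c3 and sz are mine (sz stands in for "size", which would otherwise shadow MATLAB's built-in size function).

```matlab
% Previously defined column vector and dummy matrix.
sz    = [10 15 100 12 140]';
dummy = [1 0; 1 0; 0 1; 1 0; 0 1];

% Inside [ ], a space separates columns, so the expression has three
% elements, not one product of three vectors.
c1 = ones(5,1);           % intercept column
c2 = dummy(:,1) .* sz;    % elementwise product: [10 15 0 12 0]'
c3 = dummy(:,2) .* sz;    % elementwise product: [0 0 100 0 140]'

% Horizontal concatenation of three 5-by-1 columns gives a 5-by-3 matrix.
X = [c1 c2 c3];
```

There is no object called "size dummy"; the space between ".*size" and "dummy(:,2)" is just the column separator.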