From: Nathan on
I too have run across the need for a nonlinear aoctool, but I think this thread has confused me more than helped me. If I propose a simplified example, I hope that someone can humor me with a simple MATLAB solution.

Imagine the child growth charts found in every pediatrician's office. They are fitted curves based on the weight-vs-age data from hundreds of children. If you split the sexes, you would have a "boys" curve and a "girls" curve. How would you state that the boys' growth rate is significantly different from the girls' growth rate?

I think I grasp how I can test for differences in the different terms of the model (like in slope and intercept for linear fits). But what if not all the terms are significantly different? And what if the term that is statistically different translates to no real physiological difference? Is there a single statistical test to compare all aspects of two different curves?
From: Tom Lane on
> Imagine the child growth charts found in every pediatricians office. They
> are fitted curves based off of the weight vs age data from hundreds of
> children. If you split the sexes, you would have a "boys" curve, and a
> "girls" curve. How would you state that the boys growth rate is
> significantly different than the girls growth rate?

I don't have data on that, but at the bottom of this post I've included a
different example. Using the "carsmall" data found in the Statistics
Toolbox, I fit a model of car acceleration as a function of weight. I test
whether the estimated coefficients are the same or different across model
years.

> I think I grasp how I can test for differences in the different terms of
> the model (like in slope and intercept for linear fits). But what if not
> all the terms are significantly different? And what if the term that is
> statistically different translates to no real physiological difference?
> Is there a single statistical test to compare all aspects of two different
> curves?

There are two issues here. One is whether a difference that is statistically
significant (measurable) is important in practice. Only you, as a subject
matter expert, can answer that. The other, I think, is how to test multiple
coefficients in a model. My examples test pairs of coefficients. You could
test more if you wanted.

Just for simplicity I've actually used linear models, but the principle is
the same.

-- Tom

load carsmall
x = [Weight/1000 Model_Year];
y = Acceleration;

%% Fit same model for all three model years
f = @(b,x) b(1) + b(2)*x(:,1);
[b,r1,~,covb,mse] = nlinfit(x,y,f,[1 1]);

se = sqrt(diag(covb)); % display table of coef, std error, t stat, p val
t = b(:)./se;
p = 2*tcdf(-abs(t), length(y)-length(b));
tbl = [b(:), se, t, p];
dataset({tbl, 'Coeff' 'StdErr' 't' 'p'})

%% Fit model where intercept varies by model year
f = @(b,x) b(1) + b(2)*x(:,1) + b(3)*(x(:,2)==76) + b(4)*(x(:,2)==82);
[b,r2,~,covb,mse] = nlinfit(x,y,f,[1 1 0 0]);

se = sqrt(diag(covb));
t = b(:)./se;
dfe = length(y)-length(b);
p = 2*tcdf(-abs(t), dfe);
tbl = [b(:), se, t, p];
dataset({tbl, 'Coeff' 'StdErr' 't' 'p'})

%% Test two ways for same intercept vs different intercepts
p = linhyptest(b,covb,[0;0],[0 0 1 0;0 0 0 1],dfe)

% F test via extra sum of squares: drop in SSR per added coefficient,
% divided by the residual mean square of the larger model
ms1 = (sum(r1.^2)-sum(r2.^2))/2;
p = 1 - fcdf(ms1/mse,2,dfe)

%% Fit model where slope also varies by model year
f = @(b,x) b(1) + b(2)*x(:,1) + b(3)*(x(:,2)==76) + b(4)*(x(:,2)==82) ...
    + b(5)*x(:,1).*(x(:,2)==76) + b(6)*x(:,1).*(x(:,2)==82);
[b,r3,~,covb,mse] = nlinfit(x,y,f,[1 1 0 0 0 0]);

se = sqrt(diag(covb));
t = b(:)./se;
dfe = length(y)-length(b);
p = 2*tcdf(-abs(t), dfe);
tbl = [b(:), se, t, p];
dataset({tbl, 'Coeff' 'StdErr' 't' 'p'})

%% Test two ways for same slope vs different slopes
p = linhyptest(b,covb,[0;0],[0 0 0 0 1 0;0 0 0 0 0 1],dfe)

ms1 = (sum(r2.^2)-sum(r3.^2))/2; % same extra-sum-of-squares F test, now for the slopes
p = 1 - fcdf(ms1/mse,2,dfe)


From: Nathan on
I often feel silly when re-reading my posts a few days later. This time it is because my solution was in the preceding comments all along and I just didn't know it.

Tom's suggestion to "compare the sum of squared residuals from the combined fit to the sum of the two sums of squared residuals from the separate fits" is exactly what I'm looking for.
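That extra-sum-of-squares comparison is easy to try on toy numbers. Here is a sketch in Python/NumPy rather than MATLAB (same arithmetic, different syntax); the "growth" data, group sizes, and coefficients below are all invented purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented "growth" data: two groups whose true slopes really differ.
age = np.tile(np.linspace(1, 10, 40), 2)
group = np.repeat([0, 1], 40)
weight = 3.0 + (2.0 + 0.5 * group) * age + rng.normal(0.0, 1.0, 80)

def ssr(x, y):
    """Sum of squared residuals from a straight-line least-squares fit."""
    resid = y - np.polyval(np.polyfit(x, y, 1), x)
    return float(np.sum(resid ** 2))

# One pooled line for everybody vs. a separate line per group.
ssr_pooled = ssr(age, weight)
ssr_split = ssr(age[group == 0], weight[group == 0]) \
          + ssr(age[group == 1], weight[group == 1])

q = 2                          # extra parameters in the split model (4 - 2)
dfe = len(weight) - 4          # error df of the split (larger) model
F = ((ssr_pooled - ssr_split) / q) / (ssr_split / dfe)
p = stats.f.sf(F, q, dfe)      # upper tail directly, not 1 - cdf
print(F, p)
```

If the two groups had genuinely identical lines, F would hover near 1 and p would be large; here the built-in slope difference makes F large and p tiny.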

But my new stumbling block is that I'm getting p-values of zero. Everyone likes low p-values, but a zero p-value reeks of an error someplace.

My specific line reads p=1-fcdf(1.0041e+004,2,71)

Are there constraints or limitations with the fcdf function that I'm not aware of?
From: Walter Roberson on
Nathan wrote:

> But my new stumbling block is that I'm getting p-values of zero.
> Everyone likes low p-values, but a zero p-value reeks of an error
> someplace.
>
> My specific line reads p=1-fcdf(1.0041e+004,2,71)
>
> Are there constraints or limitations with the fcdf function that I'm not
> aware of?

If I have found the correct functions to try in Maple, fcdf with those
parameters is 1 - 8.2E-88, and that difference from 1 is far below the
resolution of binary double-precision floating point.


Maple> with(Statistics): evalf(1-CDF(FRatio(2,71),10041));

.8234018829e-87
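The same tail can be cross-checked without Maple by working on a log scale in ordinary double precision. For an F distribution with 2 numerator degrees of freedom the upper tail has a simple closed form, P(X > x) = (1 + 2x/v2)^(-v2/2); a sketch in Python:

```python
import math

# Closed-form upper tail of F(2, v2), evaluated via its log10 so the
# tiny probability never underflows.
v2, x = 71, 10041
log10_tail = -(v2 / 2) * math.log10(1 + 2 * x / v2)
print(log10_tail)            # about -87.08, i.e. a tail near 8.2e-88
```

This agrees with the Maple value of .8234018829e-87.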
From: Steven Lord on

"Nathan " <ndn3(a)georgetown.edu.remove.this> wrote in message
news:hurh9s$q98$1(a)fred.mathworks.com...
>I often feel silly when re-reading my posts a few days later. This time it
>is because my solution was in the preceding comments all along and I just
>didn't know it.
>
> Tom's suggestion to "compare the sum of squared residuals from the
> combined fit to the sum of the two sums of squared residuals from the
> separate fits" is exactly what I'm looking for.
>
> But my new stumbling block is that I'm getting p-values of zero. Everyone
> likes low p-values, but a zero p-value reeks of an error someplace.
>
> My specific line reads p=1-fcdf(1.0041e+004,2,71)
>
> Are there constraints or limitations with the fcdf function that I'm not
> aware of?

Let's evaluate the F distribution CDF symbolically, using the expression
given on the reference page for FCDF.

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/fcdf.html


%% Define x and t to be symbolic
syms x t

%% Define v1 and v2 to be symbolic as well to avoid loss of precision
v1 = sym(2);
v2 = sym(71);

%% Evaluate the first term in the integrand
term1 = (gamma((v1+v2)/2)*(v1/v2)^(v1/2))/(gamma(v1/2)*gamma(v2/2))

%% Evaluate the second term in the integrand
term2 = t^((v1-2)/2)/((1+(v1/v2)*t)^((v1+v2)/2))

%% Integrate
fcdfValue = int(term1*term2, t, 0, x)

%% Since you wanted 1-fcdf(...) we subtract the result from 1
oneMinus = 1-fcdfValue

%% Let's substitute in the exact value
xvalue = sym(10041);
exactValue = subs(oneMinus, x, xvalue)
vpa(exactValue)


Evaluating the CDF symbolically indicates that the result is on the order of
1e-87. But that was performing all the steps symbolically. Numerically,
while 1e-87 is not exactly 0, it's not very far off.

Now are all numbers representable in double precision arithmetic? No, they
are not. In particular, the next representable number before 1 is
(1-eps/2). Since eps is about 2.22e-16, eps/2 is about 1.11e-16. Your fcdf
value of 1-1e-87 (give or take) can't be exactly represented and so is
stored as the closest representable number -- which is exactly 1. Then when
you subtract an exact 1 from 1, you get an exact 0.

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com