PROC NLP and R^2 [SAS]

Prev: 3D Scatter Plot w. 45degree plane
Next: Standard N(0,1) test and PROC CAPABILITY NORMALTEST

From: Mark Miller on 16 Nov 2009 13:59

Sid,

It is not clear to me from your problem statement
exactly what you are trying to accomplish.

Your problem appears to be a regression-like situation
where left-hand side and right-hand side are both variable
(adjustable parameters on either side). Your specification
is missing any information on the range and/or restrictions
on the parameters a, b, & c.

You can estimate this equation as described using
PROC NLIN (rather than PROC NLP) assuming you
supply some info regarding parameter ranges.

However since it is essentially a regression model, you can
also estimate it using PROC REG, the only hitch being that
the model estimated is simply log(col1) = col2
which expands to
log(col2) = param _B (i.e. intercept) + param_C * col2

PROC REG and PROC NLIN generate the same results.
For your specified eft-hand side [ log(col1+a) ]
PROC NLIN drops parameter (a) as not affecting the solution
which is equivalent to a=0

Regression R Square
Col1 on Col2 ; 0.86823999628504
log(Col1) on Col2; 0.9284715257232

col1 on col2
rsquare 0.86823999628504
col1 = -32422.5616652023 + col2 * 0.00875118243314

log(col1) on col2
rsquare= 0.9284715257232
k1 = 9.57036 + k2 * 1.254716E-7

*--- revised problem data for comparing solution methods -- ;
data have;
input col1 col2 ;
logCol1 = log(col1) ;
l
datalines;
30000 6000000
50000 10000000
65000 13000000
70000 13500000
76000 13525000
113000 13925000
115500 15925000
119500 16925000
121500 17425000
122500 17725000
;;
run;

proc nlin data=have
Best= 40
method= Newton hougaard;
parms a = -10 to 10 by 1.0
b = 0 to 10 by 0.00250
c = 0 to 1 by 0.025
;;;

lnCol1_a = log(Col1 + a ) ;
cCol2_b = c * Col2 + b ;

model lnCol1_a = cCol2_b ;
run;

proc reg data=have;
Model logCol1 = Col2;
run;

.... Mark Miller

On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com> wrote:

> Hi,
>
> I would like to find the maximum R^2 that can be attained between
> log(column1 + a) and (c * column2 + b) for the below dataset by changing
> the
> values for a, b and c. Previously, I have used the 'Solver' add-in in
> Excel.
>
> data have;
> input column1 column2;
> datalines;
> 30000 6000000
> 50000 10000000
> 65000 13000000
> 70000 13500000
> 76000 13525000
> 113000 13925000
> 115500 15925000
> 119500 16925000
> 121500 17425000
> 122500 17725000
> ;
> run;
>
> Based on what I have read from the SAS-L posts, I understand that PROC NLP
> can help me do the same in SAS. Can someone please assist me with using
> this
> procedure? Thank you for your time.
>
> Sid
>

From: Sid N on 16 Nov 2009 16:24

Thank you Ed and Mark for your inputs.

Yes, I am dealing with a regression-type equation here (log(column1 + a) = c
* column2 + b). Both sides of the equation are variable. The way I do it
with the Solver add-in is by sticking in some random values for a, b and c
and let the Solver work on the equation to maximize the R^2 value. The
initial parameters I chose are purely empirical and there are no definite
ranges or restrictions on these values. However, the initial values I chose
for a, b and c affected the R^2 values.

Here is the approximate range of a, b and c values (sample size > 1000)
using Solver:
A -- max = 2.9E+15; min = -3400000
B -- max = 0.00014; min = 0
C -- max = 15.5; min = -3.7

I ran Mark's code on my 'actual' data and found that the R^2 values are
vastly different (0.98 with Solver vs 0.64 with SAS), which may be
attributed to the parameter values/ranges that are chosen with SAS.

As Ed suggested in his e-mail, is PROC OPTMODEL more appropriate for solving
this type of equation? Is it possible to run PROC NLIN without providing any
range values or without any restrictions on the parameter values? Is
convergence a problem with this type of situation?

Thanks again for your time.

Sid
On Mon, 16 Nov 2009 13:59:52 -0500, Mark Miller <mdhmiller(a)GMAIL.COM> wrote:

>Sid,
>
>It is not clear to me from your problem statement
>exactly what you are trying to accomplish.
>
>Your problem appears to be a regression-like situation
>where left-hand side and right-hand side are both variable
>(adjustable parameters on either side). Your specification
>is missing any information on the range and/or restrictions
>on the parameters a, b, & c.
>
>You can estimate this equation as described using
>PROC NLIN (rather than PROC NLP) assuming you
>supply some info regarding parameter ranges.
>
>However since it is essentially a regression model, you can
>also estimate it using PROC REG, the only hitch being that
>the model estimated is simply log(col1) = col2
>which expands to
> log(col2) = param _B (i.e. intercept) + param_C * col2
>
>PROC REG and PROC NLIN generate the same results.
>For your specified eft-hand side [ log(col1+a) ]
>PROC NLIN drops parameter (a) as not affecting the solution
>which is equivalent to a=0
>
>
>Regression R Square
> Col1 on Col2 ; 0.86823999628504
> log(Col1) on Col2; 0.9284715257232
>
>
>col1 on col2
>rsquare 0.86823999628504
> col1 = -32422.5616652023 + col2 * 0.00875118243314
>
>log(col1) on col2
>rsquare= 0.9284715257232
> k1 = 9.57036 + k2 * 1.254716E-7
>
>*--- revised problem data for comparing solution methods -- ;
>data have;
>input col1 col2 ;
>logCol1 = log(col1) ;
>l
>datalines;
> 30000 6000000
> 50000 10000000
> 65000 13000000
> 70000 13500000
> 76000 13525000
>113000 13925000
>115500 15925000
>119500 16925000
>121500 17425000
>122500 17725000
>;;
>run;
>
>proc nlin data=have
> Best= 40
> method= Newton hougaard;
> parms a = -10 to 10 by 1.0
> b = 0 to 10 by 0.00250
> c = 0 to 1 by 0.025
> ;;;
>
> lnCol1_a = log(Col1 + a ) ;
> cCol2_b = c * Col2 + b ;
>
> model lnCol1_a = cCol2_b ;
>run;
>
>proc reg data=have;
>Model logCol1 = Col2;
>run;
>
>
>... Mark Miller
>
>
>
>On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com> wrote:
>
>> Hi,
>>
>> I would like to find the maximum R^2 that can be attained between
>> log(column1 + a) and (c * column2 + b) for the below dataset by changing
>> the
>> values for a, b and c. Previously, I have used the 'Solver' add-in in
>> Excel.
>>
>> data have;
>> input column1 column2;
>> datalines;
>> 30000 6000000
>> 50000 10000000
>> 65000 13000000
>> 70000 13500000
>> 76000 13525000
>> 113000 13925000
>> 115500 15925000
>> 119500 16925000
>> 121500 17425000
>> 122500 17725000
>> ;
>> run;
>>
>> Based on what I have read from the SAS-L posts, I understand that PROC NLP
>> can help me do the same in SAS. Can someone please assist me with using
>> this
>> procedure? Thank you for your time.
>>
>> Sid
>>

From: Dale McLerran on 17 Nov 2009 13:14

Maximization of the correlation between log(x1 + A) and
(C*x2 + B) is really a matter of finding the value of A
which allows for the maximum correlation between log(x1 + A)
and X2. Note that for positive C, the correlation between
(C*X2 + B) and X2 is 1.0 while for negative C, the correlation
is -1.0.

Now, since the correlation between Y and Z is identical for
the regression of Y on Z and the regression of Z on Y, it
doesn't matter if Y is the response and Z the predictor or
the other way around. In the current case, if we have
Y=log(x1 + A) and Z=(C*x2 + B), then it may be preferable
to treat Y as the predictor variable rather than as the
response variable, particularly since

C*x2 + B = log(x1 + A)
=> C*x2 = log(x1 + A) - B
=> x2 = (log(x1 + A) - B)/C

Now, this problem could be solved by PROC NLIN or by the
NLMIXED procedure. PROC NLMIXED code to fit this
regression would be

proc nlmixed data=mydata;
parms A=0.01 B=0 C=1 log_sd=0;
eta = (log(x1 + A) - B)/C;
model x2 ~ normal(eta, exp(2*log_sd));
run;

Having fit this regression, you could construct Y=log(x1 + A)
in a data step and then use PROC CORR to compute the
correlation between Y and x2 which (as noted above) would
be the same as the correlation between Y and C*x2 + B
(except, possibly, for a sign difference).

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Mon, 11/16/09, Sid N <nsid31(a)GMAIL.COM> wrote:

> From: Sid N <nsid31(a)GMAIL.COM>
> Subject: Re: PROC NLP and R^2
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Monday, November 16, 2009, 1:24 PM
> Thank you Ed and Mark for your
> inputs.
>
> Yes, I am dealing with a regression-type equation here
> (log(column1 + a) = c
> * column2 + b). Both sides of the equation are variable.
> The way I do it
> with the Solver add-in is by sticking in some random values
> for a, b and c
> and let the Solver work on the equation to maximize the R^2
> value. The
> initial parameters I chose are purely empirical and there
> are no definite
> ranges or restrictions on these values. However, the
> initial values I chose
> for a, b and c affected the R^2 values.
>
> Here is the approximate range of a, b and c values (sample
> size > 1000)
> using Solver:
> A -- max = 2.9E+15; min = -3400000
> B -- max = 0.00014; min = 0
> C -- max = 15.5; min = -3.7
>
> I ran Mark's code on my 'actual' data and found that the
> R^2 values are
> vastly different (0.98 with Solver vs 0.64 with SAS), which
> may be
> attributed to the parameter values/ranges that are chosen
> with SAS.
>
> As Ed suggested in his e-mail, is PROC OPTMODEL more
> appropriate for solving
> this type of equation? Is it possible to run PROC NLIN
> without providing any
> range values or without any restrictions on the parameter
> values? Is
> convergence a problem with this type of situation?
>
> Thanks again for your time.
>
> Sid
> On Mon, 16 Nov 2009 13:59:52 -0500, Mark Miller <mdhmiller(a)GMAIL.COM>
> wrote:
>
> >Sid,
> >
> >It is not clear to me from your problem statement
> >exactly what you are trying to accomplish.
> >
> >Your problem appears to be a
> regression-like situation
> >where left-hand side and right-hand side are both
> variable
> >(adjustable parameters on either side). Your
> specification
> >is missing any information on the range and/or
> restrictions
> >on the parameters a, b, & c.
> >
> >You can estimate this equation as described using
> >PROC NLIN (rather than PROC NLP) assuming you
> >supply some info regarding parameter ranges.
> >
> >However since it is essentially a regression model, you
> can
> >also estimate it using PROC REG, the only hitch being
> that
> >the model estimated is simply log(col1) = col2
> >which expands to
> > log(col2) = param _B (i.e. intercept) +
> param_C * col2
> >
> >PROC REG and PROC NLIN generate the same results.
> >For your specified eft-hand side [
> log(col1+a) ]
> >PROC NLIN drops parameter (a) as not affecting the
> solution
> >which is equivalent to a=0
> >
> >
> >Regression
>
> R Square
> > Col1 on Col2
> ;
> 0.86823999628504
> > log(Col1) on Col2;
> 0.9284715257232
> >
> >
> >col1 on col2
> >rsquare 0.86823999628504
> > col1 = -32422.5616652023 + col2 *
> 0.00875118243314
> >
> >log(col1) on col2
> >rsquare= 0.9284715257232
> > k1 = 9.57036 + k2 *
> 1.254716E-7
> >
> >*--- revised problem data for comparing solution
> methods -- ;
> >data have;
> >input col1 col2 ;
> >logCol1 = log(col1) ;
> >l
> >datalines;
> > 30000 6000000
> > 50000 10000000
> > 65000 13000000
> > 70000 13500000
> > 76000 13525000
> >113000 13925000
> >115500 15925000
> >119500 16925000
> >121500 17425000
> >122500 17725000
> >;;
> >run;
> >
> >proc nlin data=have
> > Best=
> 40
> > method=
> Newton hougaard;
> > parms a = -10 to 10 by
> 1.0
> > b =
> 0 to 10 by 0.00250
> > c =
> 0 to 1 by 0.025
> > ;;;
> >
> > lnCol1_a = log(Col1 + a
> ) ;
> > cCol2_b = c *
> Col2 + b ;
> >
> > model lnCol1_a =
> cCol2_b ;
> >run;
> >
> >proc reg data=have;
> >Model logCol1 = Col2;
> >run;
> >
> >
> >... Mark Miller
> >
> >
> >
> >On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com>
> wrote:
> >
> >> Hi,
> >>
> >> I would like to find the maximum R^2 that can be
> attained between
> >> log(column1 + a) and (c * column2 + b) for the
> below dataset by changing
> >> the
> >> values for a, b and c. Previously, I have used the
> 'Solver' add-in in
> >> Excel.
> >>
> >> data have;
> >> input column1 column2;
> >> datalines;
> >> 30000 6000000
> >> 50000 10000000
> >> 65000 13000000
> >> 70000 13500000
> >> 76000 13525000
> >> 113000 13925000
> >> 115500 15925000
> >> 119500 16925000
> >> 121500 17425000
> >> 122500 17725000
> >> ;
> >> run;
> >>
> >> Based on what I have read from the SAS-L posts, I
> understand that PROC NLP
> >> can help me do the same in SAS. Can someone please
> assist me with using
> >> this
> >> procedure? Thank you for your time.
> >>
> >> Sid
> >>
>

From: Sid N on 11 Dec 2009 17:41

Hi,

This is a follow-up question regarding non-linear modeling from my post last
month. Although I have received some useful input at that time, I am still
not able to replicate the results that I am getting with the Excel Solver.

I may have left out some details in my original post (below) which may be
pertinent to the question at hand. The original equation is of the form:

log(column1 + A) = C * column2 + B

Here are the constraints on the paramaters:

A >= -(min(column1))
B = max(log(column1 + A)) - max(column2) * B
C is the slope of the predicted log(column1 + A) values and the original
column2 values.

I would like to know how I could set the initial parameters and bounds on
parameters B and C, which are dependent on other parameter(s) and variables
while using PROC NLIN/PROC NLMIXED/PROC OPTMODEL?

Thank you again in advance.

Sid

On Thu, Nov 12, 2009 at 5:03 PM, Sid N <nsid31(a)gmail.com> wrote:

> Hi,
>
> I would like to find the maximum R^2 that can be attained between
> log(column1 + a) and (c * column2 + b) for the below dataset by changing
> the
> values for a, b and c. Previously, I have used the 'Solver' add-in in
> Excel.
>
> data have;
> input column1 column2;
> datalines;
> 30000 6000000
> 50000 10000000
> 65000 13000000
> 70000 13500000
> 76000 13525000
> 113000 13925000
> 115500 15925000
> 119500 16925000
> 121500 17425000
> 122500 17725000
> ;
> run;
>
> Based on what I have read from the SAS-L posts, I understand that PROC NLP
> can help me do the same in SAS. Can someone please assist me with using
> this
> procedure? Thank you for your time.
>
> Sid
>

From: Sid N on 11 Dec 2009 17:46

Please note this correction on the parameter "B" constraint:
B = max(log(column1 + A)) - max(column2) * C

Sid
On Fri, 11 Dec 2009 16:41:57 -0600, Sid N <nsid31(a)GMAIL.COM> wrote:

>Hi,
>
>This is a follow-up question regarding non-linear modeling from my post last
>month. Although I have received some useful input at that time, I am still
>not able to replicate the results that I am getting with the Excel Solver.
>
>I may have left out some details in my original post (below) which may be
>pertinent to the question at hand. The original equation is of the form:
>
>log(column1 + A) = C * column2 + B
>
>Here are the constraints on the paramaters:
>
>A >= -(min(column1))
>B = max(log(column1 + A)) - max(column2) * B
>C is the slope of the predicted log(column1 + A) values and the original
>column2 values.
>
>I would like to know how I could set the initial parameters and bounds on
>parameters B and C, which are dependent on other parameter(s) and variables
>while using PROC NLIN/PROC NLMIXED/PROC OPTMODEL?
>
>Thank you again in advance.
>
>Sid
>
>On Thu, Nov 12, 2009 at 5:03 PM, Sid N <nsid31(a)gmail.com> wrote:
>
>> Hi,
>>
>> I would like to find the maximum R^2 that can be attained between
>> log(column1 + a) and (c * column2 + b) for the below dataset by changing
>> the
>> values for a, b and c. Previously, I have used the 'Solver' add-in in
>> Excel.
>>
>> data have;
>> input column1 column2;
>> datalines;
>> 30000 6000000
>> 50000 10000000
>> 65000 13000000
>> 70000 13500000
>> 76000 13525000
>> 113000 13925000
>> 115500 15925000
>> 119500 16925000
>> 121500 17425000
>> 122500 17725000
>> ;
>> run;
>>
>> Based on what I have read from the SAS-L posts, I understand that PROC NLP
>> can help me do the same in SAS. Can someone please assist me with using
>> this
>> procedure? Thank you for your time.
>>
>> Sid
>>

| Next | Last
Pages: 1 2
Prev: 3D Scatter Plot w. 45degree plane
Next: Standard N(0,1) test and PROC CAPABILITY NORMALTEST