From: Mark Miller on 16 Nov 2009 13:59 Sid, It is not clear to me from your problem statement exactly what you are trying to accomplish. Your problem appears to be a regression-like situation where left-hand side and right-hand side are both variable (adjustable parameters on either side). Your specification is missing any information on the range and/or restrictions on the parameters a, b, & c. You can estimate this equation as described using PROC NLIN (rather than PROC NLP) assuming you supply some info regarding parameter ranges. However since it is essentially a regression model, you can also estimate it using PROC REG, the only hitch being that the model estimated is simply log(col1) = col2 which expands to log(col2) = param _B (i.e. intercept) + param_C * col2 PROC REG and PROC NLIN generate the same results. For your specified eft-hand side [ log(col1+a) ] PROC NLIN drops parameter (a) as not affecting the solution which is equivalent to a=0 Regression R Square Col1 on Col2 ; 0.86823999628504 log(Col1) on Col2; 0.9284715257232 col1 on col2 rsquare 0.86823999628504 col1 = -32422.5616652023 + col2 * 0.00875118243314 log(col1) on col2 rsquare= 0.9284715257232 k1 = 9.57036 + k2 * 1.254716E-7 *--- revised problem data for comparing solution methods -- ; data have; input col1 col2 ; logCol1 = log(col1) ; l datalines; 30000 6000000 50000 10000000 65000 13000000 70000 13500000 76000 13525000 113000 13925000 115500 15925000 119500 16925000 121500 17425000 122500 17725000 ;; run; proc nlin data=have Best= 40 method= Newton hougaard; parms a = -10 to 10 by 1.0 b = 0 to 10 by 0.00250 c = 0 to 1 by 0.025 ;;; lnCol1_a = log(Col1 + a ) ; cCol2_b = c * Col2 + b ; model lnCol1_a = cCol2_b ; run; proc reg data=have; Model logCol1 = Col2; run; .... Mark Miller On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com> wrote: > Hi, > > I would like to find the maximum R^2 that can be attained between > log(column1 + a) and (c * column2 + b) for the below dataset by changing > the > values for a, b and c. Previously, I have used the 'Solver' add-in in > Excel. > > data have; > input column1 column2; > datalines; > 30000 6000000 > 50000 10000000 > 65000 13000000 > 70000 13500000 > 76000 13525000 > 113000 13925000 > 115500 15925000 > 119500 16925000 > 121500 17425000 > 122500 17725000 > ; > run; > > Based on what I have read from the SAS-L posts, I understand that PROC NLP > can help me do the same in SAS. Can someone please assist me with using > this > procedure? Thank you for your time. > > Sid >
From: Sid N on 16 Nov 2009 16:24 Thank you Ed and Mark for your inputs. Yes, I am dealing with a regression-type equation here (log(column1 + a) = c * column2 + b). Both sides of the equation are variable. The way I do it with the Solver add-in is by sticking in some random values for a, b and c and let the Solver work on the equation to maximize the R^2 value. The initial parameters I chose are purely empirical and there are no definite ranges or restrictions on these values. However, the initial values I chose for a, b and c affected the R^2 values. Here is the approximate range of a, b and c values (sample size > 1000) using Solver: A -- max = 2.9E+15; min = -3400000 B -- max = 0.00014; min = 0 C -- max = 15.5; min = -3.7 I ran Mark's code on my 'actual' data and found that the R^2 values are vastly different (0.98 with Solver vs 0.64 with SAS), which may be attributed to the parameter values/ranges that are chosen with SAS. As Ed suggested in his e-mail, is PROC OPTMODEL more appropriate for solving this type of equation? Is it possible to run PROC NLIN without providing any range values or without any restrictions on the parameter values? Is convergence a problem with this type of situation? Thanks again for your time. Sid On Mon, 16 Nov 2009 13:59:52 -0500, Mark Miller <mdhmiller(a)GMAIL.COM> wrote: >Sid, > >It is not clear to me from your problem statement >exactly what you are trying to accomplish. > >Your problem appears to be a regression-like situation >where left-hand side and right-hand side are both variable >(adjustable parameters on either side). Your specification >is missing any information on the range and/or restrictions >on the parameters a, b, & c. > >You can estimate this equation as described using >PROC NLIN (rather than PROC NLP) assuming you >supply some info regarding parameter ranges. > >However since it is essentially a regression model, you can >also estimate it using PROC REG, the only hitch being that >the model estimated is simply log(col1) = col2 >which expands to > log(col2) = param _B (i.e. intercept) + param_C * col2 > >PROC REG and PROC NLIN generate the same results. >For your specified eft-hand side [ log(col1+a) ] >PROC NLIN drops parameter (a) as not affecting the solution >which is equivalent to a=0 > > >Regression R Square > Col1 on Col2 ; 0.86823999628504 > log(Col1) on Col2; 0.9284715257232 > > >col1 on col2 >rsquare 0.86823999628504 > col1 = -32422.5616652023 + col2 * 0.00875118243314 > >log(col1) on col2 >rsquare= 0.9284715257232 > k1 = 9.57036 + k2 * 1.254716E-7 > >*--- revised problem data for comparing solution methods -- ; >data have; >input col1 col2 ; >logCol1 = log(col1) ; >l >datalines; > 30000 6000000 > 50000 10000000 > 65000 13000000 > 70000 13500000 > 76000 13525000 >113000 13925000 >115500 15925000 >119500 16925000 >121500 17425000 >122500 17725000 >;; >run; > >proc nlin data=have > Best= 40 > method= Newton hougaard; > parms a = -10 to 10 by 1.0 > b = 0 to 10 by 0.00250 > c = 0 to 1 by 0.025 > ;;; > > lnCol1_a = log(Col1 + a ) ; > cCol2_b = c * Col2 + b ; > > model lnCol1_a = cCol2_b ; >run; > >proc reg data=have; >Model logCol1 = Col2; >run; > > >... Mark Miller > > > >On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com> wrote: > >> Hi, >> >> I would like to find the maximum R^2 that can be attained between >> log(column1 + a) and (c * column2 + b) for the below dataset by changing >> the >> values for a, b and c. Previously, I have used the 'Solver' add-in in >> Excel. >> >> data have; >> input column1 column2; >> datalines; >> 30000 6000000 >> 50000 10000000 >> 65000 13000000 >> 70000 13500000 >> 76000 13525000 >> 113000 13925000 >> 115500 15925000 >> 119500 16925000 >> 121500 17425000 >> 122500 17725000 >> ; >> run; >> >> Based on what I have read from the SAS-L posts, I understand that PROC NLP >> can help me do the same in SAS. Can someone please assist me with using >> this >> procedure? Thank you for your time. >> >> Sid >>
From: Dale McLerran on 17 Nov 2009 13:14 Maximization of the correlation between log(x1 + A) and (C*x2 + B) is really a matter of finding the value of A which allows for the maximum correlation between log(x1 + A) and X2. Note that for positive C, the correlation between (C*X2 + B) and X2 is 1.0 while for negative C, the correlation is -1.0. Now, since the correlation between Y and Z is identical for the regression of Y on Z and the regression of Z on Y, it doesn't matter if Y is the response and Z the predictor or the other way around. In the current case, if we have Y=log(x1 + A) and Z=(C*x2 + B), then it may be preferable to treat Y as the predictor variable rather than as the response variable, particularly since C*x2 + B = log(x1 + A) => C*x2 = log(x1 + A) - B => x2 = (log(x1 + A) - B)/C Now, this problem could be solved by PROC NLIN or by the NLMIXED procedure. PROC NLMIXED code to fit this regression would be proc nlmixed data=mydata; parms A=0.01 B=0 C=1 log_sd=0; eta = (log(x1 + A) - B)/C; model x2 ~ normal(eta, exp(2*log_sd)); run; Having fit this regression, you could construct Y=log(x1 + A) in a data step and then use PROC CORR to compute the correlation between Y and x2 which (as noted above) would be the same as the correlation between Y and C*x2 + B (except, possibly, for a sign difference). Dale --------------------------------------- Dale McLerran Fred Hutchinson Cancer Research Center mailto: dmclerra(a)NO_SPAMfhcrc.org Ph: (206) 667-2926 Fax: (206) 667-5977 --------------------------------------- --- On Mon, 11/16/09, Sid N <nsid31(a)GMAIL.COM> wrote: > From: Sid N <nsid31(a)GMAIL.COM> > Subject: Re: PROC NLP and R^2 > To: SAS-L(a)LISTSERV.UGA.EDU > Date: Monday, November 16, 2009, 1:24 PM > Thank you Ed and Mark for your > inputs. > > Yes, I am dealing with a regression-type equation here > (log(column1 + a) = c > * column2 + b). Both sides of the equation are variable. > The way I do it > with the Solver add-in is by sticking in some random values > for a, b and c > and let the Solver work on the equation to maximize the R^2 > value. The > initial parameters I chose are purely empirical and there > are no definite > ranges or restrictions on these values. However, the > initial values I chose > for a, b and c affected the R^2 values. > > Here is the approximate range of a, b and c values (sample > size > 1000) > using Solver: > A -- max = 2.9E+15; min = -3400000 > B -- max = 0.00014; min = 0 > C -- max = 15.5; min = -3.7 > > I ran Mark's code on my 'actual' data and found that the > R^2 values are > vastly different (0.98 with Solver vs 0.64 with SAS), which > may be > attributed to the parameter values/ranges that are chosen > with SAS. > > As Ed suggested in his e-mail, is PROC OPTMODEL more > appropriate for solving > this type of equation? Is it possible to run PROC NLIN > without providing any > range values or without any restrictions on the parameter > values? Is > convergence a problem with this type of situation? > > Thanks again for your time. > > Sid > On Mon, 16 Nov 2009 13:59:52 -0500, Mark Miller <mdhmiller(a)GMAIL.COM> > wrote: > > >Sid, > > > >It is not clear to me from your problem statement > >exactly what you are trying to accomplish. > > > >Your problem appears to be a > regression-like situation > >where left-hand side and right-hand side are both > variable > >(adjustable parameters on either side). Your > specification > >is missing any information on the range and/or > restrictions > >on the parameters a, b, & c. > > > >You can estimate this equation as described using > >PROC NLIN (rather than PROC NLP) assuming you > >supply some info regarding parameter ranges. > > > >However since it is essentially a regression model, you > can > >also estimate it using PROC REG, the only hitch being > that > >the model estimated is simply log(col1) = col2 > >which expands to > > log(col2) = param _B (i.e. intercept) + > param_C * col2 > > > >PROC REG and PROC NLIN generate the same results. > >For your specified eft-hand side [ > log(col1+a) ] > >PROC NLIN drops parameter (a) as not affecting the > solution > >which is equivalent to a=0 > > > > > >Regression > > R Square > > Col1 on Col2 > ; > 0.86823999628504 > > log(Col1) on Col2; > 0.9284715257232 > > > > > >col1 on col2 > >rsquare 0.86823999628504 > > col1 = -32422.5616652023 + col2 * > 0.00875118243314 > > > >log(col1) on col2 > >rsquare= 0.9284715257232 > > k1 = 9.57036 + k2 * > 1.254716E-7 > > > >*--- revised problem data for comparing solution > methods -- ; > >data have; > >input col1 col2 ; > >logCol1 = log(col1) ; > >l > >datalines; > > 30000 6000000 > > 50000 10000000 > > 65000 13000000 > > 70000 13500000 > > 76000 13525000 > >113000 13925000 > >115500 15925000 > >119500 16925000 > >121500 17425000 > >122500 17725000 > >;; > >run; > > > >proc nlin data=have > > Best= > 40 > > method= > Newton hougaard; > > parms a = -10 to 10 by > 1.0 > > b = > 0 to 10 by 0.00250 > > c = > 0 to 1 by 0.025 > > ;;; > > > > lnCol1_a = log(Col1 + a > ) ; > > cCol2_b = c * > Col2 + b ; > > > > model lnCol1_a = > cCol2_b ; > >run; > > > >proc reg data=have; > >Model logCol1 = Col2; > >run; > > > > > >... Mark Miller > > > > > > > >On Thu, Nov 12, 2009 at 6:03 PM, Sid N <nsid31(a)gmail.com> > wrote: > > > >> Hi, > >> > >> I would like to find the maximum R^2 that can be > attained between > >> log(column1 + a) and (c * column2 + b) for the > below dataset by changing > >> the > >> values for a, b and c. Previously, I have used the > 'Solver' add-in in > >> Excel. > >> > >> data have; > >> input column1 column2; > >> datalines; > >> 30000 6000000 > >> 50000 10000000 > >> 65000 13000000 > >> 70000 13500000 > >> 76000 13525000 > >> 113000 13925000 > >> 115500 15925000 > >> 119500 16925000 > >> 121500 17425000 > >> 122500 17725000 > >> ; > >> run; > >> > >> Based on what I have read from the SAS-L posts, I > understand that PROC NLP > >> can help me do the same in SAS. Can someone please > assist me with using > >> this > >> procedure? Thank you for your time. > >> > >> Sid > >> >
From: Sid N on 11 Dec 2009 17:41 Hi, This is a follow-up question regarding non-linear modeling from my post last month. Although I have received some useful input at that time, I am still not able to replicate the results that I am getting with the Excel Solver. I may have left out some details in my original post (below) which may be pertinent to the question at hand. The original equation is of the form: log(column1 + A) = C * column2 + B Here are the constraints on the paramaters: A >= -(min(column1)) B = max(log(column1 + A)) - max(column2) * B C is the slope of the predicted log(column1 + A) values and the original column2 values. I would like to know how I could set the initial parameters and bounds on parameters B and C, which are dependent on other parameter(s) and variables while using PROC NLIN/PROC NLMIXED/PROC OPTMODEL? Thank you again in advance. Sid On Thu, Nov 12, 2009 at 5:03 PM, Sid N <nsid31(a)gmail.com> wrote: > Hi, > > I would like to find the maximum R^2 that can be attained between > log(column1 + a) and (c * column2 + b) for the below dataset by changing > the > values for a, b and c. Previously, I have used the 'Solver' add-in in > Excel. > > data have; > input column1 column2; > datalines; > 30000 6000000 > 50000 10000000 > 65000 13000000 > 70000 13500000 > 76000 13525000 > 113000 13925000 > 115500 15925000 > 119500 16925000 > 121500 17425000 > 122500 17725000 > ; > run; > > Based on what I have read from the SAS-L posts, I understand that PROC NLP > can help me do the same in SAS. Can someone please assist me with using > this > procedure? Thank you for your time. > > Sid >
From: Sid N on 11 Dec 2009 17:46 Please note this correction on the parameter "B" constraint: B = max(log(column1 + A)) - max(column2) * C Sid On Fri, 11 Dec 2009 16:41:57 -0600, Sid N <nsid31(a)GMAIL.COM> wrote: >Hi, > >This is a follow-up question regarding non-linear modeling from my post last >month. Although I have received some useful input at that time, I am still >not able to replicate the results that I am getting with the Excel Solver. > >I may have left out some details in my original post (below) which may be >pertinent to the question at hand. The original equation is of the form: > >log(column1 + A) = C * column2 + B > >Here are the constraints on the paramaters: > >A >= -(min(column1)) >B = max(log(column1 + A)) - max(column2) * B >C is the slope of the predicted log(column1 + A) values and the original >column2 values. > >I would like to know how I could set the initial parameters and bounds on >parameters B and C, which are dependent on other parameter(s) and variables >while using PROC NLIN/PROC NLMIXED/PROC OPTMODEL? > >Thank you again in advance. > >Sid > >On Thu, Nov 12, 2009 at 5:03 PM, Sid N <nsid31(a)gmail.com> wrote: > >> Hi, >> >> I would like to find the maximum R^2 that can be attained between >> log(column1 + a) and (c * column2 + b) for the below dataset by changing >> the >> values for a, b and c. Previously, I have used the 'Solver' add-in in >> Excel. >> >> data have; >> input column1 column2; >> datalines; >> 30000 6000000 >> 50000 10000000 >> 65000 13000000 >> 70000 13500000 >> 76000 13525000 >> 113000 13925000 >> 115500 15925000 >> 119500 16925000 >> 121500 17425000 >> 122500 17725000 >> ; >> run; >> >> Based on what I have read from the SAS-L posts, I understand that PROC NLP >> can help me do the same in SAS. Can someone please assist me with using >> this >> procedure? Thank you for your time. >> >> Sid >>
|
Next
|
Last
Pages: 1 2 Prev: 3D Scatter Plot w. 45degree plane Next: Standard N(0,1) test and PROC CAPABILITY NORMALTEST |