From: oloolo on
Why re-invent the wheel?

Modern commercial analytical software from the big-name houses is well
tested before shipping, and I'd bet a non-professional coder won't produce
better or more robust code than what is commercially available.

On Sat, 23 Jan 2010 18:07:59 -0800, Arthur Tabachneck <art297(a)NETSCAPE.NET>
wrote:

>Tanwan,
>
>Your comment reminds me of my first job as an analyst where I worked
>with two systems, Statistical Analysis System (on an IBM 360) and a
>memory-card-based HP electronic calculator.
>
>All of the correlational analyses, for whatever reason, were always
>done with the calculator's stock program. I found it surprising that
>all of the relationships discovered were positive ones. Of course, as
>it turned out, they weren't. The software makers had made a mistake
>in their programming.
>
>What I learned from that experience was to NEVER assume that ANY
>software maker was infallible.
>
>Besides, in the OP's case, what a perfect way to learn IML, learn more
>about regression, and simultaneously be able to compare their results
>with SAS output and discover which is correct or incorrect and why.
>
>Art
>----------
>On Jan 23, 7:55 pm, tanwan <tanwanz...(a)yahoo.com> wrote:
>> GLM and Logistic are well documented, tested, and they do what they
>> are supposed and intended to do. What are the odds that you will make
>> a coding error with IML that you won't even notice, trying to re-invent
>> the wheel?
>>
>> Besides, how much time are you going to save? A few seconds? A few
>> minutes? I think on average one spends 95% of the time writing code
>> and 5% running it. You can use those few extra moments for a trip to
>> the water cooler, a call to someone significant, or just browsing the
>> latest online news.
>>
>> T
From: Arthur Tabachneck on
Oloolo,

I wish I had known that before spending all that money towards the end of
1999.

Seriously, my strongest argument for looking under the hood is the
learning-related one, and I'd NEVER discourage anyone from doing it.

Art
-------
On Mon, 25 Jan 2010 11:44:01 -0500, oloolo <dynamicpanel(a)YAHOO.COM> wrote:

From: oloolo on
I agree. My point is that if the OP's problem is analyzing the data, then
he shouldn't waste time re-inventing the tool. On the other hand, if the
OP has some spare time, he is encouraged to dig into the apparent 'black
box' of the PROCs to better understand what goes on behind those several
lines of code...

On Mon, 25 Jan 2010 14:06:47 -0500, Arthur Tabachneck <art297(a)NETSCAPE.NET>
wrote:

>Oloolo,
>
>I wish I had known that before spending all that money towards the end of
>1999.
>
>Seriously, my strongest argument for looking under the hood is the
>learning related one and I'd NEVER discourage anyone from doing it.
>
>Art
>-------
>On Mon, 25 Jan 2010 11:44:01 -0500, oloolo <dynamicpanel(a)YAHOO.COM> wrote:
>
>>why re-invent the wheel ?
>>
>>modern commercial analytical softwares from big name houses are well
>tested
>>before shipping out, and I bet a non-professional coder won't get a
>>better/more robust code than commercially available ones.
>>
>>On Sat, 23 Jan 2010 18:07:59 -0800, Arthur Tabachneck
><art297(a)NETSCAPE.NET>
>>wrote:
>>
>>>Tanwan,
>>>
>>>Your comment reminds me of my first job as an analyst where I worked
>>>with two systems, Statistical Analysis System (on an IBM 360) and a
>>>memory-card-based HP electronic calculator.
>>>
>>>All of the correlational analyses, for whatever reason, were always
>>>done with the calculator's stock program. I found it surprising that
>>>all of the relationships discovered were positive ones. Of course, as
>>>it turned out, they weren't. The software makers had made a mistake
>>>in their programming.
>>>
>>>What I learned from that experience was to NEVER assume that ANY
>>>software maker was infallible.
>>>
>>>Besides, in the OPs case, what a perfect way to learn IML, learn more
>>>about regression, and simultaneously be able to compare their results
>>>with SAS output and discover which is correct or incorrect and why.
>>>
>>>Art
>>>----------
>>>On Jan 23, 7:55 pm, tanwan <tanwanz...(a)yahoo.com> wrote:
>>>> GLM and Logistic are well documented, tested, and they do what they
>>>> are supposed and intended to do. What are the odds that you will make
>>>> a coding error with IML that you wont even notice, trying to re-invent
>>>> a wheel?
>>>>
>>>> Besides, how much time are you going to save? A few seconds? A few
>>>> minutes? I think on average one spends 95% of the time coding code,
>>>> and then 5% running the code. You can use those few extra moments for
>>>> a trip to the water cooler, call someone significant, or just browse
>>>> the latest online news.
>>>>
>>>> T
From: wolfgang on
Hi,
I cannot believe that you could be faster using IML code than PROC
REG. As was already said here, the main problem with IML is that you
must load the matrix into core.
[1] Now, PROC REG computes the X'X matrix by running observation-wise
through the data set, and X never needs to be in core. That means PROC
REG accumulates a lot of error by computing X'X (the condition of X'X
is the square of the condition of X), but it should be faster than IML
doing the same in two steps, loading the matrix and then computing X'X.
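The one-pass accumulation described in [1] can be sketched as follows.
This is Python/NumPy rather than IML, purely as an illustration; the
dimensions, data, and coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_var = 10_000, 4
beta_true = np.array([1.0, -2.0, 0.5, 3.0])

xtx = np.zeros((n_var, n_var))   # running X'X
xty = np.zeros(n_var)            # running X'y

# Pretend each loop iteration reads one observation from the data set,
# so the full X never has to sit in memory at once.
for _ in range(n_obs):
    x = rng.normal(size=n_var)           # one row of X
    y = x @ beta_true + rng.normal()     # its response
    xtx += np.outer(x, x)                # accumulate X'X
    xty += x * y                         # accumulate X'y

beta_hat = np.linalg.solve(xtx, xty)     # solve the normal equations
```

Only the p-by-p cross-products matrix ever lives in memory, which is
the point of the observation-wise pass.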
[2] After X'X is computed, PROC REG uses the sweep operator, which is
of course much slower than Cholesky. Here, IML could be faster when
Cholesky is used. However, using Cholesky on an ill-conditioned X'X
may get you in trouble.
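For the curious, here is a minimal sweep-operator sketch (again NumPy,
not SAS; this is the textbook Goodnight-style sweep, not PROC REG's
actual implementation). Sweeping the first p pivots of the augmented
cross-products matrix [[X'X, X'y], [y'X, y'y]] leaves the coefficients
in the top-right block and the residual sum of squares in the corner,
matching a Cholesky solve:

```python
import numpy as np

def sweep(a, k):
    """Sweep the symmetric matrix a on pivot k, in place."""
    d = a[k, k]
    a[k, :] = a[k, :] / d
    for i in range(a.shape[0]):
        if i != k:
            b = a[i, k]
            a[i, :] -= b * a[k, :]
            a[i, k] = -b / d
    a[k, k] = 1.0 / d
    return a

rng = np.random.default_rng(1)
n, p = 200, 3
x = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = x @ beta_true + 0.1 * rng.normal(size=n)

# Augmented cross-products matrix [[X'X, X'y], [y'X, y'y]].
z = np.column_stack([x, y])
a = z.T @ z
for k in range(p):
    sweep(a, k)

beta_sweep = a[:p, p]   # regression coefficients
sse = a[p, p]           # residual sum of squares

# The same fit via a Cholesky factorization of X'X.
l = np.linalg.cholesky(x.T @ x)
beta_chol = np.linalg.solve(l.T, np.linalg.solve(l, x.T @ y))
```

A practical reason PROC REG sweeps anyway: sweeping also produces the
(X'X)^-1 entries needed for standard errors, and pivots can be swept
in and out for model selection.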
[3] So, if X is ill conditioned, I would use either PROC ORTHOREG or
IML. But if the condition of X is okay, I would use PROC REG.
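A quick numerical illustration of why forming X'X is risky when X is
ill conditioned (NumPy stand-in; np.linalg.lstsq plays the role of an
orthogonal-decomposition approach like ORTHOREG's, though the actual
algorithms differ):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3

# Build X with a chosen condition number from an SVD-style factorization.
u, _ = np.linalg.qr(rng.normal(size=(n, p)))   # orthonormal columns
v, _ = np.linalg.qr(rng.normal(size=(p, p)))
s = np.array([1.0, 1.0, 1e-6])                 # cond(X) ~ 1e6
x = u @ np.diag(s) @ v.T

# Forming X'X squares the condition: ~1e6 becomes ~1e12.
cond_x = np.linalg.cond(x)
cond_xtx = np.linalg.cond(x.T @ x)

beta_true = np.ones(p)
y = x @ beta_true                               # noise-free for clarity

beta_normal = np.linalg.solve(x.T @ x, x.T @ y)    # normal equations
beta_lstsq = np.linalg.lstsq(x, y, rcond=None)[0]  # orthogonal route

err_normal = np.linalg.norm(beta_normal - beta_true)
err_lstsq = np.linalg.norm(beta_lstsq - beta_true)
```

With cond(X) around 1e6, the normal-equations route works with a
matrix of condition ~1e12 and loses roughly twice as many digits as
the orthogonal route, which only ever sees cond(X).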

When running benchmarks, the condition of X should be one factor.
Another would be the shape of the matrix, i.e. whether Nobs >> nvar or
Nobs is not much larger than nvar; these are two very different
situations. Running very many small problems versus just a few very
large problems should be a factor as well.

Wolfgang