From: Jesper Sahner Pedersen on
Hi,

Consider the following two examples:

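Example 1.
data test;
  do x=0 to 100;
    y=ranuni(0);
    /* add a peak of height 5 around x=25 and x=75 */
    if abs(x-25)<2 | abs(x-75)<2 then y=y+5;
    w=1;
    /* write each observation with x<50 100 times, matching w=100 in Example 2 */
    if x<50 then do i=1 to 100;
      output;
    end;
    else output;
  end;
run;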

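Example 2.
data test;
  do x=0 to 100;
    y=ranuni(0);
    /* add a peak of height 5 around x=25 and x=75 */
    if abs(x-25)<2 | abs(x-75)<2 then y=y+5;
    /* one row per x; the replication is expressed through the weight */
    if x<50 then w=100;
    else w=1;
    output;
  end;
run;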


Each example is then analysed with the same code:

ods graphics on;

proc loess data=test plots(maxpoints=none);
  ods output OutputStatistics=test_stats FitSummary=test_summary;
  model y=x / smooth=0.06 to 0.2 by 0.01 dfmethod=exact clm alpha=0.1;
  weight w;
run;

ods graphics off;

symbol1 c=blue i=join value=none;
symbol2 c=green i=join value=none;
symbol3 c=red i=join value=none;
symbol4 c=black i=join value=none;

proc gplot data=test_stats;
  by SmoothingParameter;
  plot (DepVar Pred UpperCL LowerCL)*x / overlay;
run;
quit;


I would expect the two examples to produce the same result, but the results
are quite different.

Example 1 produces the expected result. Only the first peak, at x=25,
represents a significant local structure, because it is backed by a large
share of the observations, while the other peak, at x=75, is an outlier
(very few observations). LOESS fits only the first peak, as it should.

Example 2 doesn't produce the expected result. Both peaks, at x=25 and
x=75, are fitted equally, as if the weight variable were ignored. However,
the confidence limits indicate that the weights are taken into account. What
am I missing?

Even with moderate sample sizes LOESS quickly runs out of memory, so using
the raw data as in Example 1 is not a solution. Even in the simple Example 1,
LOESS runs for quite a while before producing the result. Some
pre-summarizing as in Example 2 would therefore be appropriate, but the
result is not as expected.
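
For reference, a minimal sketch of the kind of pre-summarizing meant here,
assuming the raw data are in the replicated form of Example 1. The output
data set name test_summ is only a placeholder. PROC SUMMARY's automatic
_FREQ_ variable holds the number of rows per x and is renamed to w so that
it can be used in the WEIGHT statement:

```sas
/* Sketch only: collapse the replicated rows to one row per x.      */
/* test_summ is a placeholder name, not part of the examples above. */
proc summary data=test nway;
  class x;                       /* one output row per distinct x   */
  var y;
  output out=test_summ(drop=_type_ rename=(_freq_=w))
         mean=y;                 /* y is constant within x here     */
run;
```

This gives one row per x, the same layout as Example 2, so it would
presumably show the same behaviour in PROC LOESS.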

Any comments or ideas?

Regards,
Jesper