From: John D'Errico on
"Doug Weathers" <dougw(a)spamcop.net> wrote in message <i00nkg$2dh$1(a)fred.mathworks.com>...
> "John D'Errico" <woodchips(a)rochester.rr.com> wrote in message
>
> > When you don't know the break points, this is what
> > is often called a free knot spline. The knots or breaks
> > must be estimated.
>
> Great, I now have search keywords. Thanks!
>
> > My own SLM tools will find the breaks as well
> > as fit the spline for you. I do require the presence of
> > the optimization toolbox though.
>
> Very nice work.
>
> Sadly, I've been playing with SLM and haven't been able to get it to do what I want.
>
> The strength of SLM appears to be in fitting a curve to a particular data set. It allows you to play around with the data set - setting up prescriptions and knot points, plotting and tweaking, etc. until you have a very good fit.
>
> However, I'm looking for a tool that can fit a curve to an arbitrary data set, without a lot of tweaking. I have a lot of data sets to analyze, and they are piecewise linear functions without a lot of noise. The problem is that I can't predict where the internal knots will be.
>
> If you're so inclined, you can download a sampe dataset here:
>
> <http://dl.dropbox.com/u/4238208/ilh-load.mat>
>
> and have a look at it.
>
> My best effort so far is this:
>
> load ilh-load;
> slm = slmengine(x,y,'interiorknots','free', 'degree','linear','plot','on');

That will work, ALMOST.

The problem is, if you don't tell it how many knots to
use, it uses 6 of them. And you have far more than
that number of breaks in the slope.

I would simply set up a loop, looking for the slope
discontinuities. Use this scheme to look for the knots,
counting how many you will need, as well as the
number, and even good starting values for the
placement thereof. Then let a tool do its work that
can find where to place those knots. The biggest
problem is, you appear to have some non-normal
noise in the data. That junk in your data will cause
problems.

John
From: Doug Weathers on
"Bruno Luong" <b.luong(a)fogale.findmycountry> wrote in message <i01j5t$aut$1(a)fred.mathworks.com>...
> This is what I got http://yfrog.com/5hxjqp

That looks really good!

>
> The only tweaking is increase the number of initial knots to 40 since there are many periods. Command used:
>
> pp = BSFK(x,y,2,40,[],struct('Animation',1))

Sadly, I cannot get BSFK to run on my old copy of MATLAB. I get many error messages.

However, your earlier hint about the Ramer-Douglas-Peucker algorithm has borne fruit. I also cannot use the DPSIMPLIFY package in the file exchange on my copy of MATLAB, but the algorithm is so simple I just wrote my own from the pseudocode in the Wikipedia article.

Here it is: <http://dl.dropbox.com/u/4238208/dpsimp.m>
Here's some sample code: <http://dl.dropbox.com/u/4238208/linetest.m>

Thanks for the tip. That is a beautiful algorithm. I love recursive code and it makes me happy when I get to use it.

One drawback: this algorithm only uses existing points. It can't find the *best possible* fit to the data, only one that's *good enough*, and only one that goes through actual data points (which are noisy). It's probably good enough for my purposes, though. Thanks again!

It might be possible to use the R-D-P algorithm to find the knots, then feed that info to the SLM package (which *does* run on my MATLAB). I'll play with that soon.
From: Doug Weathers on
"John D'Errico" <woodchips(a)rochester.rr.com> wrote in message <i01uqs$93e$1(a)fred.mathworks.com>...
> > My best effort so far is this:
> >
> > load ilh-load;
> > slm = slmengine(x,y,'interiorknots','free', 'degree','linear','plot','on');
>
> That will work, ALMOST.
>
> The problem is, if you don't tell it how many knots to
> use, it uses 6 of them. And you have far more than
> that number of breaks in the slope.

I read the documentation as saying that you can choose free knots, OR you can specify the number of (evenly-spaced) knots, OR you can specify the knots exactly. Can you use both 'interiorknots','free' and 'knots',40 ? (I'll give that a try tomorrow.)

>
> I would simply set up a loop, looking for the slope
> discontinuities. Use this scheme to look for the knots,
> counting how many you will need, as well as the
> number, and even good starting values for the
> placement thereof. Then let a tool do its work that
> can find where to place those knots.

That was my plan, if I failed to find a package that did all the work for me. I think I can use the Radel-Douglas-Peucker algorithm to find the knots, and SLM to fit the curves.

> The biggest
> problem is, you appear to have some non-normal
> noise in the data. That junk in your data will cause
> problems.

Yes, it does. The R-D-P algorithm can't work around it with any values of epsilon I could find. It looks like the data can be cleaned up with

y = smooth(y,0.002,'lowess');

It also rounds the corners off a little bit, but it shouldn't move the x-values of the knots very much.
From: Doug Weathers on
To close the thread:

I got very nice results with the following procedure:

1) smooth the data with the SMOOTH command (to get rid of transients)
2) use the Douglas-Peucker algorithm to approximate the curve as a series of line segments
3) feed the x-values of the points from 2) into SLMENGINE as the initial guesses for the interior knots and let it find the optimal location for them

Thanks all for your help!