From: Joe on
On Wed, 28 Apr 2010 17:03:06 -0700, Jim Thompson wrote:

> I have a string of data (from an IBIS file), voltage-versus-time, a
> 100ns string, at 100ps intervals... 1000 data points :-(
>
> I'd like to make this into a PWL source, but reduce data points based on
> areas that stay at essentially a constant slope for awhile.
>
> I can do this by hand, _tediously_ :-(
>
> Anyone know of some kind of algorithm that automates, or at least eases
> the pain?

You could try the perl program that appears below, between the two
lines of dashes. Change the variable $ELF from 0.01 (that is, 1%)
to whatever level of accuracy you want. The main idea of the program
is to find end-points of intervals in the original dataset such that
points between the end-points are within the specified limit of
proportional error, and print out those end-points. The algorithm is
not particularly efficient and does not necessarily produce a "best"
reduction. [Although it's easy to find a decent approximation, it's
hard (probably NP-complete) to find a best reduction, for most values
of best.] This algorithm's worst-case time is proportional to the
square of the number of points, but for datasets like in the
following example it runs in linear time.

Eg: Re <www.omega.com/temperature/z/pdf/z256-257.pdf> datasets,
the program takes about 15 milliseconds to process 231 numbers.
For data from the first column, it outputs a data set of 65 points
when ELF=0.005; 45 points when ELF=0.01; and 31 points when ELF=0.02.

If you have a Windows system and no perl, see for example
<http://perl.about.com/od/gettingstartedwithperl/a/testperl_2.htm>,
<http://www.ricocheting.com/how-to-install-on-windows/perl>,
and <http://www.perl.com/download.csp>. The 'Usage' comment
in the program is how the program can be run on a linux system.

----------------------------------------
#!/usr/bin/perl

# Program to winnow data by removing points that fall within specified
# fraction of linear-interpolated values between remaining points.
# (For small values, instead test absolute error; see comments.)

# Usage: ./data-winnow < datasetin > datasetout

# Each line of input should begin with 2 decimal numbers, separated by
# one or more blanks. Leading blanks, signs, decimal points are ok.

# Method: Test if points between range-start and last-point-read are
# in limits. If not, output range-start and start new range.

$ELF = 0.01; # error-limit fraction ( 0.01 = 1% )
$AEL = 1e-7; # absolute-error-limit, used when abs(y) < eps
$eps = 1e-7; # If abs(y) < eps, test abs(y - interpolated y) < AEL
$L = 0; # L = input line number
$c = -1; # c = conversion state
$p = 0; # p = range start point

foreach $s (<>) { # s = string of input
++$L;
@nums = split (/ +/, $s);
$i = $nums[0]?0:1; # Skip leading blanks if any
$x[$L] = 1.0 * $nums[$i]; # Convert string to number
$y[$L] = 1.0 * $nums[$i+1];

if ($c > 0) {
if (testRange()) {
print "$x[$p] $y[$p]\n";
$c = 0;
}
}
if (!$c) {
$p = $L-1;
}
++$c;
}
print "$x[$L] $y[$L]\n";

sub testRange {
local ($yp=$y[$p], $yl=$y[$L], $a, $i, $u);
local ($xp=$x[$p], $xl=$x[$L], $d=$xl-$xp);
die "Duplicated x at lines $p and $L ?" if (abs($d) < 1e-10);

for ($i=$p+1; $i < $L; ++$i) {
$a = ($x[$i]-$xp)/$d;
$u = $y[$i];
$v = $a*$yl + (1-$a)*$yp;
if (abs($u) < $eps) {
if (abs($u-$v) > $AEL) { return 1; }
} else {
if (abs(($u-$v)/$u) > $ELF) { return 1; }
}
}
return 0;
} # By joe(a)swo-za, 4 May 2010
-------------------------------------------------------------
db340c6484a1cf0d46cdc2d6ce6a9815c3fa8e1e Mcode