sound compression / backward-adaptive linear prediction [Lisp]

Prev: Please, stop spamming
Next: Talking to Mindstorms over USB

From: Captain Obvious on 15 Jul 2010 06:53

hi

I'm experimenting with audio compression, just for fun, -- mostly trying
combining different well-known things to see what works.

I mostly work with transforms (DCT, DFT, KLT), but it would be interesting
to experiment with linear prediction as well.

So if someone has implementation of backward-adaptive linear prediction in
Common Lisp that is general enough and can be plugged into a program easily
I'd be grateful to get the code. Or alternatively I'd read some article
which describes how to write one, so pointers are welcome, but I'm not in
position to read whole textbook just to try one thing. (That's why I'm
asking code.)

By the way, my best result so far is something like 55 kbit/sec per channel
for lossy compression at a quality when you hear almost no difference -- so
it is comparable to MP3, although inferior. For lossless compression I get
results comparable to FLAC.
Although I need to say that all this is only an estimation -- instead of
writing into bitstream program just estimates number of bits it would write
to the stream. So it might be not accurate.

If anybody here would like to play with audio compression too I can share
pieces of code I have. It's not like I have a lot of code or I have a nice
framework, but even getting a little function might save some time. For
example, I spent few hours trying to find the correct formula for DCT.

From: Raymond Toy on 15 Jul 2010 09:08

On 7/15/10 6:53 AM, Captain Obvious wrote:
> hi
>
> I'm experimenting with audio compression, just for fun, -- mostly trying
> combining different well-known things to see what works.

People already know what well-known things work. :-)

But it sure is fun reimplementing them for your self and hearing the
results!

>
> I mostly work with transforms (DCT, DFT, KLT), but it would be
> interesting to experiment with linear prediction as well.
>
> So if someone has implementation of backward-adaptive linear prediction
> in Common Lisp that is general enough and can be plugged into a program
> easily I'd be grateful to get the code. Or alternatively I'd read some
> article which describes how to write one, so pointers are welcome, but
> I'm not in position to read whole textbook just to try one thing.
> (That's why I'm asking code.)

Sorry, I don't have any Lisp code handy. The basic parts of linear
predictive codes are simple. Backward adapation is a bit tricky, and so
is joining the pieces together (if you're processing the data in blocks.)
>
> By the way, my best result so far is something like 55 kbit/sec per
> channel for lossy compression at a quality when you hear almost no
> difference -- so it is comparable to MP3, although inferior. For
> lossless compression I get results comparable to FLAC.
> Although I need to say that all this is only an estimation -- instead of
> writing into bitstream program just estimates number of bits it would
> write to the stream. So it might be not accurate.

This seems a little fishy. Is your codec a variable rate codec? Even
though, you ought to be quantizing everything already and thus should be
able to know exactly how many bits are being used. This might not
include any overhead bits you would need if you actually wrote it to the
stream though.

>
> If anybody here would like to play with audio compression too I can
> share pieces of code I have. It's not like I have a lot of code or I
> have a nice framework, but even getting a little function might save
> some time. For example, I spent few hours trying to find the correct
> formula for DCT.

I used to have lots of code to do things like this, including DCTs,
KLTs, Levinson-Durbin, transformations of linear prediction coefficients
to other forms, and stuff like that, including a transform trellis code
using a random codebook. But it's all in C.

Ray

From: Captain Obvious on 15 Jul 2010 10:42

??>> I'm experimenting with audio compression, just for fun, -- mostly
??>> trying combining different well-known things to see what works.

RT> People already know what well-known things work. :-)

RT> But it sure is fun reimplementing them for your self and hearing the
RT> results!

Well, there is a huge number of possible combinations and I'm sure not all
of them were tried so far.
So it is still possible to create something new and potentially superior to
existing analogs without inventing a funamentally new approach.

Currently I have KLT as a base and now I'm working to further decorrelate
coefficients.
People on comp.compression were quite surprised that someone even cosiders
KLT for lossless audio compression, so it is likely to be under-researched
direction. And who knows, maybe some trick will make it compressing better.

??>> So if someone has implementation of backward-adaptive linear
??>> prediction in Common Lisp that is general enough and can be plugged
??>> into a program easily I'd be grateful to get the code. Or
??>> alternatively I'd read some article which describes how to write one,
??>> so pointers are welcome, but I'm not in position to read whole
??>> textbook just to try one thing. (That's why I'm asking code.)

RT> Sorry, I don't have any Lisp code handy. The basic parts of linear
RT> predictive codes are simple.

Well, yeah.
I've just used ordinary least squares to estimate linear prediction
coefficients and it works, sort of, but that is not what I need because
coefficients tend to change.

RT> Backward adapation is a bit tricky, and so is joining the pieces
RT> together (if you're processing the data in blocks.)

So I didn't even dare to implement this part :)

??>> By the way, my best result so far is something like 55 kbit/sec per
??>> channel for lossy compression at a quality when you hear almost no
??>> difference -- so it is comparable to MP3, although inferior. For
??>> lossless compression I get results comparable to FLAC.
??>> Although I need to say that all this is only an estimation -- instead
??>> of writing into bitstream program just estimates number of bits it
??>> would write to the stream. So it might be not accurate.

RT> This seems a little fishy.

It is :).

RT> Is your codec a variable rate codec?

Yep.

RT> Even though, you ought to be quantizing everything already and thus
RT> should be able to know exactly how many bits are being used.

Quantization does not give exact results because exact result depends on
entropy encoder being used.
And if I just estimate entropy it depends on entropy estimator being used,
which is kinda tricky too.
(And it doesn't include entropy encoder's overhead.)

But now I'm simulating Golomb-power-of-two as entropy encoder and so I get
reasonable size estimate.
But it is possible that I have a bug in my GPO2-simulator somewhere, so it
isn't as accurate as having actual codec.

RT> This might not include any overhead bits you would need if you
RT> actually wrote it to the stream though.

Yep, I do not count overhead bits. It is boring :)
Also as I'm experimenting with short fragments and overhead for short
fragments is higer, counting overhead won't be a good indicator on real
compression rate.

??>> If anybody here would like to play with audio compression too I can
??>> share pieces of code I have. It's not like I have a lot of code or I
??>> have a nice framework, but even getting a little function might save
??>> some time. For example, I spent few hours trying to find the correct
??>> formula for DCT.

RT> I used to have lots of code to do things like this, including DCTs,
RT> KLTs, Levinson-Durbin, transformations of linear prediction
RT> coefficients to other forms, and stuff like that, including a transform
RT> trellis code using a random codebook. But it's all in C.

Well, if it is readable code I'd be glad to have it as I can read C and
translate it to Lisp -- perhaps that's faster than getting familiar with
underlying theory.

It is tricky to find high-quality code. It looks like people mostly do
signal processing in Matlab, and understanding Matlab code is hard for me,
especially as it is usually badly-formatted. And C code from signal
processing courses often looks weird and just does not work.

From: Raymond Toy on 15 Jul 2010 15:18

On 7/15/10 10:42 AM, Captain Obvious wrote:
> ??>> I'm experimenting with audio compression, just for fun, -- mostly
> ??>> trying combining different well-known things to see what works.
>
> RT> People already know what well-known things work. :-)
>
> RT> But it sure is fun reimplementing them for your self and hearing the
> RT> results!
>
> Well, there is a huge number of possible combinations and I'm sure not
> all of them were tried so far.
> So it is still possible to create something new and potentially superior
> to existing analogs without inventing a funamentally new approach.
>
> Currently I have KLT as a base and now I'm working to further
> decorrelate coefficients.
> People on comp.compression were quite surprised that someone even
> cosiders KLT for lossless audio compression, so it is likely to be
> under-researched direction. And who knows, maybe some trick will make it
> compressing better.

I would have thought KLTs are prohibitively expensive to find and to
use. No fast algorithms are available, and DCTs are provably asymptotic
to KLTs for large blocks for AR signals.

> Well, yeah.
> I've just used ordinary least squares to estimate linear prediction
> coefficients and it works, sort of, but that is not what I need because
> coefficients tend to change.
>
> RT> Backward adapation is a bit tricky, and so is joining the pieces
> RT> together (if you're processing the data in blocks.)
>
> So I didn't even dare to implement this part :)

I think codecs apply a window to the data, do their stuff, and move to
the next block, which overlaps the previous block by some amount. I
forget how the outputs are joined together.

>
> RT> I used to have lots of code to do things like this, including DCTs,
> RT> KLTs, Levinson-Durbin, transformations of linear prediction
> RT> coefficients to other forms, and stuff like that, including a transform
> RT> trellis code using a random codebook. But it's all in C.
>
> Well, if it is readable code I'd be glad to have it as I can read C and
> translate it to Lisp -- perhaps that's faster than getting familiar with
> underlying theory.

I think it's still fairly readable after several decades. But I'll have
to dig around for the source code. I had printouts until recently, but
I'm not sure I have the code anymore.

I'll get back to you if I find them. It would be kind of fun (but
useless) to revive my transform trellis codec.

Ray

From: Captain Obvious on 15 Jul 2010 15:58

RT> I would have thought KLTs are prohibitively expensive to find and to
RT> use.

That's why it is rarely used and that's why it is probably under-researched
and that's why it is interesting to work with it :)

RT> No fast algorithms are available,

Since I'm doing it just for fun, I'm not constrained with performance
considerations.
But I'd say it is actually pretty fast -- library I'm using can do SVD of 17
seconds worth of a sound in 2 seconds on 2 GHz CPU.
Faster than realtime.

And decoding is just matrix-vector multiplication. Isn't that slow.

RT> and DCTs are provably asymptotic to KLTs for large blocks for AR
RT> signals.

No, I don't think so. DCT is provable asymtotic to KLT when you're trying to
encode arbitrary signal.
If you build transform for a given signal (which is the point), it will be
optimal for that signal only, as it takes into account spectral content of
the signal -- if some frequencies aren't there KLT will not waste bits
encoding them.

Corner case is single sinusoid -- KLT only needs two non-zero components to
encode it, no matter what frequency is.
While DCT will have spectral leakages on all bands for frequencies which
aren't block size divisible.

Since music is usually not infinitely complex, KLT should do better than
DCT.

Practical results I have: 256-sample blocks, 17 seconds of signal sampled at
44100, the signal being loud pop music.
KLT has only 210 non-zero components, with 25 of them being very close to
zero, so actually it is like 185 non-zero components.
70 components are just not used.

It is not the case with DCT -- it uses all coefficients, and only maybe
10-20 last (high-frequency) coefficients are close to zero.

RT> I'll get back to you if I find them.

Ok, thanks.

| Next | Last
Pages: 1 2
Prev: Please, stop spamming
Next: Talking to Mindstorms over USB