From: Captain Obvious on 15 Jul 2010 06:53 hi I'm experimenting with audio compression, just for fun, -- mostly trying combining different well-known things to see what works. I mostly work with transforms (DCT, DFT, KLT), but it would be interesting to experiment with linear prediction as well. So if someone has implementation of backward-adaptive linear prediction in Common Lisp that is general enough and can be plugged into a program easily I'd be grateful to get the code. Or alternatively I'd read some article which describes how to write one, so pointers are welcome, but I'm not in position to read whole textbook just to try one thing. (That's why I'm asking code.) By the way, my best result so far is something like 55 kbit/sec per channel for lossy compression at a quality when you hear almost no difference -- so it is comparable to MP3, although inferior. For lossless compression I get results comparable to FLAC. Although I need to say that all this is only an estimation -- instead of writing into bitstream program just estimates number of bits it would write to the stream. So it might be not accurate. If anybody here would like to play with audio compression too I can share pieces of code I have. It's not like I have a lot of code or I have a nice framework, but even getting a little function might save some time. For example, I spent few hours trying to find the correct formula for DCT.
From: Raymond Toy on 15 Jul 2010 09:08 On 7/15/10 6:53 AM, Captain Obvious wrote: > hi > > I'm experimenting with audio compression, just for fun, -- mostly trying > combining different well-known things to see what works. People already know what well-known things work. :-) But it sure is fun reimplementing them for your self and hearing the results! > > I mostly work with transforms (DCT, DFT, KLT), but it would be > interesting to experiment with linear prediction as well. > > So if someone has implementation of backward-adaptive linear prediction > in Common Lisp that is general enough and can be plugged into a program > easily I'd be grateful to get the code. Or alternatively I'd read some > article which describes how to write one, so pointers are welcome, but > I'm not in position to read whole textbook just to try one thing. > (That's why I'm asking code.) Sorry, I don't have any Lisp code handy. The basic parts of linear predictive codes are simple. Backward adapation is a bit tricky, and so is joining the pieces together (if you're processing the data in blocks.) > > By the way, my best result so far is something like 55 kbit/sec per > channel for lossy compression at a quality when you hear almost no > difference -- so it is comparable to MP3, although inferior. For > lossless compression I get results comparable to FLAC. > Although I need to say that all this is only an estimation -- instead of > writing into bitstream program just estimates number of bits it would > write to the stream. So it might be not accurate. This seems a little fishy. Is your codec a variable rate codec? Even though, you ought to be quantizing everything already and thus should be able to know exactly how many bits are being used. This might not include any overhead bits you would need if you actually wrote it to the stream though. > > If anybody here would like to play with audio compression too I can > share pieces of code I have. It's not like I have a lot of code or I > have a nice framework, but even getting a little function might save > some time. For example, I spent few hours trying to find the correct > formula for DCT. I used to have lots of code to do things like this, including DCTs, KLTs, Levinson-Durbin, transformations of linear prediction coefficients to other forms, and stuff like that, including a transform trellis code using a random codebook. But it's all in C. Ray
From: Captain Obvious on 15 Jul 2010 10:42 ??>> I'm experimenting with audio compression, just for fun, -- mostly ??>> trying combining different well-known things to see what works. RT> People already know what well-known things work. :-) RT> But it sure is fun reimplementing them for your self and hearing the RT> results! Well, there is a huge number of possible combinations and I'm sure not all of them were tried so far. So it is still possible to create something new and potentially superior to existing analogs without inventing a funamentally new approach. Currently I have KLT as a base and now I'm working to further decorrelate coefficients. People on comp.compression were quite surprised that someone even cosiders KLT for lossless audio compression, so it is likely to be under-researched direction. And who knows, maybe some trick will make it compressing better. ??>> So if someone has implementation of backward-adaptive linear ??>> prediction in Common Lisp that is general enough and can be plugged ??>> into a program easily I'd be grateful to get the code. Or ??>> alternatively I'd read some article which describes how to write one, ??>> so pointers are welcome, but I'm not in position to read whole ??>> textbook just to try one thing. (That's why I'm asking code.) RT> Sorry, I don't have any Lisp code handy. The basic parts of linear RT> predictive codes are simple. Well, yeah. I've just used ordinary least squares to estimate linear prediction coefficients and it works, sort of, but that is not what I need because coefficients tend to change. RT> Backward adapation is a bit tricky, and so is joining the pieces RT> together (if you're processing the data in blocks.) So I didn't even dare to implement this part :) ??>> By the way, my best result so far is something like 55 kbit/sec per ??>> channel for lossy compression at a quality when you hear almost no ??>> difference -- so it is comparable to MP3, although inferior. For ??>> lossless compression I get results comparable to FLAC. ??>> Although I need to say that all this is only an estimation -- instead ??>> of writing into bitstream program just estimates number of bits it ??>> would write to the stream. So it might be not accurate. RT> This seems a little fishy. It is :). RT> Is your codec a variable rate codec? Yep. RT> Even though, you ought to be quantizing everything already and thus RT> should be able to know exactly how many bits are being used. Quantization does not give exact results because exact result depends on entropy encoder being used. And if I just estimate entropy it depends on entropy estimator being used, which is kinda tricky too. (And it doesn't include entropy encoder's overhead.) But now I'm simulating Golomb-power-of-two as entropy encoder and so I get reasonable size estimate. But it is possible that I have a bug in my GPO2-simulator somewhere, so it isn't as accurate as having actual codec. RT> This might not include any overhead bits you would need if you RT> actually wrote it to the stream though. Yep, I do not count overhead bits. It is boring :) Also as I'm experimenting with short fragments and overhead for short fragments is higer, counting overhead won't be a good indicator on real compression rate. ??>> If anybody here would like to play with audio compression too I can ??>> share pieces of code I have. It's not like I have a lot of code or I ??>> have a nice framework, but even getting a little function might save ??>> some time. For example, I spent few hours trying to find the correct ??>> formula for DCT. RT> I used to have lots of code to do things like this, including DCTs, RT> KLTs, Levinson-Durbin, transformations of linear prediction RT> coefficients to other forms, and stuff like that, including a transform RT> trellis code using a random codebook. But it's all in C. Well, if it is readable code I'd be glad to have it as I can read C and translate it to Lisp -- perhaps that's faster than getting familiar with underlying theory. It is tricky to find high-quality code. It looks like people mostly do signal processing in Matlab, and understanding Matlab code is hard for me, especially as it is usually badly-formatted. And C code from signal processing courses often looks weird and just does not work.
From: Raymond Toy on 15 Jul 2010 15:18 On 7/15/10 10:42 AM, Captain Obvious wrote: > ??>> I'm experimenting with audio compression, just for fun, -- mostly > ??>> trying combining different well-known things to see what works. > > RT> People already know what well-known things work. :-) > > RT> But it sure is fun reimplementing them for your self and hearing the > RT> results! > > Well, there is a huge number of possible combinations and I'm sure not > all of them were tried so far. > So it is still possible to create something new and potentially superior > to existing analogs without inventing a funamentally new approach. > > Currently I have KLT as a base and now I'm working to further > decorrelate coefficients. > People on comp.compression were quite surprised that someone even > cosiders KLT for lossless audio compression, so it is likely to be > under-researched direction. And who knows, maybe some trick will make it > compressing better. I would have thought KLTs are prohibitively expensive to find and to use. No fast algorithms are available, and DCTs are provably asymptotic to KLTs for large blocks for AR signals. > Well, yeah. > I've just used ordinary least squares to estimate linear prediction > coefficients and it works, sort of, but that is not what I need because > coefficients tend to change. > > RT> Backward adapation is a bit tricky, and so is joining the pieces > RT> together (if you're processing the data in blocks.) > > So I didn't even dare to implement this part :) I think codecs apply a window to the data, do their stuff, and move to the next block, which overlaps the previous block by some amount. I forget how the outputs are joined together. > > RT> I used to have lots of code to do things like this, including DCTs, > RT> KLTs, Levinson-Durbin, transformations of linear prediction > RT> coefficients to other forms, and stuff like that, including a transform > RT> trellis code using a random codebook. But it's all in C. > > Well, if it is readable code I'd be glad to have it as I can read C and > translate it to Lisp -- perhaps that's faster than getting familiar with > underlying theory. I think it's still fairly readable after several decades. But I'll have to dig around for the source code. I had printouts until recently, but I'm not sure I have the code anymore. I'll get back to you if I find them. It would be kind of fun (but useless) to revive my transform trellis codec. Ray
From: Captain Obvious on 15 Jul 2010 15:58
RT> I would have thought KLTs are prohibitively expensive to find and to RT> use. That's why it is rarely used and that's why it is probably under-researched and that's why it is interesting to work with it :) RT> No fast algorithms are available, Since I'm doing it just for fun, I'm not constrained with performance considerations. But I'd say it is actually pretty fast -- library I'm using can do SVD of 17 seconds worth of a sound in 2 seconds on 2 GHz CPU. Faster than realtime. And decoding is just matrix-vector multiplication. Isn't that slow. RT> and DCTs are provably asymptotic to KLTs for large blocks for AR RT> signals. No, I don't think so. DCT is provable asymtotic to KLT when you're trying to encode arbitrary signal. If you build transform for a given signal (which is the point), it will be optimal for that signal only, as it takes into account spectral content of the signal -- if some frequencies aren't there KLT will not waste bits encoding them. Corner case is single sinusoid -- KLT only needs two non-zero components to encode it, no matter what frequency is. While DCT will have spectral leakages on all bands for frequencies which aren't block size divisible. Since music is usually not infinitely complex, KLT should do better than DCT. Practical results I have: 256-sample blocks, 17 seconds of signal sampled at 44100, the signal being loud pop music. KLT has only 210 non-zero components, with 25 of them being very close to zero, so actually it is like 185 non-zero components. 70 components are just not used. It is not the case with DCT -- it uses all coefficients, and only maybe 10-20 last (high-frequency) coefficients are close to zero. RT> I'll get back to you if I find them. Ok, thanks. |