From: Luna Moon on
Hi all,

I have a vector of real numbers in Matlab. How do I compress them? Of
course this has to be lossless, since I need to be able to recover
them.

The goal is to study the Shannon rate and entropy of these real
numbers, so I decided to compress them and see what compression
ratio I can achieve.

I don't need to write the result to compressed files, so file
headers and the like are just overhead that would distort my entropy
estimate... I just need the bare compression ratio...
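A minimal sketch of such a "bare" ratio follows. It is written in Python rather than MATLAB and assumes the vector is stored as IEEE doubles. Calling `zlib.compressobj` with `wbits=-15` produces a raw DEFLATE stream with no zlib header and no Adler-32 trailer, although DEFLATE's own small block headers remain. The helper names `bare_deflate_ratio` and `restore` are hypothetical.

```python
import struct
import zlib


def bare_deflate_ratio(values):
    """Compression ratio of a list of doubles using raw DEFLATE.

    wbits=-15 asks zlib for a raw DEFLATE stream with no zlib header
    and no Adler-32 trailer, so no container overhead is counted.
    """
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian IEEE doubles
    comp = zlib.compressobj(level=9, wbits=-15)
    packed = comp.compress(raw) + comp.flush()
    return len(raw) / len(packed), packed


def restore(packed, count):
    """Invert bare_deflate_ratio: decompress raw DEFLATE and unpack doubles."""
    raw = zlib.decompress(packed, wbits=-15)
    return list(struct.unpack(f"<{count}d", raw))


if __name__ == "__main__":
    data = [0.1 * (i % 7) for i in range(10000)]  # hypothetical, highly repetitive vector
    ratio, packed = bare_deflate_ratio(data)
    assert restore(packed, len(data)) == data  # lossless round trip
    print(f"ratio = {ratio:.1f}")
```

A general-purpose compressor gives only an upper bound on the entropy rate, taken under its own model of the data. The ratio is therefore a rough indicator, not a measurement of Shannon entropy.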

Any pointers?

Thanks a lot!
From: John on
On Apr 2, 3:50 pm, Luna Moon <lunamoonm...(a)gmail.com> wrote:
> I have a vector of real numbers in Matlab. How do I compress them?  Of
> course this has to be lossless, since I need to be able to recover
> them.

Consider the array of numbers in binary form. Rearrange the bits so
all the ones are sequential, and do the same for the zeros. The number
of ones followed by the number of zeros is your compressed file.

John
From: Roger Stafford on
Luna Moon <lunamoonmoon(a)gmail.com> wrote in message <205a603e-cc38-4088-8d39-5d5b8464abf7(a)d34g2000vbl.googlegroups.com>...
> I have a vector of real numbers in Matlab. How do I compress them? Of
> course this has to be lossless, since I need to be able to recover
> them.

Unless your vector has many repetitions, consists of quantities with many trailing zeros in their binary floating-point form, or is of astronomically large size, I would not expect lossless compression to have much success. The 53-bit significands of a collection of non-integer floating-point numbers are usually almost all different. The only part likely to compress is the 11-bit exponent, whose values tend to be concentrated in a small region of the 2048 possibilities.
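To check this on real data, here is a Python sketch that splits each double into three parts: its sign, its 11-bit biased exponent, and its 52 stored fraction bits (53 significant bits counting the implicit leading 1). It then measures the histogram entropy of the exponents and counts the distinct fractions. The Gaussian test vector is a hypothetical stand-in for the actual data.

```python
import math
import random
import struct
from collections import Counter


def fields(x):
    """Split a double into (sign, biased exponent, 52-bit stored fraction)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def entropy_bits(symbols):
    """Zeroth-order (histogram) entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(10000)]  # hypothetical data
exps = [fields(v)[1] for v in values]
fracs = [fields(v)[2] for v in values]
print("exponent entropy:", round(entropy_bits(exps), 2), "of 11 bits")
print("distinct fractions:", len(set(fracs)), "of", len(values))
```

Suppose the exponent entropy is a few bits and nearly every fraction is distinct. Then, under this simple model, the exponent field offers at most about 11 minus that entropy bits of savings per 64-bit value.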

Roger Stafford
From: robert bristow-johnson on
On Apr 2, 3:50 pm, Luna Moon <lunamoonm...(a)gmail.com> wrote:
> I have a vector of real numbers in Matlab. How do I compress them?  Of
> course this has to be lossless, since I need to be able to recover
> them.

do you know about Huffman coding? it's in Wikipedia.

if the floating-point numbers are sorta random, not derived from a
"normal-looking" signal, there is not much you can do to compress. if
the range of the numbers is limited (at least probabilistically) then
Huffman coding might help a little. but i tend to think that it
would be only the exponent bits that would be compressible and there
is not much to gain, since the exponent bits are a small portion of
the floating-point word. the mantissa bits will look pretty random,
and there is not much a lossless scheme can do about that.
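Here is a compact Huffman sketch in Python. It returns only code lengths, which is all a compression-ratio estimate needs, and writes no actual bit stream. Feeding it, for example, the 11-bit exponent fields of the data gives an average code length to compare with the raw 11 bits. The function names and the toy symbol list are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far});
    # the tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_bits(symbols):
    """Average Huffman code length per symbol, in bits."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols) / len(symbols)


toy_exponents = [1023] * 50 + [1022] * 25 + [1021] * 25  # hypothetical data
print(average_bits(toy_exponents), "bits per exponent vs 11 raw")  # 1.5
```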

if the signal is reasonably bandlimited, you can use LPC: predict each
sample from the previous N samples, and encode the *difference* between
the predicted value and what you really have. if the prediction is
good, the difference should be small and the number of bits needed to
represent it should be small (and you might Huffman code those).
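The predict-and-code-the-difference idea can be sketched in Python. As a swapped-in simplification, this sketch does not fit LPC coefficients to the data. It uses a fixed second-order polynomial predictor, the kind Shorten and FLAC offer alongside fitted LPC. It also assumes integer samples, since floating-point prediction would not invert exactly. The test signal and the helper names are hypothetical. The zeroth-order entropy of the residual is roughly the bits per sample a Huffman coder could approach, since Huffman stays within one bit of that figure.

```python
import math
from collections import Counter


def residuals(x):
    """Fixed second-order predictor: predict x[n] as 2*x[n-1] - x[n-2].

    The first two samples pass through unchanged as warm-up.
    Integer arithmetic keeps the transform exactly invertible.
    """
    r = list(x[:2])
    for n in range(2, len(x)):
        r.append(x[n] - (2 * x[n - 1] - x[n - 2]))
    return r


def reconstruct(r):
    """Invert residuals() by re-running the same predictor."""
    x = list(r[:2])
    for n in range(2, len(r)):
        x.append(r[n] + 2 * x[n - 1] - x[n - 2])
    return x


def entropy_bits(symbols):
    """Zeroth-order (histogram) entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical bandlimited signal: a slow sinusoid quantized to integers.
signal = [round(20000 * math.sin(2 * math.pi * n / 500)) for n in range(5000)]
res = residuals(signal)
assert reconstruct(res) == signal  # lossless
print(f"signal: {entropy_bits(signal):.2f} bits, residual: {entropy_bits(res):.2f} bits")
```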

i know that for audio, lossless compression doesn't save a lot of
space; maybe 50%.

FWIW,

r b-j
From: Tim Wescott on
Luna Moon wrote:
> The goal is to study the Shannon rate and entropy of these real
> numbers, so I decide to compress them and see how much compression
> ratio I can have.

Find another approach to getting an answer, maybe.

First, most lossless compression algorithms are designed for things like
text, executables, and databases -- they don't do well with floating
point numbers, tending to see them as "random" even when they're not.

Second, if you measure a bunch of meaningless white noise and put the
result into floating point numbers, then put them into a lossless
algorithm that _can_ handle floating point, it's not going to compress
at all, because the algorithm can't distinguish between white noise and
a signal that's chock-full of information. In effect you'll have
_given_ it a signal full of information, in great detail, about the noise.
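Both points can be tried directly with a general-purpose compressor. This Python sketch uses a hypothetical helper, `deflate_ratio`, which applies raw DEFLATE via `zlib` with `wbits=-15`. It packs Gaussian white noise and a pure sinusoid as IEEE doubles. One would expect both ratios to stay close to 1, even though the sinusoid is completely described by a frequency, an amplitude, and a phase.

```python
import math
import random
import struct
import zlib


def deflate_ratio(values):
    """Original size / raw-DEFLATE size for a list of IEEE doubles."""
    raw = struct.pack(f"<{len(values)}d", *values)
    comp = zlib.compressobj(level=9, wbits=-15)  # no zlib header or trailer
    return len(raw) / len(comp.compress(raw) + comp.flush())


random.seed(1)
n = 20000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]              # white noise
sine = [math.sin(2 * math.pi * 0.01234 * k) for k in range(n)]  # fully predictable
print(f"noise: {deflate_ratio(noise):.2f}  sine: {deflate_ratio(sine):.2f}")
```

A deflate-style compressor only finds repeated byte strings, so it cannot exploit the sinusoid's predictability. That is why its ratio says little about the information content of a floating-point signal.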

I think you're leading yourself down the garden path.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com