From: Mark Shore on 3 Apr 2010 20:29

Luna Moon <lunamoonmoon(a)gmail.com> wrote in message <205a603e-cc38-4088-8d39-5d5b8464abf7(a)d34g2000vbl.googlegroups.com>...
> Hi all,
>
> I have a vector of real numbers in Matlab. How do I compress them? Of
> course this has to be lossless, since I need to be able to recover
> them.
>
> The goal is to study the Shannon rate and entropy of these real
> numbers, so I decided to compress them and see what compression
> ratio I can get.
>
> I don't need to write the result into compressed files, so those
> headers, etc. are just overhead for me and affect my calculation of
> the entropy... so I just need a bare version of the compression
> ratio...
>
> Any pointers?
>
> Thanks a lot!

An exceedingly simple test involving little or no effort on your part would be to take representative binary files and compress them with off-the-shelf utilities such as WinZip or 7-Zip.

This would certainly give you some idea of what level of lossless compression you can expect from reasonably well-tested and mature algorithms before you try to adapt your own.
From: Luna Moon on 4 Apr 2010 11:27

On Apr 3, 8:29 pm, "Mark Shore" <msh...(a)magmageosciences.ca> wrote:
(snip)
> An exceedingly simple test involving little or no effort on your part
> would be to take representative binary files and compress them with
> off-the-shelf utilities such as WinZip or 7-Zip.
>
> This would certainly give you some idea of what level of lossless
> compression you can expect from reasonably well-tested and mature
> algorithms before you try to adapt your own.

Thanks a lot, folks.

Please remember the goal is not to compress the floating-point numbers per se; it's actually to measure the entropy of the data. I don't really care how much compression it can maximally achieve.

Using WinZip is a great idea; however, I am looking for

(1) a command inside Matlab;
(2) bare-bones compression, without the header info, etc. of WinZip, because those are overhead in terms of measuring entropy...

Any more thoughts?

Thank you!
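[Editor's aside: one way to get the bare-bones, header-free ratio asked for here is a raw DEFLATE stream — zlib with a negative wbits value emits no header and no checksum. MATLAB has no built-in command for this, though java.util.zip.Deflater (with its nowrap flag) should be reachable through MATLAB's Java interface. The sketch below shows the idea in Python's zlib; the function name and the test vectors are illustrative only.]

```python
import random
import struct
import zlib

def raw_deflate_ratio(values):
    """Compress a list of doubles as a raw DEFLATE stream (no zlib/gzip
    header or checksum) and return compressed_bytes / original_bytes."""
    raw = struct.pack('<%dd' % len(values), *values)
    # wbits = -15 selects a raw DEFLATE stream: no header, no trailer.
    co = zlib.compressobj(level=9, wbits=-15)
    compressed = co.compress(raw) + co.flush()
    return len(compressed) / len(raw)

# A constant vector is highly redundant; uniform noise is not.
print(raw_deflate_ratio([1.0] * 10000))
print(raw_deflate_ratio([random.random() for _ in range(10000)]))
```

The ratio for the constant vector should come out far below 1, while the noise vector stays close to 1 — which is the kind of bare compression-ratio comparison being asked for.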
From: glen herrmannsfeldt on 4 Apr 2010 13:58

In comp.dsp Luna Moon <lunamoonmoon(a)gmail.com> wrote:
(snip)
> Please remember the goal is not to compress the floating numbers per
> se. It's actually to measure the entropy of the data.
> I don't really care how much compression it can maximally achieve.
(snip)

If you can find the (low) entropy then you can compress the data. The hard part, usually, is finding it. For an array of floating-point numbers it seems most likely that you would find it in terms of repetitions, that is, other places in the file with exactly the same value. Other than that, it will be hard to find unless you know the source.

Say, for example, you have a file of sin(n) (in radians) for integer n from zero to (some large number). That has fairly low entropy, assuming you have a good sin() routine available, but it will be difficult for a program that doesn't know the file is likely to contain sin(n) to find it. If someone tries a Fourier transform on the data then they might discover the pattern. As the result might not be exact, one would code an approximation and then list the (much smaller) difference between the two data sets.

Continuing, the output of a linear-congruential random number generator is also easy to predict if you know the constants of the generator. If you don't, and you have a big enough sample, then you can likely find the pattern. (If you have the bits exactly, though, I am not sure how long it would take.)

If you have, say, sin() of the linear-congruential number stream then it is likely much more difficult.

-- glen
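[Editor's aside: the sin(n) point above can be demonstrated directly — a generic byte-level compressor barely dents the raw sin(n) doubles, while a decoder that knows the model only has to code the residuals, which here are exactly zero. A sketch in Python, with zlib standing in for any off-the-shelf compressor:]

```python
import math
import struct
import zlib

def deflate_len(values):
    """Byte length of the zlib-compressed little-endian doubles."""
    raw = struct.pack('<%dd' % len(values), *values)
    return len(zlib.compress(raw, 9))

n = 10000
data = [math.sin(k) for k in range(n)]

# The mantissas of sin(k) look like noise to a generic compressor...
direct = deflate_len(data)

# ...but a decoder that knows the model sin(k) only needs the
# differences from it, and here those residuals are exactly zero.
residual = [x - math.sin(k) for k, x in enumerate(data)]
modeled = deflate_len(residual)

print(direct, modeled)
```

The direct compression stays close to the original 80,000 bytes, while the residual stream collapses to almost nothing — low entropy that only becomes visible once you know the source.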
From: Mark Shore on 4 Apr 2010 15:20

Luna Moon <lunamoonmoon(a)gmail.com> wrote in message <f03d83a5-9b33-4b0a-95ab-5e962650ebee(a)v16g2000vba.googlegroups.com>...
(snip)
> Using WinZip is a great idea; however, I am looking for
>
> (1) a command inside Matlab;
> (2) bare-bones compression, without the header info, etc. of WinZip,
> because those are overhead in terms of measuring entropy...
>
> Any more thoughts?
>
> Thank you!

Off the top of my head, I'm not aware of what built-in commands or third-party tools might be available in MATLAB.
You did make your overall goal clear in your first posts, so I was suggesting file compression utilities as an indirect measure of the entropy of a given data set. This can work if the data set is large enough.

For example, as a test I just compressed a binary 1591200x15 matrix of double-precision values representing a time series of 24-bit measurements from an array of magnetometers. WinZip compresses the original 190,944,400-byte file to 34,162,706 bytes using its maximum compression setting. An equal-size binary array filled with pseudorandom numbers compressed to 179,929,479 bytes with the same setting. This difference seems reasonable given the higher entropy of the random set.

If you are dealing with very small files then, agreed, any compression/decompression header overhead would likely make this approach less useful.
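[Editor's aside: the roughly 5.6:1 result above is consistent with 24-bit samples being stored in 64-bit doubles — most of each mantissa is predictable zeros. A small-scale sketch in Python, assuming zlib behaves comparably to WinZip's deflate; the sample counts are arbitrary:]

```python
import random
import struct
import zlib

def deflate_ratio(values):
    """compressed / original size for little-endian doubles under zlib."""
    raw = struct.pack('<%dd' % len(values), *values)
    return len(zlib.compress(raw, 9)) / len(raw)

random.seed(0)
n = 50000
# Doubles that actually hold 24-bit integer samples: the low mantissa
# bytes are all zero, so much of each 8-byte value is predictable.
quantized = [float(random.getrandbits(24)) for _ in range(n)]
# Doubles with fully random mantissas compress hardly at all.
full_precision = [random.random() for _ in range(n)]

print(deflate_ratio(quantized), deflate_ratio(full_precision))
```

The quantized set compresses substantially, while the full-precision pseudorandom set barely shrinks — the same qualitative gap as in the magnetometer experiment.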
From: robert bristow-johnson on 4 Apr 2010 20:23

On Apr 4, 1:58 pm, glen herrmannsfeldt <g...(a)ugcs.caltech.edu> wrote:
...
> Continuing, the output of a linear-congruential random number
> generator is also easy to predict if you know the constants of
> the generator.

yeah, i guess you need a couple of constants and the initial seed value. but don't you also need to somehow encode the rng algorithm, too?

> If you don't, and you have a big enough sample,
> then you can likely find the pattern. (If you have the bits
> exactly, though I am not sure how long it would take.)
>
> If you have, say, sin() of the linear-congruential number
> stream then it is likely much more difficult.

it will look different in a histogram. suppose the rng was scaled to be uniformly distributed over a segment as long as some whole multiple of 2pi; then the p.d.f. would go up as it approaches +1 or -1.

r b-j
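[Editor's aside: the shape described here is the arcsine density 1/(pi*sqrt(1 - y^2)) — sin of a uniform variable spanning whole periods piles up near +1 and -1. A quick Python check; the bin count, sample size, and seed are arbitrary choices:]

```python
import math
import random

# Histogram of sin(x) for x uniform over one whole period: the density
# follows 1/(pi*sqrt(1 - y^2)), rising sharply toward +1 and -1.
random.seed(1)
samples = [math.sin(random.uniform(0.0, 2.0 * math.pi))
           for _ in range(200000)]

bins = [0] * 10   # ten equal-width bins spanning [-1, 1]
for s in samples:
    i = min(int((s + 1.0) / 0.2), 9)
    bins[i] += 1
print(bins)
```

The two edge bins each catch roughly a fifth of the samples, versus well under a tenth for the central bins, matching the p.d.f. behavior described above.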