From: Luna Moon on 4 Apr 2010 11:27 On Apr 3, 8:29 pm, "Mark Shore" <msh...(a)magmageosciences.ca> wrote: > Luna Moon <lunamoonm...(a)gmail.com> wrote in message <205a603e-cc38-4088-8d39-5d5b8464a...(a)d34g2000vbl.googlegroups.com>... > > Hi all, > > > I have a vector of real numbers in Matlab. How do I compress them? Of > > course this has to be lossless, since I need to be able to recover > > them. > > > The goal is to study the Shannon rate and entropy of these real > > numbers, so I decide to compress them and see how much compression > > ratio I can have. > > > I don't need to write the result into compressed files, so those > > headers, etc. are just overhead for me which affect me calculating the > > Entropy... so I just need a bare version of the compress ratio... > > > Any pointers? > > > Thanks a lot! > > An exceeding simple test involving little or no effort on your part would be to take representative binary files and compress them with off-the-shelf utilities such as WinZip or 7-Zip. > > This would certainly give you some idea of what level of lossless compression you can expect from reasonably well-tested and mature algorithms before you try to adapt your own. Thanks a lot folks. Please remember the goal is not to compress the floating numbers per se. It's actually to measure the entropy of the data. I don't really care how much compression it can maximally achieve. Using WinZip is a great idea, however, I am looking for (1) a command inside Matlab; (2) a bare-bone compression, without the header info, etc. in Winzip, because those are overheads in terms of measuring entropy... Any more thoughts? Thank you!
From: Jerry Avins on 4 Apr 2010 12:36 On 4/4/2010 12:52 AM, robert bristow-johnson wrote: > On Apr 3, 7:13 pm, John<sampson...(a)gmail.com> wrote: > >> Nobody in here has a sense of humor > > i got it. your file compression scheme sorta rearranged the order of > data. > > but yer right. no sense of humor to be found here. Didn't it strike you as odd than any file of floats, however long, could be _reversibly_ compressed to two integers? Jerry -- "It does me no injury for my neighbor to say there are 20 gods, or no God. It neither picks my pocket nor breaks my leg." Thomas Jefferson to the Virginia House of Delegates in 1776. ���������������������������������������������������������������������
From: Jerry Avins on 4 Apr 2010 12:41 On 4/4/2010 1:51 AM, Vladimir Vassilevsky wrote: > > > robert bristow-johnson wrote: > >> On Apr 3, 7:13 pm, John <sampson...(a)gmail.com> wrote: >> >> >>> Nobody in here has a sense of humor >> >> >> i got it. your file compression scheme sorta rearranged the order of >> data. >> >> but yer right. no sense of humor to be found here. > > JFYI: > > http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform I didn't spend enough time on it to see how the input is recovered, Jerry -- "It does me no injury for my neighbor to say there are 20 gods, or no God. It neither picks my pocket nor breaks my leg." Thomas Jefferson to the Virginia House of Delegates in 1776. ���������������������������������������������������������������������
From: glen herrmannsfeldt on 4 Apr 2010 13:58 In comp.dsp Luna Moon <lunamoonmoon(a)gmail.com> wrote: (snip) > Please remember the goal is not to compress the floating numbers per > se. It's actually to measure the entropy of the data. > I don't really care how much compression it can maximally achieve. (snip) If you can find the (low) entropy then you can compress the data. The hard part, usually, is finding it. For an array of floating point numbers it seems, most likely, that you would find it in terms or repititions. That is, other places in the file with exactly the same value. Other than that, it will be hard to find unless you know the source. Say, for example, you have a file of sin(n) (in radians) for integer n from zero to (some large number). Now, that has fairly low entropy with the assumption that you have a good sin() routine available, but it will be difficult for a program that doesn't know that the file is likely to have sin(n) in it to find it. If someone tries a Fourier transform on the data then they might discover the pattern. As the result might not be exact, one would code an approximation and then list the (must smaller) difference between the two data sets. Continuing, the output of a linear-congruential random number generator is also easy to predict if you know the constants of the generator. If you don't, and you have a big enough sample, then you can likely find the pattern. (If you have the bits exactly, though I am not sure how long it would take.) If you have, say, sin() of the linear-congruential number stream then it is likely much more difficult. -- glen
From: glen herrmannsfeldt on 4 Apr 2010 14:42 Jerry Avins <jya(a)ieee.org> wrote: (snip) >> http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform > I didn't spend enough time on it to see how the input is recovered, Page has a pretty easy to follow explanation of the potential to recover the input. It isn't so obvious from that, though, that it could be done fast. The quick explanation (but not necessarily implementation) of the forward transform is to take all cyclic permutations of the input string, sort the resulting list, and output the last character of each. It is easy to see that to get the first character of such sorted list, you only need to sort the list of last characters. If you keep track of the last character while sorting, you now have character pairs. Repeat the process of prepending the characters and sorting until you have the complete sorted list of cyclic permutations. The only additional thing you need is to know where in the list the first character of the input stream was. In the wiki articly they use a special character for that, but I believe in the actual transform they just remember where it was. That shows that the transform can be reversed, not necessarily the fastest way to do it. -- glen
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 Prev: upsampling question Next: Decode 1394 data to 4 I2S audio data |