From: Paulo Marques on 23 Jul 2010 14:06
Fiziwig wrote:
> On Jul 23, 5:25 am, Paulo Marques <pmarq...(a)grupopie.com> wrote:
>[...]
>> This means that for a 26-ary tree, the number of starting elements needs
>> to be "25 * N + 26" for some integer N. If they're not, you need to add
>> dummy zero frequency elements to the end of the tree before starting to
>> build it.
>
> I looked it up. The number of starting elements needs to be congruent
> to 1 mod n-1, so it has to be of the form 25X + 1.

My math didn't fail me too much, then :)

>[...]
> Although I don't see
> what could be wrong. It's simply the total number of code letters used
> divided by the total of all word frequencies, and my original corpus
> was slightly more than a million words, so that number looks right.
> I'll double check it.

It should be something like: sum_for_all_words(frequency * code_letters)
/ sum_for_all_words(frequency). I.e. the total number of letters used to
encode the corpus divided by the total number of words. This should give
the average letters per word used to encode the complete corpus.

If you need help debugging the code, you can send it to me privately.
I'm usually good at spotting other people's bugs. I just wish I could
use that superpower for my own programs :(

--
Paulo Marques - www.grupopie.com

"C++ : increment the value of C and use the old one"
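The padding rule and the weighted-average formula above can be combined into one short sketch. This is an illustrative Python version, not the code either poster used: the function names (`dummies_needed`, `huffman_code_lengths`, `average_letters_per_word`) are hypothetical, and word frequencies are assumed to arrive as a dict. Note that 25N + 26 = 25(N + 1) + 1, so Paulo's form and the "congruent to 1 mod n-1" form describe the same set of counts.

```python
import heapq
from itertools import count
from typing import Dict


def dummies_needed(n_symbols: int, arity: int) -> int:
    """Zero-frequency dummies needed so that the total is 1 mod (arity - 1)."""
    return (-(n_symbols - 1)) % (arity - 1)


def huffman_code_lengths(freqs: Dict[str, int], arity: int = 26) -> Dict[str, int]:
    """Code length (in code letters) of each word in an arity-ary Huffman code."""
    tie = count()  # tie-breaker so heapq never has to compare the word lists
    heap = [(f, next(tie), [w]) for w, f in freqs.items()]
    # Pad with empty, zero-weight leaves so every merge takes exactly `arity` nodes.
    heap += [(0, next(tie), []) for _ in range(dummies_needed(len(heap), arity))]
    heapq.heapify(heap)
    lengths = {w: 0 for w in freqs}
    while len(heap) > 1:
        total, members = 0, []
        for _ in range(arity):  # merge the `arity` lightest nodes
            f, _, ws = heapq.heappop(heap)
            total += f
            members += ws
        for w in members:  # every word under the new node gains one letter
            lengths[w] += 1
        heapq.heappush(heap, (total, next(tie), members))
    return lengths


def average_letters_per_word(freqs: Dict[str, int], lengths: Dict[str, int]) -> float:
    """sum(frequency * code_letters) / sum(frequency)."""
    return sum(freqs[w] * lengths[w] for w in freqs) / sum(freqs.values())
```

Rather than building an explicit tree, the sketch only counts how many merges each word passes through, which equals its code length and is all the average needs. With a small binary example, `{"a": 5, "b": 2, "c": 1, "d": 1}` gives an average of 15/9 letters per word.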
From: Fiziwig on 23 Jul 2010 15:08
On Jul 23, 11:06 am, Paulo Marques <pmarq...(a)grupopie.com> wrote:
>
> It should be something like: sum_for_all_words(frequency * code_letters)
> / sum_for_all_words(frequency). I.e. the total number of letters used to
> encode the corpus divided by the total number of words. This should give
> the average letters per word used to encode the complete corpus.
>
> If you need help debugging the code, you can send it to me privately.
> I'm usually good at spotting other people's bugs. I just wish I could
> use that superpower for my own programs :(

I found it. In my recursive display function I misplaced one line of
code, so I was computing "strlen( tag ) * lpScan->weight" before I
appended the final letter for this branch to the tag. As a result I
counted every tag as one letter shorter than it should have been. I
fixed that and got:

Total Words 1075617
Total Letters 2321741
Average letters per word 2.16

So overall, Huffman gets 2.16 vs. my hand-made 2.35, an 8% improvement.
But more important, I learned a lot by doing this exercise. :)

BTW: as an alternative for making pronounceable codes, I discovered the
best approach is to build codes purely out of consonants, and then add
any old vowels you please when you use the codes. The human ear is
better at picking harmonious vowels than any program could be. So PTN
could be pronounced "patuma", or "aputiamu", or whatever you like,
without disturbing the self-segregating property. Adding a few rules
like "X" = "sh", "C" = "ch", and "Q" = "th" makes even oddballs like XQ
and CCN easy: "shathu", "chachani".

Move over Apache Code Talkers. You have met your match. :)

--gary
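The bug described above is an ordering problem in a recursive walk: the branch letter has to be appended to the tag before the tag's length is multiplied by the word's weight. A minimal Python sketch of a corrected walk follows. The `Node` layout and the `walk` function are hypothetical stand-ins, since the original code isn't shown (it appears to be C, judging by `strlen` and `lpScan->weight`).

```python
from dataclasses import dataclass, field
from string import ascii_uppercase
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical layout: a leaf has a word and its corpus frequency;
    # an internal node has up to 26 children, one per code letter.
    word: Optional[str] = None
    weight: int = 0
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, tag: str, codes: Dict[str, str]) -> int:
    """Record each word's code in `codes`; return sum(len(code) * weight)."""
    if node.word is not None:
        codes[node.word] = tag
        # `tag` already ends with this branch's letter, so len(tag) is correct.
        return len(tag) * node.weight
    total = 0
    for letter, child in zip(ascii_uppercase, node.children):
        # Append the letter first, then recurse. Measuring the tag before
        # appending the letter undercounts every code by one.
        total += walk(child, tag + letter, codes)
    return total


# Usage: a tiny hand-built tree.
root = Node(children=[
    Node("the", 50),
    Node("of", 30),
    Node(children=[Node("cat", 5), Node("dog", 5)]),
])
codes: Dict[str, str] = {}
total_letters = walk(root, "", codes)
total_words = 50 + 30 + 5 + 5
print(codes)                                   # {'the': 'A', 'of': 'B', 'cat': 'CA', 'dog': 'CB'}
print(round(total_letters / total_words, 2))  # 1.11
```

As a check on the reported totals, 2321741 / 1075617 is about 2.159, which rounds to the 2.16 quoted, and (2.35 - 2.16) / 2.35 is about 0.081, matching the roughly 8% improvement.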
From: MrD on 24 Jul 2010 03:45
Fiziwig wrote:
> So PTN could be pronounced "patuma", or "aputiamu", or whatever you
> like,

More like "potion" or "patina" or "epitonia", but not "Opountia". If I
understand you correctly.
--
MrD.
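MrD's examples can be checked mechanically: if vowels are pure filler, a listener recovers the code by dropping vowels and applying the digraph rules from the previous post. The Python sketch below assumes exactly that. The vowel set (a, e, i, o, u), the function name `spoken_to_code`, and the assumption that a literal S, C or T followed by H is always split by a vowel when spoken are illustrative choices, not something either poster specified.

```python
# Hypothetical rules: vowels are free filler; "sh", "ch", "th" stand for X, C, Q.
VOWELS = set("aeiou")
DIGRAPHS = {"sh": "X", "ch": "C", "th": "Q"}


def spoken_to_code(spoken: str) -> str:
    """Recover the consonant code from a spoken/written rendering."""
    s = spoken.lower()
    out = []
    i = 0
    while i < len(s):
        if s[i:i + 2] in DIGRAPHS:    # two-letter rules take priority
            out.append(DIGRAPHS[s[i:i + 2]])
            i += 2
        elif s[i] in VOWELS:          # vowels carry no information
            i += 1
        else:                         # any other letter is a code consonant
            out.append(s[i].upper())
            i += 1
    return "".join(out)


for word in ["potion", "patina", "epitonia", "Opountia", "shathu", "chachani"]:
    print(word, "->", spoken_to_code(word))
```

Under these assumptions "potion", "patina" and "epitonia" all reduce to PTN, while "Opountia" reduces to PNT; "shathu" and "chachani" give XQ and CCN. By the same rule "patuma" reads as PTM, which is presumably MrD's point.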
From: rossum on 24 Jul 2010 08:01
On Fri, 23 Jul 2010 12:08:08 -0700 (PDT), Fiziwig <fiziwig(a)gmail.com>
wrote:

>Move over Apache Code Talkers.

I thought they were Navaho Code Talkers, or were the Apache used as
well?

rossum
From: Fiziwig on 24 Jul 2010 10:20
On Jul 24, 5:01 am, rossum <rossu...(a)coldmail.com> wrote:
> On Fri, 23 Jul 2010 12:08:08 -0700 (PDT), Fiziwig <fizi...(a)gmail.com>
> wrote:
>
> >Move over Apache Code Talkers.
>
> I thought they were Navaho Code Talkers, or were the Apache used as
> well?
>
> rossum

I looked it up. I stand corrected. They were Navajo, Cherokee, Choctaw
and Comanche. No Apache. Even at my advanced age I learn something new
every day. :)

--gary