From: Paul E. Schoen on 17 May 2010 18:54 "Dr J R Stockton" <reply1020(a)merlyn.demon.co.uk> wrote in message news:OwFC6XIYZY8LFwUJ(a)invalid.uk.co.demon.merlyn.invalid... > In comp.lang.javascript message <M6GHn.11586$Gx2.2053(a)newsfe20.iad>, > Sat, 15 May 2010 19:28:30, Paul E. Schoen <paul(a)pstech-inc.com> posted: > >> >>I had always thought Base64 was used just to be able to send binary >>data in text form, without control characters. > > Yes. > >> Compression was a separate process. There seem to be 94 ASCII >>characters (excluding Space) that could be used for encoding. If all 8 >>bits of an unsigned character could be used, a base 128 would be >>possible, and it should be twice as efficient as base64. > > No; a 7-bit character only contains about 15% more information than a > 6-bit one. I thought about that after I wrote and sent the post. The character contains twice as many bits (128/64), but only an increase of 7/6 or 1.1667. Paul
From: Spamless on 18 May 2010 10:51 On 2010-05-17, Paul E. Schoen <paul(a)pstech-inc.com> wrote: > I thought about that after I wrote and sent the post. The character contains > twice as many bits (128/64), but only an increase of 7/6 or 1.1667. Can represent twice as many values but only has one more bit.
From: Spamless on 18 May 2010 14:16 On 2010-05-15, Paul E. Schoen <paul(a)pstech-inc.com> wrote: > > "nick" <nick___(a)fastmail.fm> wrote in message > news:75b38d32-b2f9-4445-97cb-3e63e2da3f1f(a)r34g2000yqj.googlegroups.com... >>I wanted to try making a javascript compressor (more specifically, >> implement a compression engine in javascript, not necessarily for >> compressing js) ... but then I found this: >> >> http://marklomas.net/ch-egg/articles/lzwjs.htm >> >> So what do you guys think? Is this any better than base 64? It sure as >> hell is ugly :) > > I had always thought Base64 was used just to be able to send binary data in > text form, without control characters. Compression was a separate process. > There seem to be 94 ASCII characters (excluding Space) that could be used > for encoding. It is easier (no math) to encode if the base is a power of two since we are working with binary. Take the least common multiple of 8 (8 bits, a byte) and the number of bits in the base (6 for base 64) and take 6 bytes to encode in 8 base64 digits just by taking the right bits: BITS FOR EIGHT BASE 64 DIGITS: 123456123456123456123456123456123456123456123456 48 BITS xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx BITS FOR SIX BASE 256 CHARACTERS: 123456781234567812345678123456781234567812345678 Which characters? 94? Be careful of characters which may not pass through mail systems and ones which can be munged. UUENCODING was a base64 method for unix-unix transfers of data over plain text lines but ... using some IBM systems as relays which used EBCDIC character encoding resulted in corruption. XXENCODING (I believe) was another base64 encoding changing the characters used as the base64 "digits" set. They had line starters, terminators, checksums(?). MIME base64 uses 64 digits guaranteed(?) to work (well, maybe not on APPLE I's which only had upper case characters?). YYENCODING is now often used in newsgroups. It is pretty much raw 8 bit since "modern" news servers can handle it. There is some encoding, but it expands the size very little. Of course, if you save such a file and try to email it ... it may well fail (email systems may be much more sensitive) and trying to use one of those back on the old GEnie system (7 bit, even parity) would fail. FTPing such a file using Windows (is the default still TEXT instead of IMAGE/BINARY) may strip off the high bits. Base64 encodings expand the file size by 33% (assuming you have to use 8 bits to send one of the 64 base64 "digits") (compress the file first!). You don't want the overhead of changing bases if it involves mathematical operations (divide, etc.) or manipulation to avoid certain sensitive characters. You just want a simple (fast, easy) selection (of bits).
From: Spamless on 18 May 2010 15:11 On 2010-05-18, Spamless <Spamless(a)Nil.nil> wrote: > Which characters? 94? While using a base which is a power of two makes life very easy, consider using a base where b^n is a little larger than a power of two (rather than equal to a power of two). For example, 10^3 is about 2^10 so one can almost encode all 10 bit sequences as three base 10 digits, but not quite as 10^3 is a bit smaller than 2^10. 90^2=8100, 2^13=8192 That's a bit small. Better would be base 91. 91^2=8281, 2^13=8192 You can encode any sequence of thirteen bits as a pair of base 91 digits. If you choose 91 "safe" characters to use to represent those digits you can encode any sequence of 13 bits as a pair of those digits. If it takes 8 bits to send each such character/digit, you can send any sequence of 13 bits as those two characters (16 bits) (though you will not use up all the possible pairs which can handle numbers up through 8281 - you can use those other pairs for control). This would be an increase of (16-13)/13=23% (instead of the 33% of base 64 encoding). But how do you convert a number (from 0 to 2^13-1) to base 91 (it will take at most two "digits")? First extract out the 13 bits (take 13 original bytes and split them into 8 sequences of 13 bits each). Convert each sequence of 13 bits into to base 91 digits and send those (16 bits to send them). Convert to base 91? Divide by 91 (integer division) to find the high digit and take it mod 91 to get the low base 91 digit. Which 91 characters are guaranteed to be safe on various systems (including legacy systems using EBCDIC encoding of which there are variants)? Of course for any integer which is not a power of two, ln(b)/ln(2) is irrational (even transcendental!) and so there are some integers k,m so that k*ln(b) is as close as you want but larger than m*ln(2) and you can use k digits in base b to represent m digits in base 2. How about base 83? A continued fraction expansion of ln(83)/ln(2) shows that a good choice would be 8 base 83 digits to encode 51 bits: 83^8 = 2252292232139041 2^51 = 2251799813685248 and so one can encode 51 bits as 8 base 83 digits and assuming that sending each digit takes 8 bit, you have to send 64 bits (instead of 51) for an increase of 25.5% (if there aren't 91 "safe" characters to use as digits but there are 83, one encode with an overhead of 25.5% instead of base 64's 33.3%). Of course, in this case one has to convert a number from 0 through 2251799813685248-1 to base 83.
First
|
Prev
|
Pages: 1 2 Prev: Between hard code writing hours Next: How to Pump $1,000s in CASH & Checks to your door. |