From: nick on 22 May 2010 17:10

I'd like to hear this group's reaction to a javascript compression
script I've been working on. It uses the LZW algorithm and base85
encoding to squeeze large scripts down to size.

Quick test...

used this: http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
original size: 72173 bytes
compressed: 44782 bytes

You can test it here: http://pressjs.googlecode.com/svn/trunk/build/test.html

Browse the source: http://code.google.com/p/pressjs/source/browse/#svn/trunk/src

I'd love to hear what you guys think, esp. any way we could optimize
it for speed or size, or if you catch any bugs / memory leaks /
namespace pollution / stupid programming fails / etc. Thanks!
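P.S. For anyone who wants the gist of the algorithm without digging
through the source: the LZW core boils down to roughly the sketch
below. This is illustrative only (made-up names, not the actual
pressjs code):

function lzwCompress(input) {
  // The '#' prefix on keys keeps dictionary entries clear of
  // Object.prototype properties like 'constructor' or '__proto__'.
  var dict = {}, nextCode = 256, i;
  // Seed the dictionary with all 256 single-character strings
  // (this assumes every input character code is below 256).
  for (i = 0; i < 256; ++i) dict['#' + String.fromCharCode(i)] = i;
  var codes = [], w = '';
  for (i = 0; i < input.length; ++i) {
    var c = input.charAt(i);
    if (dict.hasOwnProperty('#' + w + c)) {
      w += c;                         // keep extending the current match
    } else {
      codes.push(dict['#' + w]);      // emit the longest known prefix...
      dict['#' + w + c] = nextCode++; // ...and learn the new sequence
      w = c;
    }
  }
  if (w !== '') codes.push(dict['#' + w]);
  return codes; // pressjs then packs these codes into base85 text
}

The decompressor rebuilds the same dictionary on the fly as it reads,
so only the codes have to be shipped.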
From: Sean Kinsey on 23 May 2010 07:43

On May 22, 11:10 pm, nick <nick...(a)fastmail.fm> wrote:
> I'd like to hear this group's reaction to a javascript compression
> script I've been working on. It uses the LZW algorithm and base85
> encoding to squeeze large scripts down to size.
>
> Quick test...
>
> used this: http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
> original size: 72173 bytes
> compressed: 44782 bytes
>
> You can test it here: http://pressjs.googlecode.com/svn/trunk/build/test.html
>
> Browse the source: http://code.google.com/p/pressjs/source/browse/#svn/trunk/src
>
> I'd love to hear what you guys think, esp. any way we could optimize
> it for speed or size, or if you catch any bugs / memory leaks /
> namespace pollution / stupid programming fails / etc. Thanks!

I'm sorry to say that your attempt to 'compress' code has failed. Did
you ever take into consideration that gzip (used to serve compressed
files) also uses LZ-family dictionary compression (DEFLATE, i.e. LZ77
plus Huffman coding), and applies it more efficiently than you do?

A quick test I did with an input file of 56.3KB:

Direct compression using 7-Zip into a .gz archive: 12KB
Compression using pressjs, then compressed into a .gz archive: 20.9KB

And the same using a minified version of the same script:

Direct compression using 7-Zip into a .gz archive: 4.51KB
Compression using pressjs, then compressed into a .gz archive: 7.68KB

Not to mention the added overhead of having to decompress the file
after the UA has downloaded it.

The only scenario where this method would be beneficial is one where
gzip is not used on the server, bad caching directives cause the file
to be downloaded in full each time, and the extra time spent
downloading is greater than the extra time needed to decompress.
Hopefully that isn't too common a scenario.

But hey, it was probably fun to create :)
From: Johannes Baagoe on 23 May 2010 09:53

nick :
> http://pressjs.googlecode.com/svn/trunk/build/test.html

"Où qu'il réside, même aux îles Caïmans, tout Français inscrit au rôle
paiera son dû dès Noël" (length 92; in English, "Wherever he resides,
even in the Cayman Islands, every Frenchman on the tax roll will pay
his due by Christmas", note all the accented characters) "compresses"
to 118 characters.

> http://code.google.com/p/pressjs/source/browse/#svn/trunk/src

From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js :

// Populate table with all possible character codes.
for (var i = 0; i < 256; ++i) {
  var str = String.fromCharCode(i);
  this.hashtable[str] = this.nextcode++;
}

What about character codes >= 256?

My general impression is that you are complicating things for no
reason. Why use constructors, prototypes and fancy "//#" pseudo-cpp
directives? Just one file which defines the two functions that
compress and expand would be much easier both to write and to review.

(I assume that you are doing this for fun, for the challenge of
writing a compressor in javascript. If it is in order to reduce
bandwidth in real applications on the Web, enabling gzip on the
server is much more efficient.)
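By the way, if the input may contain characters above 255, one escape
hatch is to map the string to UTF-8 bytes before compressing, so that
every symbol fits the 256-entry table. A sketch using the old
escape/encodeURIComponent trick (these helpers are mine, not in
pressjs):

function toByteString(str) {
  // encodeURIComponent emits UTF-8 as %XX escapes; unescape turns
  // each %XX back into a single character, so every character of
  // the result has a code below 256.
  return unescape(encodeURIComponent(str));
}

function fromByteString(bytes) {
  // Inverse mapping: %XX-escape each byte, then decode as UTF-8.
  return decodeURIComponent(escape(bytes));
}

-- Johannes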
From: nick on 23 May 2010 13:20

On May 23, 7:43 am, Sean Kinsey <okin...(a)gmail.com> wrote:
> I'm sorry to say that your attempt to 'compress' code has failed. Did
> you ever take into consideration that gzip (used to serve compressed
> files) also uses LZ-family dictionary compression (DEFLATE, i.e. LZ77
> plus Huffman coding), and applies it more efficiently than you do?

Yeah, I thought about that, but I figured the point of javascript
compressors was that they would be used in environments where gzip
compression on the server is not an option (many shared hosts, which
many people seem content to use, don't enable gzip for some reason).

> A quick test I did with an input file of 56.3KB:
>
> Direct compression using 7-Zip into a .gz archive: 12KB
> Compression using pressjs, then compressed into a .gz archive: 20.9KB
>
> And the same using a minified version of the same script:
>
> Direct compression using 7-Zip into a .gz archive: 4.51KB
> Compression using pressjs, then compressed into a .gz archive: 7.68KB

I wonder if encoding to base64 instead would yield better gzip ratios
afterwards (see the sketch at the end of this post for what the base85
step does). Maybe still not as good as gzipping the uncompressed file,
though. I just did a similar test with Dean Edwards' "packer" with the
"Base62 encode" and "Shrink variables" options on, and it manages a
gzip-compressed size similar to the gzip-compressed size of the
original... If I can achieve a similar gzip-compressed size after
pressing, I think this should be at least as useful as packer (not
sure what this group's opinion of packer is, though).

> Not to mention the added overhead of having to decompress the file
> after the UA has downloaded it.

True, although the size overhead is only about 1200 bytes (and
shrinking), and the processing overhead is negligible.

> The only scenario where this method would be beneficial is one where
> gzip is not used on the server, bad caching directives cause the file
> to be downloaded in full each time, and the extra time spent
> downloading is greater than the extra time needed to decompress.
> Hopefully that isn't too common a scenario.

It's more common than you might think (shared hosting).

> But hey, it was probably fun to create :)

It was :) Thanks for the comments.
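P.S. Here is roughly what the base85 step does, for comparison with
base64: it packs 4 bytes into 5 printable characters, so about 25%
size overhead versus about 33% for base64. This is a sketch only; it
uses the classic Ascii85 '!'..'u' alphabet and omits padding for
inputs that aren't a multiple of 4 bytes, and the actual pressjs
encoder may use a different alphabet and framing.

// Base-85 in a nutshell: 4 bytes -> one 32-bit number -> 5 digits
// in base 85.
var B85 = (function () {
  var s = '';
  for (var i = 33; i < 33 + 85; ++i) s += String.fromCharCode(i);
  return s; // '!' through 'u', the classic Ascii85 range
})();

function encode85(bytes) { // bytes: array of integers 0..255
  var out = '';
  for (var i = 0; i < bytes.length; i += 4) {
    var n = 0, k;
    // Pack up to 4 bytes, big-endian, into one number.
    for (k = 0; k < 4; ++k) n = n * 256 + (bytes[i + k] || 0);
    var chunk = '';
    for (k = 0; k < 5; ++k) { // 85^5 > 2^32, so 5 digits suffice
      chunk = B85.charAt(n % 85) + chunk;
      n = Math.floor(n / 85);
    }
    out += chunk;
  }
  return out;
}

One catch: the '!'..'u' range includes quote and backslash characters,
which need escaping if the compressed payload is embedded in a string
literal; packer's base62 sticks to alphanumerics and avoids that.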
From: nick on 23 May 2010 13:41
On May 23, 9:53 am, Johannes Baagoe <baa...(a)baagoe.com> wrote:
> nick :
> > http://pressjs.googlecode.com/svn/trunk/build/test.html
>
> "Où qu'il réside, même aux îles Caïmans, tout Français inscrit au rôle
> paiera son dû dès Noël" (length 92) "compresses" to 118 characters.

Well, you obviously used the wrong text. "banana cabana banana cabana
banana cabana banana cabana banana cabana" (length 69) compresses to
44 characters! ;)

> > http://code.google.com/p/pressjs/source/browse/#svn/trunk/src
>
> From http://code.google.com/p/pressjs/source/browse/trunk/src/compdict.js :
>
> // Populate table with all possible character codes.
> for (var i = 0; i < 256; ++i) {
>   var str = String.fromCharCode(i);
>   this.hashtable[str] = this.nextcode++;
> }
>
> What about character codes >= 256?

I'm pretty sure those characters aren't allowed in a javascript
document? I'm not really sure what's going on there, though; I was
puzzled by that bit as well. See my next paragraph.

> My general impression is that you are complicating things for no
> reason. Why use constructors, prototypes and fancy "//#" pseudo-cpp
> directives? Just one file which defines the two functions that
> compress and expand would be much easier both to write and to review.

Yeah, that stuff is all part of another GPL program I ripped off to
make this compressor, which in turn is a pretty much direct port of
some C++ code, so it has a very class-like design. I've been going
through it, making it more object-based and trying to learn the
algorithm at the same time. Eventually I'd like to replace all of
that code, but for now I just wanted to see if the whole idea was
viable.

The cpp directives were my idea, though. I like to be able to
separate the files into logical units, and ifdef comes in handy when
building multiple similar-but-different targets (like the stand-alone
vs. embedded decompressor). I'm definitely considering merging the
instream and outstream functionality into the compressor /
decompressor, but I think I'll keep the dictionaries in separate
files for now.

> (I assume that you are doing this for fun, for the challenge of
> writing a compressor in javascript. If it is in order to reduce
> bandwidth in real applications on the Web, enabling gzip on the
> server is much more efficient.)

Yeah, I'm mostly doing it to see if it can be done. Next I want to
experiment with a different compression algorithm, or one of the
variations on LZW. Server-side gzip is obviously the better
alternative when it's available, but that's not always the case (see
my response to Sean), and so we have things like "packer" and maybe
this thing.
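P.S. Since I mentioned trying to learn the algorithm: the expand side
mirrors the compress sketch from my first post, including the one
classic wrinkle: a code that refers to an entry the decompressor
hasn't built yet. Again a sketch with made-up names, not the pressjs
source:

function lzwDecompress(codes) {
  if (!codes.length) return '';
  var dict = [], nextCode = 256, i;
  // Same seed dictionary as the compressor, indexed by code.
  for (i = 0; i < 256; ++i) dict[i] = String.fromCharCode(i);
  var w = dict[codes[0]], out = w;
  for (i = 1; i < codes.length; ++i) {
    var entry;
    if (codes[i] < nextCode) {
      entry = dict[codes[i]];
    } else {
      // The compressor defined this code on the very step that
      // emitted it, so it can only be w + first character of w.
      entry = w + w.charAt(0);
    }
    out += entry;
    dict[nextCode++] = w + entry.charAt(0); // mirror the compressor
    w = entry;
  }
  return out;
}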