From: Johannes Baagoe on 23 May 2010 16:33
Johannes Baagoe :

> No: "Où qu'il réside".charCodeAt(1) == 65533

Oops, sorry - that was with Rhino in a UTF-8 console. V8 says 249, which makes more sense.

--
Johannes
From: Johannes Baagoe on 23 May 2010 16:54
nick :

> Hmm... so I wonder how this passes the "Où qu'il réside" test? I think I figured it out, after all.
> Were all of those char codes <= 256?

Yes. Any character in the Latin-1 Supplement is represented by a number between 0x0080 and 0x00FF in UTF-16, which is what javascript uses.

So, for French (except "œ" and "Œ"), Spanish, German, Portuguese, Danish and a few others, you should be all right. But it won't work with Greek, Russian or Chinese, and certainly not with Egyptian hieroglyphs which require *two* 16-bit char codes.

--
Johannes
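The check described in this post can be sketched as follows. This is only an illustration: `fitsInLatin1` is a hypothetical helper name, not something from press.js or from the thread, and the strings are written with `\u` escapes so the result does not depend on the encoding of the source file or console.

```javascript
// Hypothetical helper (not from the thread): true when every UTF-16 code
// unit of the string is at most 0xFF, i.e. fits in a single byte.
function fitsInLatin1(text) {
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xFF) {
      return false;
    }
  }
  return true;
}

var french = "O\u00F9 qu'il r\u00E9side"; // "Où qu'il réside"
var results = {
  secondCharCode: french.charCodeAt(1), // 249, i.e. U+00F9
  frenchFits: fitsInLatin1(french),     // true
  oeFits: fitsInLatin1("\u0153"),       // false: "œ" is U+0153
  greekFits: fitsInLatin1("\u03B1")     // false: "α" is U+03B1
};
console.log(results);
```
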
From: Thomas 'PointedEars' Lahn on 23 May 2010 19:44
Johannes Baagoe wrote:

> nick :
>> Were all of those char codes <= 256?
>
> Yes. Any character in the Latin-1 Supplement is represented by a number
> between 0x0080 and 0x00FF in UTF-16,

No, in Unicode.

> which is what javascript uses.

| A conforming [ECMAScript] implementation [...] shall interpret characters
| in conformance with the Unicode Standard, Version 3.0 or later and
| ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form,
| implementation level 3. If the adopted ISO/IEC 10646-1 subset is not
| otherwise specified, it is presumed to be the BMP subset, collection 300.
| If the adopted encoding form is not otherwise specified, it presumed to be
| the UTF-16 encoding form.

Learn the difference between character set and encoding.

> So, for French (except "œ" and "Œ"), Spanish, German, Portuguese, Danish
> and a few others, you should be all right. But it won't work with Greek,
> Russian or Chinese, and certainly not with Egyptian hieroglyphs which
> require *two* 16-bit char codes.

Modern Greek, Cyrillic as used in Russian, and Han characters as they are used e.g. in Standard Mandarin usually require one _UTF-16 code unit_, but characters from CJK Extensions-B and -C, and Compatibility Ideographs Supplement require two of them. Egyptian hieroglyphs require two _UTF-16 code units_. This is however unrelated to the fact that their code points require at least two 16-bit words to be represented in binary.

It is a misconception to think of UTF-8, UTF-16 or UTF-32 as encodings that combine char(acter) codes to represent another character. Learn the difference between characters and code units.

<http://unicode.org/faq/>

PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
)  // Plone, register_function.js:16
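To make the code unit versus code point distinction concrete, here is a small sketch. `countCodePoints` is a hypothetical helper written for this illustration; it assumes well-formed input and simply treats a high surrogate followed by a low surrogate as one code point. The sample characters are my own picks, not taken from the thread.

```javascript
// Hypothetical helper (not from the thread): counts code points by treating
// a high surrogate followed by a low surrogate as a single code point.
function countCodePoints(text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    var next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (unit >= 0xD800 && unit <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
      i++; // skip the low surrogate of the pair
    }
    count++;
  }
  return count;
}

var samples = {
  greek: "\u03B1",           // U+03B1, one code unit
  cyrillic: "\u0436",        // U+0436, one code unit
  han: "\u4E2D",             // U+4E2D, one code unit
  hieroglyph: "\uD80C\uDC00" // U+13000, two code units (surrogate pair)
};

for (var name in samples) {
  // String length counts UTF-16 code units, not code points.
  console.log(name, samples[name].length, countCodePoints(samples[name]));
}
```

Note that `length` and `charCodeAt` both work on code units, which is why a byte-oriented scheme that only looks at `charCodeAt` values sees the hieroglyph as two separate 16-bit values.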
From: Andrea Giammarchi on 24 May 2010 09:11
On May 23, 8:34 pm, David Mark <dmark.cins...(a)gmail.com> wrote:

> Packer is a complete waste of time.

You never waste an opportunity to be arrogant, don't ya?

Dean's Packer was revolutionary in its time and it is still widely adopted, improved and maintained, regardless of what *you* think. A bit more respect for those devs who have always been there, teaching and explaining to us with valid software and/or experiments, would probably be more appropriate for this group, wouldn't it?

Br,
Andrea Giammarchi
From: Andrea Giammarchi on 24 May 2010 09:15
.... and btw, for the record, this press.js is a nice experiment as well. The "decompressor" uses a lot of unnecessary spaces and notation, but even if it were improved, other guys have already explained the side effects.

The fact that hosts do not allow gzip means nothing to me: you can gzip and deflate at build time, then serve the already gzipped/deflated files using proper headers. For the host this is no different from serving a plain file, and it won't be overloaded by runtime compression.

If you want an example, here is one of my projects that does exactly what I have described:
http://code.google.com/p/php-client-booster/

Best Regards,
Andrea Giammarchi
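As a rough sketch of the build-time idea (not how php-client-booster itself works, which is a PHP project), the following assumes a Node.js environment and its built-in `zlib` module. The `headersFor` helper and the sample source string are hypothetical, and a real server would still need to confirm that the client sent a suitable `Accept-Encoding` header before answering this way.

```javascript
var zlib = require("zlib");

// Build step: compress the script once, ahead of time.
var source = "function hello() { return 'hello'; }\n";
var compressed = zlib.gzipSync(Buffer.from(source, "utf8"));

// Hypothetical helper: response headers for serving the pre-compressed
// bytes as-is, so the host does no compression work at request time.
function headersFor(body) {
  return {
    "Content-Type": "application/javascript",
    "Content-Encoding": "gzip",
    "Content-Length": String(body.length),
    "Vary": "Accept-Encoding"
  };
}

// Sanity check: the stored bytes decompress back to the original script.
var roundTrip = zlib.gunzipSync(compressed).toString("utf8");
console.log(headersFor(compressed), roundTrip === source);
```

In a real build the compressed buffer would be written next to the original file, and the server configuration would map requests to the pre-compressed copy.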