From: Hans-Georg Michna on 3 Jul 2010 15:03

I'm having a problem with a UTF-8 HTML page containing a <script> tag that calls in a JavaScript file that is also encoded in UTF-8.

The JavaScript program, among other things, contains a string literal, which contains an umlaut, and dynamically puts the string into an HTML tag. But the umlaut is not displayed properly and displays as a little square box instead. What could be the cause of this problem?

Am I right in assuming that a JavaScript file inserted by means of the <script> tag is interpreted as being encoded in the same character set as the HTML page itself? If so, then I have to search for the error elsewhere.

I haven't gotten to any more thorough analysis yet. Thought I should ask here first, just in case there are a few well-known potential causes.

Hans-Georg
From: Richard Cornford on 3 Jul 2010 15:36

Hans-Georg Michna wrote:
> I'm having a problem with a UTF-8 HTML page containing a
> <script> tag that calls in a JavaScript file that is also
> encoded in UTF-8.
>
> The JavaScript program, among other things, contains a
> string literal, which contains an umlaut, and dynamically
> puts the string into an HTML tag. But the umlaut is not
> displayed properly and displays as a little square box
> instead. What could be the cause of this problem?
>
> Am I right in assuming that a JavaScript file inserted by
> means of the <script> tag is interpreted as being encoded
> in the same character set as the HTML page itself?

Without a reference to an HTML spec saying as much, that would be an assumption, although not an unreasonable one, as it would be a sensible strategy.

Though I would expect the above description to assert that you have examined the HTTP traffic (using an HTTP monitor/proxy such as Fiddler or Charles) and verified first that the javascript is being served with appropriate Content-Type headers (either asserting UTF-8, or at least not contradicting it), and second, that the actual bytes being sent include the correct sequence of bytes for the UTF-8 encoding of the offending character (by looking at the hex representation of the resource in the HTTP monitor).

> If so, then I have to search for the error elsewhere.
>
> I haven't gotten to any more thorough analysis yet. Thought I
> should ask here first, just in case there are a few well-known
> potential causes.

If nothing else, trying the SCRIPT element with an explicit CHARSET attribute (asserting UTF-8) might prove instructive.

Richard.
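As a rough illustration of the byte-level check described above (this sketch is not from the thread): the UTF-8 encoding of "ü" (U+00FC) is the two bytes C3 BC, which is what the hex view of the script resource should show. If those bytes are instead read as ISO-8859-1, each byte turns into its own character. The helper names `utf8Hex` and `decodeAsLatin1` are made up for this example, and it assumes a runtime that provides the standard `TextEncoder`.

```javascript
// Written as an escape so that this file's own encoding cannot affect the test.
const umlaut = "\u00FC"; // "ü"

// Hex dump of the UTF-8 encoding, similar to what an HTTP monitor shows.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Simulate a consumer that wrongly treats UTF-8 bytes as ISO-8859-1:
// every byte becomes the character with the same code point.
function decodeAsLatin1(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

console.log(utf8Hex(umlaut));               // c3 bc
console.log(decodeAsLatin1(umlaut).length); // 2 (U+00C3 followed by U+00BC)
```

Seeing two characters where one umlaut was expected points at a UTF-8 resource being decoded as a single-byte encoding. A single replacement glyph or box points the other way: a file that is not really UTF-8 being decoded as UTF-8.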
From: johncoltrane on 3 Jul 2010 15:59

On 03/07/10 21:03, Hans-Georg Michna wrote:
> I'm having a problem with a UTF-8 HTML page containing a
> <script> tag that calls in a JavaScript file that is also
> encoded in UTF-8.
>
> The JavaScript program, among other things, contains a string
> literal, which contains an umlaut, and dynamically puts the
> string into an HTML tag. But the umlaut is not displayed
> properly and displays as a little square box instead. What could
> be the cause of this problem?
>
> Am I right in assuming that a JavaScript file inserted by means
> of the <script> tag is interpreted as being encoded in the same
> character set as the HTML page itself? If so, then I have to
> search for the error elsewhere.
>
> I haven't gotten to any more thorough analysis yet. Thought I
> should ask here first, just in case there are a few well-known
> potential causes.
>
> Hans-Georg

AFAIK JavaScript is supposed to be UTF-8 compatible. You can even use japanese hiragana as variable names.

I just ran a few quick tests in Firefox with the factory default charset (iso-8859-1).

relevant HTML:

    <script src="js.js" type="text/javascript"></script> (no charset)

or

    <script src="js.js" type="text/javascript" charset="utf-8"></script>

and

    <body onload="init();">
    <p id="txt"></p>
    </body>

relevant JS:

    function init() {
        ぢ = "✍xvbc;,wxjhgdkqsj¬fiÌÏfiƒ¬Ò÷ß∂ƒÒÈ∂ºÒ̃ßÒ÷È∂ƒßȺ∂Ì≠¬ÏîÂÏ";
        document.getElementById('txt').innerHTML = ぢ;
    };

    page charset | script charset | var ぢ = 'ü' | var txt = 'ü'
    -------------+----------------+--------------+----------------
    none         | none           | parse error  | garbled glyphs
    none         | utf-8          | works        | works
    utf-8        | none           | works        | works
    utf-8        | utf-8          | works        | works
    iso-8859-1   | none           | parse error  | garbled glyphs
    iso-8859-1   | utf-8          | works        | works

Soooo... I'm not sure why you would get a garbled glyph if at least the HTML document is in utf-8.

--
(ôlô)
From: Thomas 'PointedEars' Lahn on 3 Jul 2010 17:05

johncoltrane wrote:
> AFAIK JavaScript is supposed to be UTF-8 compatible.

You know nonsense; partially because you don't know what JavaScript is, partially because you don't know what UTF-8 is.

,-[ECMAScript Language Specification, Edition 5 Final Draft]
|
| A conforming implementation of this International standard shall interpret
| characters in conformance with the Unicode Standard, Version 3.0 or later
| and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding
| form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not
| otherwise specified, it is presumed to be the BMP subset, collection 300.
| If the adopted encoding form is not otherwise specified, it presumed to be
| the UTF-16 encoding form.

The key phrase here being "If the adopted encoding form is not otherwise specified". See below.

> You can even use japanese hiragana as variable names.

That is a subset of a character set (Unicode), not an encoding (UTF-8). Learn to understand the difference.

> I just ran a few quick tests in Firefox with the factory default charset

Nonsense. Obviously you don't know what "charset" means to begin with.

> (iso-8859-1).

That is a character encoding, and its being the *HTTP default* in reality is heavily overrated. And there is *no* default value for the `charset' attribute specified in HTML.

> relevant HTML:
>
> <script src="js.js" type="text/javascript"></script> (no charset)
> or
> <script src="js.js" type="text/javascript" charset="utf-8"></script>

As specified, HTTP header information and “A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset"” take precedence over this attribute and related attributes.

<http://www.w3.org/TR/REC-html40/charset.html>

> Soooo... I'm not sure why you would get a garbled glyph if at least the
> HTML document is in utf-8.

Because one has nothing to do with the other. It is the declaration of the encoding of the resources in the HTTP Content-Type header (no, _not_ meta) that matters most; everything else only matters if it is *missing*. And there are still some stupid server administrators that have `Content-Type: ....; charset=ISO-8859-1' sent by default (a default configuration bug that was fixed for Apache years ago¹).

Learn to quote.

PointedEars
___________
¹ <https://issues.apache.org/bugzilla/show_bug.cgi?id=23421>
--
Danny Goodman's books are out of date and teach practices that are positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(a)news.demon.co.uk> (2004)
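The priority order cited from the HTML 4.01 charset section (HTTP `charset` parameter first, then a META declaration, then the `charset` attribute on the element that references the resource) can be sketched roughly as follows. This is a simplified model for illustration only, not how any particular browser is implemented; the function and property names (`charsetFromContentType`, `pickDeclaredEncoding`, `httpContentType`, `metaContentType`, `charsetAttribute`) are invented for this example.

```javascript
// Extract the charset parameter from a Content-Type value, or null if absent.
function charsetFromContentType(value) {
  if (!value) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(value);
  return match ? match[1].toLowerCase() : null;
}

// Return the first encoding declaration found, highest priority first.
// Returns null when nothing is declared (the user agent then has to guess).
function pickDeclaredEncoding({ httpContentType, metaContentType, charsetAttribute } = {}) {
  const attr = (charsetAttribute || "").trim().toLowerCase();
  return (
    charsetFromContentType(httpContentType) ||
    charsetFromContentType(metaContentType) ||
    attr ||
    null
  );
}

// A server default wins over the attribute on the <script> element:
console.log(pickDeclaredEncoding({
  httpContentType: "application/javascript; charset=ISO-8859-1",
  charsetAttribute: "utf-8",
})); // iso-8859-1

// Without a charset parameter in the header, the attribute is used:
console.log(pickDeclaredEncoding({
  httpContentType: "application/javascript",
  charsetAttribute: "utf-8",
})); // utf-8
```

The first call models the misconfigured-server case mentioned above: a UTF-8 script file sent with `charset=ISO-8859-1` is decoded as ISO-8859-1 regardless of what the page or the `charset` attribute says.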
From: johncoltrane on 3 Jul 2010 18:18

>> AFAIK JavaScript is supposed to be UTF-8 compatible.
>
> You know nonsense; partially because you don't know what JavaScript is,
> partially because you don't know what UTF-8 is.
>
> ,-[ECMAScript Language Specification, Edition 5 Final Draft]
> |
> | A conforming implementation of this International standard shall interpret
> | characters in conformance with the Unicode Standard, Version 3.0 or later
> | and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding
> | form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not
> | otherwise specified, it is presumed to be the BMP subset, collection 300.
> | If the adopted encoding form is not otherwise specified, it presumed to be
> | the UTF-16 encoding form.
>
> The key phrase here being "If the adopted encoding form is not otherwise
> specified". See below.
>
>> You can even use japanese hiragana as variable names.
>
> That is a subset of a character set (Unicode), not an encoding (UTF-8).
> Learn to understand the difference.

I know the difference. It was an example: variable names in non-ascii characters do work in... that mostly browser centric scripting language. Think of it as a preemptive illustration of your rebuttal.

>> I just ran a few quick tests in Firefox with the factory default charset
>
> Nonsense. Obviously you don't know what "charset" means to begin with.
>
>> (iso-8859-1).
>
> That is a character encoding, and its being the *HTTP default* in reality
> is heavily overrated. And there is *no* default value for the `charset'
> attribute specified in HTML.

Well, what I know is that when talking about HTML, the difference between "character set" and "encoding" is practically non-existent, both words being used (wrongly, I give you that) interchangeably. Also I was referring to the default settings of Firefox, here.

HTML has only the "charset" attribute and it's not supposed to accept "Unicode" or "Hiragana" or "Occidental" as value. We are left with "utf-8" (the most widely used way of representing the full/most of the Unicode standard, including Hiragana) or "iso-8859-1" or a slew of other possibilities.

Hell, in XML/XHTML we even have to use both terms.

> As specified, HTTP header information and “A META declaration with
> "http-equiv" set to "Content-Type" and a value set for "charset"” take
> precedence over this attribute and related attributes.
>
> <http://www.w3.org/TR/REC-html40/charset.html>

I thought it was possible to override the HTTP header with in-document declarations. Thanks.

>> Soooo... I'm not sure why you would get a garbled glyph if at least the
>> HTML document is in utf-8.
>
> Because one has nothing to do with the other. It is the declaration of the
> encoding of the resources in the HTTP Content-Type header (no, _not_ meta)
> that matters most; everything else only matters if it is *missing*. And
> there are still some stupid server administrators that have `Content-Type:
> ...; charset=ISO-8859-1' sent by default (a default configuration bug that
> was fixed for Apache years ago¹).

Yes. I can't even remember ever seeing this default. But I'm not an old-timer.

That said "HTML document is in utf-8" was too unspecific. I was thinking about the HTTP header. Sorry.

> Learn to quote.

Like that?

--
(ôlô)