From: Hans-Georg Michna on 5 Jul 2010 03:01

On Sun, 4 Jul 2010 17:01:36 +0100, Dr J R Stockton wrote:

>In comp.lang.javascript message <ev1v26tr7fv7buh7j0herp1gvs6a5ljlm8(a)4ax.com>,
>Sat, 3 Jul 2010 21:03:39, Hans-Georg Michna
><hans-georgNoEmailPlease(a)michna.com> posted:

>>The JavaScript program, among other things, contains a string
>>literal, which contains an umlaut, and dynamically puts the
>>string into an HTML tag. But the umlaut is not displayed
>>properly and displays as a little square box instead. What could
>>be the cause of this problem?

>Is that a naked umlaut, or is it sitting over a well-known vowel?

In this particular case it's an ü, which I could encode as
&uuml;, but don't want to and should not have to.

>You can add test code to the page, with charCodeAt, to see exactly what
>is delivered to the browser (refer to the Unicode site for official
>tables);

Ah, another good idea. Some test code may be in order.

>if that delivery is wrong, you can encode the character as
>\uhhhh or like &auml; in your source script.

I'm aware of that, but I don't see why I should go through all
the already properly encoded js files, only to solve a problem
that should already be solved through UTF-8.

>Perhaps more likely, your viewer does not have that character in the
>current font; add test code to display the string also in popular fonts.

Unlikely, but I've thought of that too.

Will dig further into the problem.

Hans-Georg
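The charCodeAt test suggested above could look something like the following minimal sketch. The helper name dumpCodeUnits and the sample string are made up for illustration; in a real test the string would be the literal from the external .js file rather than an escaped one.

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as 4-digit hex.
function dumpCodeUnits(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = "0" + hex;
    }
    out.push(hex);
  }
  return out.join(" ");
}

// The u-umlaut is written as an escape here, so this sample does not depend
// on how the file containing it is encoded or decoded.
var sample = "M\u00FCnchen";
var dump = dumpCodeUnits(sample);
```

If the literal from the script file dumps as 00FC, the file was decoded correctly and the problem probably lies later (font, insertion into the page). A dump of 00C3 00BC would suggest UTF-8 bytes read as ISO-8859-1, and FFFD would suggest a single ISO-8859-1 byte read as UTF-8.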
From: Richard Cornford on 5 Jul 2010 04:06

Hans-Georg Michna wrote:
> On Sat, 3 Jul 2010 20:36:00 +0100, Richard Cornford wrote:
>>Hans-Georg Michna wrote:
>
>>> I'm having a problem with a UTF-8 HTML page containing a
>>> <script> tag that calls in a JavaScript file that is also
>>> encoded in UTF-8.
>>>
>>> The JavaScript program, among other things, contains a
>>> string literal, which contains an umlaut, and dynamically
>>> puts the string into an HTML tag. But the umlaut is not
>>> displayed properly and displays as a little square box
>>> instead. What could be the cause of this problem?
>>>
>>> Am I right in assuming that a JavaScript file inserted by
>>> means of the <script> tag is interpreted as being encoded
>>> in the same character set as the HTML page itself?
>
>> Without a reference to an HTML spec saying as much that
>> would be an assumption, although not an unreasonable one
>> as it would be a sensible strategy. Though I would expect
>> the above description to assert that you have examined the
>> HTTP traffic (using an HTTP monitor/proxy such as Fiddler
>> or Charles) and verified first that the javascript is being
>> served with appropriate content type headers (either asserting
>> UTF-8, or at least not contradicting it),
>
> Thanks for responding!
>
> I've looked at the HTTP header from the server, using Firebug,
> and it specifies UTF-8.

I wouldn't trust Firebug to do that job. What I would want to look at
is the traffic going into the browser (and coming out of it in other
cases), while Firebug is inside the browser and may be downstream of
some assumptions/defaulting applied by the browser (and may not be
reliable to start with).

>> and second, that the actual bytes being sent include the
>> correct sequence of bytes for the UTF-8 encoding of the
>> offending character (by looking at the hex representation
>> of the resource in the HTTP monitor).
>
> Have yet to do this.
>
>> If nothing else, trying the SCRIPT element with an explicit
>> CHARSET attribute (asserting UTF-8) might prove instructive.
>
> Will try that too, although it seems strange that I should have
> to do that.

You shouldn't have to do that. The HTTP headers should take precedence
anyway. I proposed it as an experiment, to see if it changed anything.

Richard.
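For the hex check in an HTTP monitor it helps to know which bytes to look for. The sketch below is only an illustration with made-up helper names (utf8Hex, readAsLatin1); it derives the UTF-8 bytes of a character from encodeURIComponent, which percent-encodes non-ASCII characters as UTF-8, and then shows what those same bytes turn into if each one is treated as a separate ISO-8859-1 character.

```javascript
// Hypothetical helper: UTF-8 bytes of a string as uppercase hex pairs.
// Assumes the string contains no lone surrogates (encodeURIComponent
// throws a URIError for those).
function utf8Hex(str) {
  var encoded = encodeURIComponent(str);
  var bytes = [];
  for (var i = 0; i < encoded.length; i++) {
    if (encoded.charAt(i) === "%") {
      // "%XX" is one byte; take the two hex digits and skip past them.
      bytes.push(encoded.substring(i + 1, i + 3));
      i += 2;
    } else {
      // Characters left unescaped are ASCII, one byte each.
      bytes.push(encoded.charCodeAt(i).toString(16).toUpperCase());
    }
  }
  return bytes;
}

// Hypothetical helper: interpret each byte as one ISO-8859-1 character,
// which is what a wrong charset guess would do with UTF-8 input.
function readAsLatin1(hexBytes) {
  var s = "";
  for (var i = 0; i < hexBytes.length; i++) {
    s += String.fromCharCode(parseInt(hexBytes[i], 16));
  }
  return s;
}

var expectedBytes = utf8Hex("\u00FC");       // bytes for u-umlaut in UTF-8
var misread = readAsLatin1(expectedBytes);   // two characters instead of one
```

For ü the monitor should show the byte pair C3 BC. A single byte FC at that position would mean the file on the server is not actually UTF-8.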
From: Hans-Georg Michna on 5 Jul 2010 04:19

On Sun, 4 Jul 2010 14:11:54 +0300, Jukka K. Korpela wrote:

>On general grounds, we can expect that browsers honor the encoding
>information in HTTP headers. However, if the Javascript resource is served
>as application/javascript, then it's supposed to be binary, with all
>encoding issues resolved within the binary format.
>
>There's RFC 4329, "Scripting media types", but it is classified as
>Informational, despite its language that refers to "requirements" and even
>uses "MUST". And it illogically defines a charset parameter for
>application/javascript. Oh well.
>
>For text/javascript, declared as "obsolete" by the informational RFC, the
>charset parameter is much more logical. And it seems that browsers honor it.

Here's some intermediate information, the response header issued
by the server when it delivers the JavaScript file:

Response Headers
Date: Mon, 05 Jul 2010 07:43:32 GMT
Server: IBM_HTTP_Server/6.1.0.19 Apache/2.0.47 (Unix)
Last-Modified: Mon, 14 Jun 2010 12:34:48 GMT
Content-Length: 2676
Keep-Alive: timeout=10, max=99
Connection: Keep-Alive
Content-Type: application/x-javascript
Content-Language: de-DE

The Websphere server does, in fact, seem to use Apache.

>On the practical side, if you work with a typical Apache server environment,
>you should put e.g.
>AddType text/javascript;charset=utf-8 js
>or
>AddType application/javascript;charset=utf-8 js
>in the .htaccess file in the directory that contains your .js files, if they
>are actually utf-8 encoded. (Note that ASCII files are trivially a special
>case of utf-8 encoded files, but ISO-8859-1 files containing any non-ASCII
>data are not.)

Yes, I'm aware of this. Will have to find out whether Websphere
still honors .htaccess files and whether that's the best way to
go.

>Using the charset="..." parameter in the <script> tag is possible, too, but
>it cannot override the encoding information in HTTP headers, if present. On
>the other hand, it can be useful when a page has been saved locally - so
>that when the page is opened, there will be no HTTP headers.

Will also test charset declarations in the script tags, because
they may turn out to be the most fundamental and reliable
solution. It will take a little more time, because currently the
test environment is not working.

>(Note that <meta> tags can only specify the encoding of an HTML document. If
>this affects the default encoding used for an external resource, then that's
>something in the realm of actual browser behavior, not specifications.)

Yes, am aware of that now too. Thanks!

Hans-Georg
From: Richard Cornford on 5 Jul 2010 04:33

Hans-Georg Michna wrote:
> On Sun, 4 Jul 2010 14:11:54 +0300, Jukka K. Korpela wrote:
<snip>
>> There's RFC 4329, "Scripting media types", but it is
>> classified as Informational, despite its language that
>> refers to "requirements" and even uses "MUST". And it
>> illogically defines a charset parameter for
>> application/javascript. Oh well.
>>
>> For text/javascript, declared as "obsolete" by the
>> informational RFC, the charset parameter is much more
>> logical. And it seems that browsers honor it.
>
> Here's some intermediate information, the response header issued
> by the server when it delivers the JavaScript file:
>
> Response Headers
> Date: Mon, 05 Jul 2010 07:43:32 GMT
> Server: IBM_HTTP_Server/6.1.0.19 Apache/2.0.47 (Unix)
> Last-Modified: Mon, 14 Jun 2010 12:34:48 GMT
> Content-Length: 2676
> Keep-Alive: timeout=10, max=99
> Connection: Keep-Alive
> Content-Type: application/x-javascript
> Content-Language: de-DE
<snip>

Didn't you just tell me that "I've looked at the HTTP header from the
server, using Firebug, and it specifies UTF-8"? If these are the
headers, where is that specification of UTF-8?

Richard.
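The point about the missing charset can be checked mechanically. The following is a rough sketch, not a full header parser; charsetOf is a made-up name, and the regular expression only handles the simple parameter forms seen in this thread.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value, or return null when the header does not declare one.
function charsetOf(contentType) {
  var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// The value the server actually sent: no charset at all.
var served = charsetOf("application/x-javascript");

// The value an AddType line like the one quoted earlier would produce.
var wanted = charsetOf("text/javascript;charset=utf-8");
```

Here served comes out as null, so the browser is left to guess the script's encoding, while wanted is "utf-8".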
From: Richard Cornford on 5 Jul 2010 04:44

Hans-Georg Michna wrote:
> On Sun, 4 Jul 2010 17:01:36 +0100, Dr J R Stockton wrote:
>> Hans-Georg Michna wrote:
>
>>> The JavaScript program, among other things, contains a
>>> string literal, which contains an umlaut, and dynamically
>>> puts the string into an HTML tag. But the umlaut is not
>>> displayed properly and displays as a little square box
>>> instead. What could be the cause of this problem?
>
>> Is that a naked umlaut, or is it sitting over a well-known
>> vowel?
>
> In this particular case it's an ü, which I could encode as
> &uuml;, but don't want to and should not have to.
<snip>

Not in an imported javascript file, as they are not processed through an
HTML parser. Or does this represent grounds for suspecting the use of
- innerHTML - and so introducing the question of how - innerHTML - is
interpreting the characters it is being presented with by javascript?

Richard.
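To illustrate the distinction: inside a script file an HTML entity is just six ordinary characters, whereas a \u escape produces the single intended character. The sketch below uses made-up names (entity, escaped, show); the show function only makes sense in a browser and is not called here.

```javascript
// In a JavaScript string literal the entity is not interpreted:
var entity = "&uuml;";    // six characters: & u u m l ;
var escaped = "\u00FC";   // one character: u with diaeresis

var lengths = [entity.length, escaped.length];

// Hypothetical helper, browser only. With viaInnerHTML set, the text goes
// through the HTML parser, so "&uuml;" would become the umlaut character.
// Otherwise it is inserted as a text node and "&uuml;" stays literal.
function show(target, text, viaInnerHTML) {
  if (viaInnerHTML) {
    target.innerHTML = text;
  } else {
    target.appendChild(document.createTextNode(text));
  }
}
```

So an entity in the script would only help if the string ends up in - innerHTML -, while the escape works with either insertion method and does not depend on the encoding the script file is served with.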