From: Ashley Sheridan on 27 May 2010 12:13

On Thu, 2010-05-27 at 12:08 -0400, Adam Richardson wrote:

> On Thu, May 27, 2010 at 9:45 AM, Guus Ellenkamp
> <Ellenkamp_Guus(a)hotmail.com> wrote:
>
> > Thanks, but are you sure of that? I did some research a while ago and
> > found that officially PHP files should be ASCII and not have any
> > specific character encoding. I believe it will work anyhow (did not try
> > this one), but would like to stick with the standards.
> >
> > "Ashley Sheridan" <ash(a)ashleysheridan.co.uk> wrote in message
> > news:1274883714.2202.228.camel(a)localhost...
> > > On Wed, 2010-05-26 at 22:20 +0800, Guus Ellenkamp wrote:
> > >
> > >> We use PHP defines for defining text in different languages. As far
> > >> as I know PHP files are supposed to be ASCII, not UTF-8 or something
> > >> like that. What I want to make is a conversion program that would
> > >> convert a given UTF-8 file with the format
> > >>
> > >> definetext1=this is a text in random UTF-8, probably Arabic or similar text
> > >> definetext2=this is another text in random UTF-8, probably Arabic or similar text
> > >>
> > >> into a file with the following defines
> > >>
> > >> define('definetext1',chr(<t_value>).chr(<h_value>).chr(<i_value>)...chr(<x_value>).chr(<t_value>));
> > >> define('definetext2',chr(<t_value>).chr(<h_value>).chr(<i_value>)...chr(<x_value>).chr(<t_value>));
> > >>
> > >> Not sure if I'm using the correct chr/ord function, but I hope the
> > >> above is clear enough to show what I'm looking for. Basically the
> > >> output file should be ASCII and not contain any UTF-8.
> > >>
> > >> Any advice? The html_special_chars did not seem to work for the
> > >> Vietnamese text I tried to convert, so something seems to go wrong
> > >> with just reading an array of strings, converting the strings, and
> > >> putting them in defines.
> > >
> > > PHP files can contain UTF-8, and in fact that is the preference of
> > > most developers I know of.
> > >
> > > Thanks,
> > > Ash
> > > http://www.ashleysheridan.co.uk
>
> Because the lower range of UTF-8 matches the ASCII character set
> (intentionally, by design), you'll be able to use UTF-8 for PHP files
> without problem (i.e., 7-bit ASCII chars have the same encoding in UTF-8).
> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>
> However, if you were to use any of the multibyte characters of UTF-8 in a
> PHP file, you could run into some trouble. I use UTF-8 for most of my PHP
> files, but I've been sticking to the ASCII subset exclusively.
>
> Adam

I don't use the higher range of characters often, but I do sometimes use
them for things like the graphical glyphs (½, etc.). I know I could do
those with regular text and the Wingdings font, but that's not available
on every computer, and it breaks the semantic meaning behind the glyphs.

Thanks,
Ash
http://www.ashleysheridan.co.uk

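For what it's worth, the conversion Guus describes in the quoted message could be sketched roughly like this. It is only an illustration, not anything posted in the thread: the function name `makeAsciiDefine` is made up, it assumes one `name=text` pair per line as in his example, and it assumes PHP 7.1 or later. Instead of a long `chr()` chain it writes each non-ASCII byte as a `\xNN` escape inside a double-quoted string, which produces the same bytes at runtime while the generated file itself stays 7-bit ASCII.

```php
<?php
// Hypothetical helper: turn one "name=text" line into an ASCII-only define().
function makeAsciiDefine(string $line): string
{
    $line = rtrim($line, "\r\n");
    if (strpos($line, '=') === false) {
        throw new InvalidArgumentException('Expected a line of the form name=text');
    }

    // Split on the first "=" only, so the text itself may contain "=".
    [$name, $text] = explode('=', $line, 2);

    $escaped = '';
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {   // strlen() counts bytes
        $byte = $text[$i];
        $code = ord($byte);
        // These three characters have a special meaning inside double quotes.
        $special = $byte === '"' || $byte === '\\' || $byte === '$';
        if ($code < 0x20 || $code > 0x7E || $special) {
            $escaped .= sprintf('\\x%02X', $code);      // e.g. byte 0xC3 becomes \xC3
        } else {
            $escaped .= $byte;                          // printable ASCII passes through
        }
    }

    // var_export() quotes the constant name safely.
    return 'define(' . var_export($name, true) . ', "' . $escaped . '");';
}

// "caf\xC3\xA9" spells out the UTF-8 bytes of "café".
echo makeAsciiDefine("definetext1=caf\xC3\xA9"), "\n";
```

Running each line of the input file through such a function and writing the results after a `<?php` tag would give an include file with no bytes above 127. Whether that is worth doing is a separate question, since the replies in this thread suggest PHP handles UTF-8 source files anyway.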
From: tedd on 27 May 2010 12:31

At 5:13 PM +0100 5/27/10, Ashley Sheridan wrote:
>
>I don't use the higher range of characters often, but I do sometimes use
>them for things like the graphical glyphs (½, etc) I know I could do
>those with regular text and the Wingdings font, but that's not available
>on every computer, and breaks the semantic meaning behind the glyphs.
>
>Thanks,
>Ash

Ash:

I read briefly on the css-discuss list that there is a movement to "force" download of fonts (i.e., char sets) to make layouts work. Apparently some browsers allow for that. I have not read up on it and may have the wrong impression, but that was my take.

With the exception of "evil" fonts, it seemed like a good idea.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

From: "Bob McConnell" on 27 May 2010 14:06

From: Ashley Sheridan

>On Thu, 2010-05-27 at 12:08 -0400, Adam Richardson wrote:
>
>> Because the lower range of UTF-8 matches the ascii character set
>> (intentionally by design), you'll be able to use UTF-8 for PHP files without
>> problem (i.e., ascii 7-bit chars have same encoding in UTF-8.)
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>>
>> However, if you were to use any of the multibyte characters of UTF-8 in a
>> PHP file, you could run in to some trouble. I use UTF-8 for most of my PHP
>> files, but I've been sticking to the ASCII subset exclusively.
>
> I don't use the higher range of characters often, but I do sometimes use
> them for things like the graphical glyphs (½, etc) I know I could do
> those with regular text and the Wingdings font, but that's not available
> on every computer, and breaks the semantic meaning behind the glyphs.

What higher range? ASCII only defines 128 values, the bottom 32 being control characters that don't print. Anything outside of that is not ASCII, but a proprietary extension. In particular, the glyphs usually associated with 0-31 and 128-255 are IBM-specific and not guaranteed to be present outside of their original video ROM. So only the first 128 characters map directly into UTF-8.

Bob McConnell

Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press

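Since ASCII covers only byte values 0-127, as Bob points out, it is easy to test whether a string or source file really is pure ASCII. The following is just a sketch under that assumption; the helper name `isPureAscii` is made up for illustration. The pattern is used without the `/u` modifier, so it is matched byte by byte.

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit ASCII range 0-127.
function isPureAscii(string $bytes): bool
{
    // [\x80-\xFF] matches any single byte that has the high bit set.
    return preg_match('/[\x80-\xFF]/', $bytes) === 0;
}

var_dump(isPureAscii("define('GREETING', 'hello');")); // only bytes below 128
var_dump(isPureAscii("caf\xC3\xA9"));                  // contains two UTF-8 bytes above 127
```

The same check could be run on `file_get_contents()` of a generated include file to confirm that nothing outside ASCII slipped through.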
From: Ashley Sheridan on 27 May 2010 14:11

On Thu, 2010-05-27 at 14:06 -0400, Bob McConnell wrote:

> From: Ashley Sheridan
>
> > I don't use the higher range of characters often, but I do sometimes use
> > them for things like the graphical glyphs (½, etc) I know I could do
> > those with regular text and the Wingdings font, but that's not available
> > on every computer, and breaks the semantic meaning behind the glyphs.
>
> What higher range? ASCII only defines 128 values, the bottom 32 being
> control characters that don't print. Anything outside of that is not
> ASCII, but a proprietary extension. In particular, the glyphs usually
> associated with 0-31 and 128-255 are IBM-specific and not guaranteed to
> be present outside of their original video ROM. So only the first 128
> characters map directly into UTF-8.
>
> Bob McConnell
>
> Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press

I meant the higher range of UTF-8 characters, the ones that don't map to ASCII values.

Thanks,
Ash
http://www.ashleysheridan.co.uk

From: tedd on 27 May 2010 15:13

At 7:11 PM +0100 5/27/10, Ashley Sheridan wrote:
>On Thu, 2010-05-27 at 14:06 -0400, Bob McConnell wrote:
>> From: Ashley Sheridan
>> > I don't use the higher range of characters often, but I do sometimes use
>> > them for things like the graphical glyphs (½, etc) I know I could do
>> > those with regular text and the Wingdings font, but that's not available
>> > on every computer, and breaks the semantic meaning behind the glyphs.
>>
>> What higher range? ASCII only defines 128 values, the bottom 32 being
>> control characters that don't print. Anything outside of that is not
>> ASCII, but a proprietary extension. In particular, the glyphs usually
>> associated with 0-31 and 128-255 are IBM-specific and not guaranteed to
>> be present outside of their original video ROM. So only the first 128
>> characters map directly into UTF-8.
>>
>> Bob McConnell
>>
>> Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press
>
>The higher range of utf8 characters that don't map to ascii values.
>
>Thanks,
>Ash

Bob:

I understood what Ash was referring to with his "higher range" statement, but his second statement was somewhat confusing.

ASCII is defined as characters having a value of 0-127 DEC (00-7F HEX). The "higher range" of 128-255 DEC (80-FF HEX) has been loosely characterized as "extended ASCII" but has not been officially declared such. Both M$ and Apple have their own characters appearing in that range and have used different characters for different things -- thus problems arose in using either. I do not know if the problem was ever resolved. It's probably best to never use such characters.

The Unicode database uses the same lower character values (i.e., "code points") as does ASCII, namely 0-127, and thus UTF-8 (an 8-bit variable-width encoding) is really a superset which includes the subset of ASCII.

The "Wingdings" font that Ash refers to is really the "Dingbats" char set in Unicode, as shown here:

http://www.unicode.org/charts/PDF/U2700.pdf

These are real characters that can be used for all sorts of things, including URLs, for example:

http://xn--gci.com

Please forgive the Punycode URL, but IE does not recognize "other than ASCII" characters in URLs, whereas Safari will show the URL correctly. Clearly, Safari has the upper hand in resolving "other than English" issues -- perhaps that's why their overseas profits last year exceeded their domestic -- but I digress.

Using UTF-8 encoding in everything (web and PHP) should present far fewer problems globally than trying to fight it.

Here are some references that may help:

[1] <http://webstandardsgroup.org/>
[2] <http://www.w3.org/People/Ishida/>
[3] <http://www.w3.org/International>
[4] <http://shiflett.org/archive/177>
[5] <http://en.wikipedia.org/wiki/Universal_character_set>
[6] <http://www.unicode.org/>

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
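To make tedd's superset point concrete, here is a small sketch that prints the raw bytes PHP sees for a few characters. It is an illustration only; the helper name `hexBytes` and the sample characters are chosen for this example, and the `\x` escapes spell out the UTF-8 encodings so that the snippet itself stays plain ASCII.

```php
<?php
// Hypothetical helper: the bytes of a string as uppercase hex.
function hexBytes(string $text): string
{
    return strtoupper(bin2hex($text));
}

$samples = [
    'A'          => 'A',            // U+0041, plain ASCII: one byte, same value as in ASCII
    'one half'   => "\xC2\xBD",     // U+00BD: two bytes in UTF-8
    'check mark' => "\xE2\x9C\x93", // U+2713, in the Dingbats block: three bytes
];

foreach ($samples as $label => $text) {
    // strlen() counts bytes, not characters.
    printf("%-10s %d byte(s): %s\n", $label, strlen($text), hexBytes($text));
}
```

Characters in the 0-127 range take a single byte whose value is their ASCII code, which is why a pure-ASCII PHP file is already valid UTF-8. Everything above that range is encoded with two or more bytes, each of them 128 or higher.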