From: Xavier Noria on 17 Apr 2010 11:35

Ruby 1.9 docs for String#ord say:

    Return the <code>Integer</code> ordinal of a one-character string.

What does that mean? Check for example

    "×".ord        # => 215
    "×".bytes.to_a # => [195, 151]

-- fxn
From: Xavier Noria on 17 Apr 2010 12:41

On Sat, Apr 17, 2010 at 5:35 PM, Xavier Noria <fxn(a)hashref.com> wrote:
> Ruby 1.9 docs for String#ord say:
>
>     Return the <code>Integer</code> ordinal of a one-character string.
>
> What does that mean? Check for example
>
>     "×".ord        # => 215
>     "×".bytes.to_a # => [195, 151]

Trial and error suggests it is the code of the character in the
encoding of the string:

    euro = "\u20AC"

    euro.ord.to_s(16)                       # => "20ac"
    euro.encode("iso-8859-15").ord.to_s(16) # => "a4"

That is what the source code suggests also:

    VALUE
    rb_str_ord(VALUE s)
    {
        unsigned int c;

        c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
        return UINT2NUM(c);
    }
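One way to cross-check the "code of the character in the string's encoding" reading is to go the other way with Integer#chr, which takes an encoding argument. The following is only a sketch under that assumption; the variable names are made up for illustration and are not from the thread.

```ruby
# Sketch: String#ord and Integer#chr(encoding) round-trip within one encoding.
euro   = "\u20AC"                     # the euro sign as a UTF-8 string
latin9 = euro.encode("ISO-8859-15")   # same character, different encoding

utf8_code   = euro.ord                # code of the character in UTF-8 (the Unicode codepoint)
latin9_code = latin9.ord              # code of the character in ISO-8859-15

# Integer#chr with an explicit encoding builds the one-character string back.
back_utf8   = utf8_code.chr("UTF-8")
back_latin9 = latin9_code.chr("ISO-8859-15")

puts utf8_code.to_s(16)               # hex code in UTF-8
puts latin9_code.to_s(16)             # hex code in ISO-8859-15
puts back_utf8 == euro
puts back_latin9.encode("UTF-8") == euro
```

If the reading is right, both comparisons print true, even though the two integers differ.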
From: Benoit Daloze on 17 Apr 2010 12:48

On 17 April 2010 18:41, Xavier Noria <fxn(a)hashref.com> wrote:
> On Sat, Apr 17, 2010 at 5:35 PM, Xavier Noria <fxn(a)hashref.com> wrote:
>
> > Ruby 1.9 docs for String#ord say:
> >
> >     Return the <code>Integer</code> ordinal of a one-character string.
> >
> > What does that mean? Check for example
> >
> >     "×".ord        # => 215
> >     "×".bytes.to_a # => [195, 151]
>
> Trial and error suggests it is the code of the character in the
> encoding of the string:
>
>     euro = "\u20AC"
>
>     euro.ord.to_s(16)                       # => "20ac"
>     euro.encode("iso-8859-15").ord.to_s(16) # => "a4"
>
> That is what the source code suggests also:
>
>     VALUE
>     rb_str_ord(VALUE s)
>     {
>         unsigned int c;
>
>         c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
>         return UINT2NUM(c);
>     }
>

    p "×".ord             # => 215
    p "×".bytes.to_a      # => [195, 151]
    p "×".encoding        # => #<Encoding:UTF-8>
    p "×".codepoints.to_a # => [215]

In UTF-8 (and in Unicode encodings in general), a single byte is not
necessarily a character. It is a codepoint that represents a character ;)

So you can think of ord as codepoints[0], and that number of course
depends on the String's encoding.

Regards,
B.D.
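To make the byte/codepoint distinction concrete, here is a small sketch that decodes the two UTF-8 bytes of "×" by hand and compares the result with ord and codepoints. The string is written with a \u escape so the example does not depend on the source file's encoding; the variable names are illustrative only.

```ruby
# Sketch: for a UTF-8 string, ord is the first codepoint, not the first byte.
s = "\u00D7"                        # "×", MULTIPLICATION SIGN, UTF-8 encoded

bytes      = s.bytes.to_a           # the two bytes of the UTF-8 sequence
codepoints = s.codepoints.to_a      # one codepoint
first      = s.ord

# Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) by hand:
# keep the low 5 bits of the lead byte and the low 6 bits of the
# continuation byte, then join them.
decoded = ((bytes[0] & 0x1F) << 6) | (bytes[1] & 0x3F)

# With more than one character, ord still looks only at the first one.
word = "\u00D7\u20AC"

p bytes
p codepoints
p first == codepoints[0]
p decoded == first
p word.ord == word.codepoints.first
```

The hand decoding covers only the two-byte case, which is all this particular character needs.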
From: Xavier Noria on 17 Apr 2010 13:47

Yes of course, a posteriori that's the only thing that makes sense. I
was in a different context and the doc was not clear enough for me.

Perhaps I'll send a patch to define #ord in terms of the code/codepoint
in the string's character encoding, instead of that bare "ordinal".