what is String#ord? [Ruby]

From: Xavier Noria on 17 Apr 2010 11:35

Ruby 1.9 docs for String#ord say:

Return the <code>Integer</code> ordinal of a one-character string.

What does that mean? Check for example

"×".ord # => 215
"×".bytes.to_a # => [195, 151]

-- fxn

From: Xavier Noria on 17 Apr 2010 12:41

On Sat, Apr 17, 2010 at 5:35 PM, Xavier Noria <fxn(a)hashref.com> wrote:

> Ruby 1.9 docs for String#ord say:
>
> Return the <code>Integer</code> ordinal of a one-character string.
>
> What does that mean? Check for example
>
> "×".ord # => 215
> "×".bytes.to_a # => [195, 151]

Trial and error suggests it is the code of the character in the
encoding of the string:

euro = "\u20AC"

euro.ord.to_s(16) # => "20ac"
euro.encode("iso-8859-15").ord.to_s(16) # => "a4"

That is what the source code suggests also:

VALUE
rb_str_ord(VALUE s)
{
    unsigned int c;

    c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
    return UINT2NUM(c);
}
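
To make the encoding dependence concrete, here is a minimal sketch that transcodes the same euro sign into a few encodings and prints what ord returns for each. The helper name ord_in is hypothetical (it is not part of Ruby's String API), and the choice of encodings is only for illustration.

```ruby
# Sketch: the same character can have a different ordinal in each encoding.
# `ord_in` is a hypothetical helper, not part of Ruby's String API.
def ord_in(char, encoding_name)
  char.encode(encoding_name).ord
end

euro = "\u20AC" # EURO SIGN; the \u escape makes this a UTF-8 string

ords = {
  "UTF-8"        => ord_in(euro, "UTF-8"),        # 0x20AC
  "UTF-16LE"     => ord_in(euro, "UTF-16LE"),     # 0x20AC
  "ISO-8859-15"  => ord_in(euro, "ISO-8859-15"),  # 0xA4
  "Windows-1252" => ord_in(euro, "Windows-1252")  # 0x80
}

ords.each { |name, value| puts format("%-12s 0x%X", name, value) }
```

The two Unicode encodings agree because ord decodes the bytes to a code point rather than returning the raw bytes; the single-byte encodings each report the position of the euro sign in their own table.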

From: Benoit Daloze on 17 Apr 2010 12:48

On 17 April 2010 18:41, Xavier Noria <fxn(a)hashref.com> wrote:

> On Sat, Apr 17, 2010 at 5:35 PM, Xavier Noria <fxn(a)hashref.com> wrote:
>
> > Ruby 1.9 docs for String#ord say:
> >
> > Return the <code>Integer</code> ordinal of a one-character string.
> >
> > What does that mean? Check for example
> >
> > "×".ord # => 215
> > "×".bytes.to_a # => [195, 151]
>
> Trial and error suggests it is the code of the character in the
> encoding of the string:
>
> euro = "\u20AC"
>
> euro.ord.to_s(16) # => "20ac"
> euro.encode("iso-8859-15").ord.to_s(16) # => "a4"
>
> That is what the source code suggests also:
>
> VALUE
> rb_str_ord(VALUE s)
> {
> unsigned int c;
>
> c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
> return UINT2NUM(c);
> }
>
>
p "×".ord # => 215
p "×".bytes.to_a # => [195, 151]
p "×".encoding # => #<Encoding:UTF-8>
p "×".codepoints.to_a #=> [215]

In UTF-8 (and in Unicode encodings in general), one byte is not always a
character; in UTF-16 or UTF-32 a single byte never is.
A code point represents a character ;)

So you can think of ord as codepoints.first, and that number of course depends
on the String's encoding.
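
A small sketch of that equivalence, plus two edge cases worth knowing about. The variable names are only for illustration, and the string is written with a \u escape so the example does not depend on the source file's encoding.

```ruby
times = "\u00D7" # MULTIPLICATION SIGN, the character from the question

byte_values   = times.bytes.to_a       # => [195, 151], two bytes in UTF-8
code_points   = times.codepoints.to_a  # => [215], one code point
same_as_first = (times.ord == times.codepoints.first) # => true

# On a longer string, ord only looks at the first character.
first_only = "\u00D7abc".ord # => 215

# On an empty string there is no first character, so ord raises.
empty_error = begin
  "".ord
rescue ArgumentError => e
  e.class
end

p byte_values, code_points, same_as_first, first_only, empty_error
```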

Regards,
B.D.

From: Xavier Noria on 17 Apr 2010 13:47

Yes of course, a posteriori that's the only thing that makes sense. I
was in a different context and the doc was not clear enough for me.

Perhaps I'll send a patch to define #ord in terms of the code/codepoint
in the string's character encoding, instead of that bare "ordinal".
