From: Josh Cheek on 19 Jun 2010 03:59

[Note: parts of this message were removed to make it a legal post.]

On Sat, Jun 19, 2010 at 2:04 AM, Michael Fellinger <m.fellinger(a)gmail.com> wrote:

> On Sat, Jun 19, 2010 at 6:21 AM, Josh Cheek <josh.cheek(a)gmail.com> wrote:
> >
> > Thanks, but it doesn't seem to work on 1.8
> >
> > RUBY_VERSION # => "1.8.7"
> >
> > %w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.codepoints.to_a } # =>
> > # ~> -:3: undefined method `codepoints' for "ABC":String (NoMethodError)
> > # ~> from -:3:in `each'
> > # ~> from -:3
> >
> > And the 1.8 ways to get it don't work on 1.9 (ie "a"[0])
>
> >> %w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
> {"ABC"=>[65, 66, 67]}
> {"Xeo"=>[88, 101, 111]}
> {"abc"=>[97, 98, 99]}
> {"ball"=>[98, 97, 108, 108]}
> {"xeo"=>[120, 101, 111]}
> => ["ABC", "Xeo", "abc", "ball", "xeo"]
>
> There is always a way to make things work on both, it's just that I
> don't care much about 1.8 anymore.
>
> --
> Michael Fellinger
> CTO, The Rubyists, LLC

Well, a lot of systems still ship with it, SnowLeopard, for example ships with 1.8.7, so I think that while this is a legitimate personal decision, it is good to be aware of one's audience. For example, since Abder-rahman is having difficulty understanding String comparison, then it is probably fair to assume he isn't initiated enough to understand why the example that is supposed to help him understand ends up breaking (if he is on 1.8). That could be very discouraging for someone new, come to the ML to get a better understanding, and the answers, given by the people who know what they are doing won't even run.

Anyway, I really do like your solution ^_^ It is elegant and uniform, thank you for providing it.
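
For readers following along, here is a minimal sketch of why the quoted output comes out in that order. It assumes plain ASCII words. The variable names are illustrative only, not from the thread. Sorting the words directly is compared with sorting them by their `unpack('C*')` byte arrays, the approach quoted above.

```ruby
# The same words used in the thread.
words = %w[Xeo xeo ball ABC abc]

# Plain String sort.
by_string = words.sort

# Sort by the array of byte values; Array#<=> compares element by element.
by_bytes = words.sort_by { |word| word.unpack('C*') }

by_string.each do |word|
  puts "#{word.ljust(5)} #{word.unpack('C*').inspect}"
end

# For these ASCII words the two orderings agree.
puts by_string == by_bytes
```

Uppercase letters (65 to 90) have smaller byte values than lowercase ones (97 to 122), which is why "Xeo" lands before "abc".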
From: Brian Candler on 21 Jun 2010 06:10

Josh Cheek wrote:
> Well, this used to be easy to show, but apparently since ascii has been
> abandoned, and I don't know unicode, I have to resort to hacky things
> like this to explain it.
>
> $chars = (1..128).inject(Hash.new) { |chars,num| chars[num.chr] = num ; chars }
>
> def to_number_array(str)
>   str.split(//).map { |char| $chars[char] }
> end
>
> to_number_array 'Xeo' # => [88, 101, 111]
> to_number_array 'xeo' # => [120, 101, 111]
> to_number_array 'ball' # => [98, 97, 108, 108]
> to_number_array 'ABC' # => [65, 66, 67]
> to_number_array 'abc' # => [97, 98, 99]

Except that this is irrelevant, because even ruby 1.9 does not compare strings by codepoints. It compares them byte-by-byte using memcmp. See rb_str_cmp_m() and rb_str_cmp() in string.c

It's a designed-in side-effect of UTF-8 encoding that higher codepoints sort after lower ones. There is a table at http://en.wikipedia.org/wiki/UTF-8 under "Description" which illustrates this.

However this does not work for other encodings. Try this for size:

>> s1 = 97.chr("UTF-8")
=> "a"
>> s2 = 257.chr("UTF-8")
=> "ā"
>> s1 < s2
=> true
>> s1 = 97.chr("UTF-16LE")
=> "a\x00"
>> s2 = 257.chr("UTF-16LE")
=> "\x01\x01"
>> s1 < s2
=> false

Yes: that's the same two unicode codepoints, but sorting in different order. For encodings like UTF-16LE, where the least-significant byte comes before the most-significant byte, you get an almost arbitrary ordering.

Proviso: I tested this with
ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]

ruby 1.9.x string encoding rules are (a) undocumented, and (b) subject to arbitrary changes between patchlevels, hence YMMV.

--
Posted via http://www.ruby-forum.com/.
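
The point above can be checked with a short sketch on Ruby 1.9 or later. It uses `String#encode` rather than `Integer#chr` to build the strings. The `results` hash and the variable names are assumptions made for illustration. For each encoding it records the string comparison, the comparison of the raw byte arrays, and the comparison of the codepoint arrays.

```ruby
low  = "a"       # U+0061
high = "\u0101"  # U+0101, "a with macron"

results = {}

%w[UTF-8 UTF-16LE].each do |enc|
  s1 = low.encode(enc)
  s2 = high.encode(enc)

  string_order    = s1 <=> s2                              # what sort uses
  byte_order      = s1.bytes.to_a <=> s2.bytes.to_a        # raw byte values
  codepoint_order = s1.codepoints.to_a <=> s2.codepoints.to_a

  results[enc] = [string_order, byte_order, codepoint_order]
  puts "#{enc}: string=#{string_order} bytes=#{byte_order} codepoints=#{codepoint_order}"
end
```

In both encodings the string ordering matches the byte ordering. Only in UTF-8 does it also match the codepoint ordering. This is consistent with the byte-wise comparison described above, though it is an observation on two strings rather than a proof.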
From: Brian Candler on 21 Jun 2010 06:27

Michael Fellinger wrote:
> >> %w[Xeo xeo ball ABC abc].sort.each{|word| p word => word.unpack('C*') }
> {"ABC"=>[65, 66, 67]}
> {"Xeo"=>[88, 101, 111]}
> {"abc"=>[97, 98, 99]}
> {"ball"=>[98, 97, 108, 108]}
> {"xeo"=>[120, 101, 111]}
> => ["ABC", "Xeo", "abc", "ball", "xeo"]
>
> There is always a way to make things work on both, it's just that I
> don't care much about 1.8 anymore.

That does work the same on both, but it doesn't give codepoints.

$ irb --simple-prompt
>> "groß".unpack("C*")
=> [103, 114, 111, 195, 159]
>> RUBY_VERSION
=> "1.8.6"

$ irb19 --simple-prompt
>> "groß".unpack('C*')
=> [103, 114, 111, 195, 159]
>> "groß".codepoints.to_a
=> [103, 114, 111, 223]
>> RUBY_DESCRIPTION
=> "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"

--
Posted via http://www.ruby-forum.com/.
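
A small sketch of the difference shown in the irb sessions above, assuming a UTF-8 string on Ruby 1.9 or later. The `'U*'` directive is not mentioned in the thread. It is added here only as one possible way to get codepoints from UTF-8 data through `unpack`. The variable names are illustrative.

```ruby
word = "gro\u00DF"  # "groß", written with an escape so the string is UTF-8

bytes        = word.unpack('C*')     # one number per byte
via_unpack_u = word.unpack('U*')     # decodes UTF-8 into codepoints
via_method   = word.codepoints.to_a  # String#codepoints (1.9 and later)

puts "bytes:      #{bytes.inspect}"
puts "unpack U*:  #{via_unpack_u.inspect}"
puts "codepoints: #{via_method.inspect}"
```

The "ß" takes two bytes in UTF-8 (195, 159) but is a single codepoint (223), so the byte array is one element longer than the codepoint array.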