From: Oliver Peng on 3 Aug 2010 11:11

I found several issues with string encoding. Here is the problem:

[root@mars mysql]# irb -E ascii
# I start irb with the default external encoding set to ASCII.

irb(main):014:0> String.new.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):015:0> "".encoding
=> #<Encoding:US-ASCII>
# I get different encodings when I initialize an empty string. Why?

irb(main):023:0> "\x80".encoding
=> #<Encoding:ASCII-8BIT>
irb(main):024:0> "\x7F".encoding
=> #<Encoding:US-ASCII>
# It looks like a literal containing a byte greater than 0x7F gets the
# ASCII-8BIT encoding. That is OK.

irb(main):005:0> new_str = "\xF1\xF2"
=> "\xF1\xF2"
irb(main):006:0> new_str.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):007:0> s ="%c%c%c%c%c%s" % [49, 5, 245, 225, 1, new_str]
Encoding::CompatibilityError: incompatible character encodings: US-ASCII and ASCII-8BIT
        from (irb):7:in `%'
        from (irb):7
        from /bin/irb:12:in `<main>'
# When I pass an ASCII-8BIT string as an argument to the format, it raises
# an exception. Why?

irb(main):008:0> s ="%c%c%c%c%c%s" % [49, 5, 45, 25, 1, new_str]
=> "1\x05-\x19\x01\xF1\xF2"
# I am very surprised that it works if none of the %c values is greater
# than 0x7F.

irb(main):012:0> s ="%c%c%c%c%c" % [49, 5, 245, 225, 1]
=> "1\x05\xF5\xE1\x01"
irb(main):013:0> s.encoding
=> #<Encoding:US-ASCII>
# If I leave the ASCII-8BIT string out of the format, it also works. But I
# am very surprised that the encoding is US-ASCII even though the string
# contains non-ASCII bytes. Why?

--
Posted via http://www.ruby-forum.com/.
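The exception and the "surprising" success above seem to follow the same compatibility rule that Ruby applies when concatenating strings: a string whose bytes are all 7-bit ASCII can be combined with an ASCII-8BIT string, while a US-ASCII-tagged string that already holds bytes above 0x7F cannot. The following is a minimal sketch of that rule, not a trace of what `%` does internally. The helper names `tagged_bytes` and `concat_encoding` are hypothetical and exist only for this illustration; the encodings are set explicitly so the result does not depend on how irb was started.

```ruby
# Hypothetical helper (not from the thread): build a string from raw byte
# values and tag it with the given encoding.
def tagged_bytes(encoding, *bytes)
  bytes.pack("C*").force_encoding(encoding)
end

# Hypothetical helper: return the encoding of a + b, or nil if Ruby refuses
# to combine the two strings.
def concat_encoding(a, b)
  (a + b).encoding
rescue Encoding::CompatibilityError
  nil
end

ascii_only = tagged_bytes("US-ASCII", 49, 5, 45, 25, 1)    # all bytes <= 0x7F
high_bytes = tagged_bytes("US-ASCII", 49, 5, 245, 225, 1)  # contains 0xF5 and 0xE1
new_str    = tagged_bytes("ASCII-8BIT", 0xF1, 0xF2)

p ascii_only.ascii_only?                     # true
p high_bytes.ascii_only?                     # false
p Encoding.compatible?(ascii_only, new_str)  # the ASCII-8BIT encoding object
p Encoding.compatible?(high_bytes, new_str)  # nil
p concat_encoding(high_bytes, new_str)       # nil, the concatenation raised
```

`Encoding.compatible?` returns the encoding the combined string would get, or `nil` when no such encoding exists, so it can be used to check two strings before combining them.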
From: Oliver Peng on 3 Aug 2010 11:41

I figured out the first question.

[root@mars mysql]# irb
irb(main):001:0> s = String.new
=> ""
irb(main):002:0> s.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):003:0> puts Encoding.default_external.name
UTF-8

Ruby always uses ASCII-8BIT as the encoding when you call String.new
without arguments, regardless of the default external encoding.
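As a small sketch of that observation: `String.new` with no argument yields an ASCII-8BIT string, whereas `String.new(str)` copies the encoding of its argument. The method names `fresh_encoding` and `copied_encoding` are hypothetical, introduced only for this example.

```ruby
# Hypothetical helper: encoding of a string created by String.new with no
# arguments.
def fresh_encoding
  String.new.encoding
end

# Hypothetical helper: encoding of a string created by copying another one.
def copied_encoding(str)
  String.new(str).encoding
end

utf8  = "gro\u00DF"                                  # \u escape forces UTF-8
ascii = "abc".dup.force_encoding("US-ASCII")

p fresh_encoding == Encoding::ASCII_8BIT             # true
p copied_encoding(utf8) == Encoding::UTF_8           # true
p copied_encoding(ascii) == Encoding::US_ASCII       # true
```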
From: Brian Candler on 3 Aug 2010 12:02

Oliver Peng wrote:
> Ruby will always use ASCII-8BIT as encoding when you use String.new to
> create a new String object.

Ugh. That's another special case to add to
http://github.com/candlerb/string19/blob/master/string19.rb

However in practice it doesn't matter much, because the empty string is
compatible.

irb(main):001:0> s1 = String.new
=> ""
irb(main):002:0> s2 = "groß"
=> "groß"
irb(main):003:0> s1.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):004:0> s2.encoding
=> #<Encoding:UTF-8>
irb(main):005:0> s1 + s2
=> "groß"

And as for this which you found:

irb(main):003:0> s = "%c%c%c%c%c".force_encoding("US-ASCII")
=> "%c%c%c%c%c"
irb(main):004:0> t = s % [49, 5, 245, 225, 1]
=> "1\x05\xF5\xE1\x01"
irb(main):005:0> t.encoding
=> #<Encoding:US-ASCII>

I think it's just one of the many bugs in ruby 1.9.x, likely due to a
total lack of specification of the new behaviour for all methods which
accept or return strings (although if there's no specification, I
suppose you can't really argue it's a bug; it can behave however it
likes)
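The two points above can be checked with a short sketch. It confirms that an empty `String.new` takes on the encoding of a UTF-8 string it is added to, and that a US-ASCII-tagged string holding the bytes from the `%c` example fails `valid_encoding?`. It also shows one possible alternative when the goal is simply a raw byte string: `Array#pack("C*")`, which returns an ASCII-8BIT string directly. That alternative is not something proposed in this thread, and the helper names `empty_plus` and `pack_bytes` are hypothetical.

```ruby
# Hypothetical helper: concatenate a fresh empty String.new with another
# string.
def empty_plus(other)
  String.new + other
end

# Hypothetical helper: build a raw byte string from integer byte values.
# Array#pack with "C*" treats each value as an unsigned 8-bit byte.
def pack_bytes(*bytes)
  bytes.pack("C*")
end

combined = empty_plus("gro\u00DF")
p combined.encoding == Encoding::UTF_8     # true

# The same bytes the %c example produced, tagged as US-ASCII.
tagged = pack_bytes(49, 5, 245, 225, 1).force_encoding("US-ASCII")
p tagged.valid_encoding?                   # false

# Building the whole payload as binary avoids mixing encodings.
new_str = pack_bytes(0xF1, 0xF2)
raw     = pack_bytes(49, 5, 245, 225, 1) + new_str
p raw.encoding == Encoding::ASCII_8BIT     # true
p raw.bytes.to_a                           # [49, 5, 245, 225, 1, 241, 242]
```

Whether `pack` fits depends on the use case; it works on byte values only and does no character formatting of the kind `%` provides.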