From: Pen Ttt on 17 Apr 2010 01:22

On my computer (ubuntu9.1+ruby1.9):

pt@pt-laptop:~$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "[\"��˵\"]"

On my friend's computer (ubuntu9.1+ruby1.9):

$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "\316\322\313\265"
irb(main):003:0> puts Iconv.iconv('UTF-8', 'GBK', str).to_s
我说
=> nil

What's wrong with my system?

-- 
Posted via http://www.ruby-forum.com/.
From: Brian Candler on 18 Apr 2010 05:22

Pen Ttt wrote:
> in my computer(ubuntu9.1+ruby1.9):
> pt@pt-laptop:~$ irb
> irb(main):001:0> require 'iconv'
> => true
> irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
> => "[\"��˵\"]"
>
> in my friend's(ubuntu9.1+ruby1.9):
> $ irb
> irb(main):001:0> require 'iconv'
> => true
> irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
> => "\316\322\313\265"
> irb(main):003:0> puts Iconv.iconv('UTF-8', 'GBK', str).to_s
> 我说
> => nil
>
> what's wrong in my system?

One of the joys of ruby 1.9 is that the same program run on two
different machines can behave differently. That's even if the two
machines have identical versions of ruby and OS *and* you are feeding in
the same input data.

My advice is to stick with ruby 1.8.x, where the behaviour is both sane
and predictable.

However there are other people who will vociferously tell you that I am
doing the entire ruby community a disservice by recommending this to
you. It's up to you whose advice to follow.

If you want to persevere with ruby 1.9, I suggest the following:

* Check you have exactly identical versions of 1.9 (check the
  RUBY_DESCRIPTION constant) on both machines. The behaviour is subtle,
  and a lot of it has changed.

* Look at str.bytes.to_a to see if the byte sequence is correct or not.
  That is, the fact that irb displays the string wrongly or rightly
  doesn't mean anything; don't trust what you see.

* Instead of using irb, write a .rb script, and run it from the command
  line directly.

* Check the environments are the same on both. You could try
  experimenting with setting LANG and/or LC_ALL environment variables
  before starting ruby.

* I tried to understand how this all works, and I documented what I
  found at http://github.com/candlerb/string19/blob/master/string19.rb
  There are about 200 cases of encoding behaviour described there.
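The checks in the list above can be gathered into one small script to run on both machines and compare. This is only a sketch: the helper name encoding_report and the sample string are made up for illustration and do not come from the thread. The sample is written with \u escapes so that it does not depend on how the source file itself is encoded.

```ruby
# Hypothetical helper (not from the thread): collect the facts worth
# comparing between the two machines.
def encoding_report(str)
  {
    ruby: RUBY_DESCRIPTION,                            # exact interpreter build
    lang: ENV["LANG"],                                 # locale settings that can
    lc_all: ENV["LC_ALL"],                             # influence the defaults
    default_external: Encoding.default_external.name,
    string_encoding: str.encoding.name,
    bytes: str.bytes.to_a                              # the bytes, not the display
  }
end

# "\u6211\u8BF4" is the two-character string from the original post,
# written with escapes so the literal is always UTF-8.
sample = "\u6211\u8BF4"

encoding_report(sample).each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```

Saving this as a .rb file and running it from the command line on each machine, rather than pasting it into irb, keeps the terminal's rendering out of the comparison.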

Also, it's possible to do what you're trying to do in ruby 1.9 without
using Iconv, but instead tagging str with its correct encoding, and then
using encode! to convert it to another.

Whether it appears correctly on the terminal or not, especially within
irb, is still not something to trust. Again, use str.bytes.to_a to see
if it is the expected sequence of bytes in the new encoding.

Good luck,

Brian.
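
A minimal sketch of the Iconv-free approach described in the post above, assuming the starting point is the GBK byte sequence "\316\322\313\265" shown in the first post. The variable names are illustrative only. force_encoding only changes the tag on the string, while encode! actually rewrites the bytes, so the byte arrays are printed rather than the strings themselves.

```ruby
# The four bytes from the "\316\322\313\265" result in the first post,
# written as octal integers to match the thread.
gbk_bytes = [0316, 0322, 0313, 0265]

str = gbk_bytes.pack("C*")   # binary string, no meaningful encoding tag yet
str.force_encoding("GBK")    # tag only: the bytes are left untouched
str.encode!("UTF-8")         # transcode in place: the bytes change

p str.encoding               # encoding tag after conversion
p str.bytes.to_a             # inspect the bytes, not the rendered text

# Round trip back to GBK to confirm nothing was lost.
back = str.encode("GBK")
p back.bytes.to_a == gbk_bytes
```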
From: Benoit Daloze on 18 Apr 2010 10:18

[Note: parts of this message were removed to make it a legal post.]

Hi,

On 18 April 2010 11:22, Brian Candler <b.candler(a)pobox.com> wrote:

> One of the joys of ruby 1.9 is that the same program run on two
> different machines can behave differently. That's even if the two
> machines have identical versions of ruby and OS *and* you are feeding in
> the same input data.

Please don't be so pessimistic without a real reason :)
(That said, show some code that gives different results under the
conditions you described.)
Maybe what you're describing is caused by different revisions, but that
also happened in 1.8, no?

> * Look at str.bytes.to_a to see if the byte sequence is correct or not.
> That is, the fact that irb displays the string wrongly or rightly
> doesn't mean anything; don't trust what you see.

Yes, that's true: encoding in irb still often gives bad results.

B.D.
From: James Edward Gray II on 18 Apr 2010 10:31

On Apr 18, 2010, at 4:22 AM, Brian Candler wrote:

> One of the joys of ruby 1.9 is that the same program run on two
> different machines can behave differently. That's even if the two
> machines have identical versions of ruby and OS *and* you are feeding in
> the same input data.

I'm pretty sure that's true with Ruby 1.8 as well. For example, don't
the encodings available to iconv vary depending on the platform?

James Edward Gray II
From: Brian Candler on 18 Apr 2010 13:06

Benoit Daloze wrote:
> Please don't be so pessimist without real reason :)
> (that said, show some code that has different result in the conditions
> you said).

Sure. Here's a simple one:

    File.open("myfile.txt") do |f|
      line = f.gets
      line =~ /./
    end

You can run this script on two machines, with the same version of OS and
ruby and the same myfile.txt but with different environment variable
settings, and get it to crash on one but not the other.

(One way: if the default external encoding on one machine is US-ASCII
and myfile.txt contains any byte with the top bit set)

> Maybe what you're describing is caused by different revisions, but that
> happened also in 1.8, no?

This is intentional behaviour in ruby 1.9.
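
The scenario described above can be reproduced on a single machine by passing the external encoding explicitly instead of letting the environment choose it. This is a sketch under assumptions: the helper names first_line and match_outcome, the temporary file name, and the sample file contents are all invented for illustration, and the "r:US-ASCII" mode string stands in for what a different LANG or LC_ALL setting might select as the default.

```ruby
require "tmpdir"

# Hypothetical helper: read the first line with an explicit external
# encoding, standing in for the environment-derived default.
def first_line(path, external)
  File.open(path, "r:#{external}") { |f| f.gets }
end

# Hypothetical helper: run the same regexp match as in the post and
# report what happened instead of letting the script die.
def match_outcome(line)
  line =~ /./
  "matched"
rescue StandardError => e
  "#{e.class}: #{e.message}"
end

# A file whose first line contains bytes with the top bit set
# (0xC3 0xA9 is the UTF-8 form of U+00E9).
path = File.join(Dir.tmpdir, "myfile_#{Process.pid}.txt")
File.binwrite(path, "caf\xC3\xA9\n")

validity = {}
begin
  ["US-ASCII", "UTF-8"].each do |external|
    line = first_line(path, external)
    validity[external] = line.valid_encoding?
    puts "#{external}: valid_encoding?=#{validity[external]}, #{match_outcome(line)}"
  end
ensure
  File.delete(path) if File.exist?(path)
end
```

Checking valid_encoding? before matching, or opening the file with an explicit encoding as done here, is one way to make the behaviour independent of the environment.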