iconv transfer code [Ruby]


From: Pen Ttt on 17 Apr 2010 01:22

On my computer (ubuntu9.1 + ruby1.9):
pt@pt-laptop:~$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "[\"��˵\"]"

On my friend's computer (ubuntu9.1 + ruby1.9):
$ irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
=> "\316\322\313\265"
irb(main):003:0> puts Iconv.iconv('UTF-8', 'GBK', str).to_s
我说
=> nil

What's wrong with my system?
--
Posted via http://www.ruby-forum.com/.

From: Brian Candler on 18 Apr 2010 05:22

Pen Ttt wrote:
> On my computer (ubuntu9.1 + ruby1.9):
> pt@pt-laptop:~$ irb
> irb(main):001:0> require 'iconv'
> => true
> irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
> => "[\"��˵\"]"
>
> On my friend's computer (ubuntu9.1 + ruby1.9):
> $ irb
> irb(main):001:0> require 'iconv'
> => true
> irb(main):002:0> str = Iconv.iconv('GBK', 'UTF-8', '我说').to_s
> => "\316\322\313\265"
> irb(main):003:0> puts Iconv.iconv('UTF-8', 'GBK', str).to_s
> 我说
> => nil
>
> What's wrong with my system?

One of the joys of ruby 1.9 is that the same program run on two
different machines can behave differently. That's even if the two
machines have identical versions of ruby and OS *and* you are feeding in
the same input data.

My advice is to stick with ruby 1.8.x, where the behaviour is both sane
and predictable. However there are other people who will vociferously
tell you that I am doing the entire ruby community a disservice by
recommending this to you. It's up to you whose advice to follow.

If you want to persevere with ruby 1.9, I suggest the following:

* Check you have exactly identical versions of 1.9 (check the
RUBY_DESCRIPTION constant) on both machines. The behaviour is subtle,
and a lot of it has changed.

* Look at str.bytes.to_a to see if the byte sequence is correct or not.
That is, the fact that irb displays the string wrongly or rightly
doesn't mean anything; don't trust what you see.

* Instead of using irb, write a .rb script, and run it from the command
line directly.

* Check the environments are the same on both. You could try
experimenting with setting LANG and/or LC_ALL environment variables
before starting ruby.

* I tried to understand how this all works, and I documented what I
found at http://github.com/candlerb/string19/blob/master/string19.rb

There are about 200 cases of encoding behaviour described there.
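
The checks in the list above can be gathered into one small script and run from the command line on both machines for comparison. This is only a sketch: the method name `encoding_report` is made up for illustration, and the sample string is written with \u escapes so that it is UTF-8 regardless of the source file's own encoding.

```ruby
# Hypothetical helper (not from the thread): collects the settings that
# influence how Ruby 1.9+ tags and displays strings, plus the raw bytes.
def encoding_report(str)
  {
    :ruby     => RUBY_DESCRIPTION,
    :external => Encoding.default_external.name,
    :internal => (Encoding.default_internal && Encoding.default_internal.name),
    :lang     => ENV['LANG'],
    :lc_all   => ENV['LC_ALL'],
    :encoding => str.encoding.name,
    :bytes    => str.bytes.to_a
  }
end

sample = "\u6211\u8BF4"  # the two characters from the original post, as UTF-8
encoding_report(sample).each { |key, value| puts "#{key}: #{value.inspect}" }
```

If the bytes line is identical on both machines while the other lines differ, that points at the environment rather than at the data.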

Also, it's possible to do what you're trying to do in ruby 1.9 without
using Iconv, but instead tagging str with its correct encoding, and then
using encode! to convert it to another. Whether it appears correctly on
the terminal or not, especially within irb, is still not something to
trust. Again, use str.bytes.to_a to see if it is the expected sequence
of bytes in the new encoding.
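
A minimal sketch of the Iconv-free approach described above, assuming Ruby 1.9 or later with the standard GBK transcoder available. The method names are illustrative only, and the result is checked through the byte sequence rather than through what the terminal shows.

```ruby
# Convert UTF-8 text to GBK without Iconv.
def utf8_to_gbk(str)
  str.encode("GBK")
end

# Raw bytes carry no useful encoding tag, so tag them as GBK first
# (force_encoding changes only the tag), then transcode with encode!.
def gbk_bytes_to_utf8(bytes)
  raw = bytes.pack("C*")       # binary (ASCII-8BIT) string
  raw.force_encoding("GBK")    # bytes unchanged, tag now GBK
  raw.encode!("UTF-8")         # bytes converted, tag now UTF-8
end

utf8 = "\u6211\u8BF4"          # the two characters from the original post
gbk  = utf8_to_gbk(utf8)
p gbk.bytes.to_a               # [206, 210, 203, 181], i.e. "\316\322\313\265"
p gbk_bytes_to_utf8(gbk.bytes.to_a) == utf8
```

Separately, Iconv.iconv returns an Array, and Array#to_s in 1.9 gives the same result as inspect, which may explain the ["..."] wrapper in the first irb session.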

Good luck,

Brian.

From: Benoit Daloze on 18 Apr 2010 10:18


Hi,
On 18 April 2010 11:22, Brian Candler <b.candler(a)pobox.com> wrote:

> One of the joys of ruby 1.9 is that the same program run on two
> different machines can behave differently. That's even if the two
> machines have identical versions of ruby and OS *and* you are feeding in
> the same input data.
>

Please don't be so pessimistic without a real reason :)
(That said, please show some code that gives different results under the
conditions you described.)

Maybe what you're describing is caused by different revisions, but that
also happened in 1.8, no?

> * Look at str.bytes.to_a to see if the byte sequence is correct or not.
> That is, the fact that irb displays the string wrongly or rightly
> doesn't mean anything; don't trust what you see.

Yes, that's true; encoding in irb still often gives bad results.

B.D.

From: James Edward Gray II on 18 Apr 2010 10:31

On Apr 18, 2010, at 4:22 AM, Brian Candler wrote:

> One of the joys of ruby 1.9 is that the same program run on two
> different machines can behave differently. That's even if the two
> machines have identical versions of ruby and OS *and* you are feeding in
> the same input data.

I'm pretty sure that's true with Ruby 1.8 as well. For example, don't the encodings available to iconv vary depending on the platform?

James Edward Gray II

From: Brian Candler on 18 Apr 2010 13:06

Benoit Daloze wrote:
> Please don't be so pessimistic without a real reason :)
> (That said, please show some code that gives different results under the
> conditions you described.)

Sure. Here's a simple one:

    File.open("myfile.txt") do |f|
      line = f.gets
      line =~ /./
    end

You can run this script on two machines, with the same version of OS and
ruby and the same myfile.txt but with different environment variable
settings, and get it to crash on one but not the other. (One way: if the
default external encoding on one machine is US-ASCII and myfile.txt
contains any byte with the top bit set)
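
The effect can be reproduced on a single machine by passing the external encoding explicitly instead of relying on the environment. The following is a sketch under that assumption: the helper name `match_with_external` is hypothetical, and a temporary file stands in for myfile.txt. On the Ruby versions I am aware of, the US-ASCII call reports ArgumentError (invalid byte sequence) while the ISO-8859-1 call succeeds.

```ruby
require 'tempfile'

# Hypothetical helper: writes one line containing a byte with the top bit
# set, reads it back with the given external encoding, and runs the same
# match as the script above. Returns :ok, or the exception class raised.
def match_with_external(encoding_name)
  Tempfile.create("myfile") do |tmp|
    tmp.binmode
    tmp.write([99, 97, 102, 233, 10].pack("C*"))  # "caf", byte 0xE9, newline
    tmp.flush
    File.open(tmp.path, "r:#{encoding_name}") do |f|
      line = f.gets
      begin
        line =~ /./
        :ok
      rescue ArgumentError => e
        e.class
      end
    end
  end
end

p match_with_external("US-ASCII")
p match_with_external("ISO-8859-1")
```

Same file, same Ruby, same code: only the encoding the line was tagged with on input differs.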

> Maybe what you're describing is caused by different revisions, but that
> also happened in 1.8, no?

This is intentional behaviour in ruby 1.9.
