From: Perry Smith on 23 Feb 2010 12:05

Title pretty much says it all. Here is a small sample program:

    #!/usr/bin/env ruby
    # -*- coding: utf-8 -*-

    s = "string"
    puts s.encoding
    r = Regexp.new(s)
    puts r.encoding

Here is the output:

    UTF-8
    US-ASCII

I was expecting both to be set to UTF-8. There is no force_encoding method for Regexp.

If I later try to use it on strings of type UTF-8, it can throw an exception.

How is this supposed to be handled?

Thanks,
Perry
--
Posted via http://www.ruby-forum.com/.

From: Roger Pack on 24 Feb 2010 12:56

> I was expecting both to be set to UTF-8. There is no force_encoding
> method for Regexp.
>
> If I later try to use it on strings of type UTF-8, it can throw an
> exception.

Do you have an example of this? It might be a bug.

I did notice that

    Regexp.new("Café").encoding

keeps it in UTF-8, so maybe it's optimizing: when the pattern doesn't "have to be" UTF-8, it is left as ASCII?

-r

From: Perry Smith on 24 Feb 2010 13:24

Roger Pack wrote:
>
>> I was expecting both to be set to UTF-8. There is no force_encoding
>> method for Regexp.
>>
>> If I later try to use it on strings of type UTF-8, it can throw an
>> exception.
>
> Do you have an example of this? It might be a bug.
>
> I did notice that
>
> Regexp.new("Café").encoding
>
> keeps it in UTF-8
>
> so maybe it's optimizing it and when it doesn't "have to be" UTF-8 it is
> leaving it as ASCII?

I'm not clear what you mean by an example other than what I put in the original note.

I think I'm going to open a bug report -- it might not be a bug, but I sure am confused. The "Pickaxe" book describes a third argument, but I can't get that to work either. "ri" for Ruby 1.9.1 does not describe the third argument at all, although it does seem to exist.

It appears that, as you pointed out, if the input string happens to be ASCII, then the regexp encoding is US-ASCII and there doesn't seem to be anything you can do about it.

I'm testing on 1.9.1 p243.

But, due to another discussion thread, I think I want to be in 8-bit binary anyway in my case. I'm not 100% positive my input is UTF-8. It's supposed to be, but I can't really trust it.

Thanks,
Perry

From: Robert Gleeson on 24 Feb 2010 13:36

Typo fix:

> Regexp.new(/foo/u).encoding # => UTF-8
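
For reference, the cases mentioned so far in the thread can be put side by side. This is only a sketch: `encoding_info` is a hypothetical helper, not something from the posts, and the last line assumes that `Regexp::FIXEDENCODING`, passed as the options argument, pins the regexp to the source string's encoding. That is how the constant is documented for current Ruby versions; it has not been checked against 1.9.1 p243. The strings are built with `encode` and `\u` escapes so the result does not depend on the magic comment.

```ruby
# Hypothetical helper (not from the thread): report a Regexp's encoding
# name and whether that encoding is fixed.
def encoding_info(re)
  [re.encoding.name, re.fixed_encoding?]
end

s = "string".encode("UTF-8")   # ASCII-only content in a UTF-8 string

p encoding_info(Regexp.new(s))                         # ASCII-only source
p encoding_info(Regexp.new("Caf\u00E9"))               # source contains a non-ASCII character
p encoding_info(Regexp.new(/foo/u))                    # /u flag, as in the post above
p encoding_info(Regexp.new(s, Regexp::FIXEDENCODING))  # assumption: fixes the regexp to s.encoding
```

Only the first case should report US-ASCII with a non-fixed encoding; the other three should come back as UTF-8 and fixed.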

From: Roger Pack on 24 Feb 2010 13:58

>>> If I later try to use it on strings of type UTF-8, it can throw an
>>> exception.

> I'm not clear what you mean by an example other than what I put in the
> original note.

Do you have a small example (like your original) where you "use it on strings later of type UTF-8" and it throws an exception?

-r
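
The thread does not show the failing case, so the following is not necessarily what Perry ran into. It is a minimal sketch of one situation where a match is known to raise: a regexp whose encoding is fixed to UTF-8, matched against a binary (ASCII-8BIT) string that contains non-ASCII bytes. `match_or_error` is a hypothetical helper introduced only for this illustration.

```ruby
# Hypothetical helper (not from the thread): return the match position,
# or the name of the exception class if the encodings are incompatible.
def match_or_error(re, str)
  re =~ str
rescue Encoding::CompatibilityError => e
  e.class.name
end

utf8_text = "Caf\u00E9"                                      # UTF-8 string with a non-ASCII character
binary    = "Caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")   # same bytes, tagged as binary

ascii_re = Regexp.new("Caf")         # ASCII-only source: US-ASCII, encoding not fixed
utf8_re  = Regexp.new("Caf\u00E9")   # non-ASCII source: encoding fixed to UTF-8

p match_or_error(ascii_re, utf8_text)  # non-fixed regexp adapts to the string's encoding
p match_or_error(ascii_re, binary)     # also adapts to binary input
p match_or_error(utf8_re, utf8_text)   # same encoding on both sides
p match_or_error(utf8_re, binary)      # fixed UTF-8 regexp vs. binary string with high bytes
```

Under these assumptions only the last call reports `Encoding::CompatibilityError`. The US-ASCII regexp from the original post matches both the UTF-8 and the binary string, which suggests the US-ASCII result is not by itself the source of an exception.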