From: David Springer on 24 Feb 2010 14:35

Perry Smith wrote:
> r = Regexp.new(s)

Try this:

r = Regexp.new(s,16)

-David
--
Posted via http://www.ruby-forum.com/.
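For readers wondering where the literal 16 comes from: it is the value of the Regexp::FIXEDENCODING constant mentioned later in the thread. A minimal sketch, assuming a reasonably recent Ruby; the variable names by_number and by_name are only illustrative:

```ruby
# An ASCII-only pattern string that is explicitly tagged as UTF-8.
s = "string".encode("UTF-8")

# The same option passed as a bare integer and as the named constant.
by_number = Regexp.new(s, 16)
by_name   = Regexp.new(s, Regexp::FIXEDENCODING)

puts Regexp::FIXEDENCODING       # 16
puts by_number.fixed_encoding?   # true
puts by_name.encoding            # UTF-8
```

Using the named constant is probably the more readable choice, since a bare 16 gives no hint of what it does.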
From: Perry Smith on 24 Feb 2010 16:02

Roger Pack wrote:
>>>> If I later try to use it on strings of type UTF-8, it can throw an
>>>> exception.
>> I'm not clear what you mean by an example other than what I put in the
>> original note.
>
>
> Do you have a small example (like your original) that throws an
> exception where you "use it on strings later of type UTF-8" and it
> throws an exception?

No, I don't. I *think* that I might have had a string that was not
UTF-8. I was fetching strings from a file and just doing a
force_encoding because they were supposed to be UTF-8, but maybe they
were not. I'm not sure.

Let me see if I can make an example. My trivial examples so far don't
throw an exception.
--
Posted via http://www.ruby-forum.com/.
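The situation Perry describes (bytes read from a file and relabelled with force_encoding) can be checked with String#valid_encoding?. This is only a sketch: the sample bytes and the assumption that the data was really ISO-8859-1 are made up for illustration, not taken from Perry's file.

```ruby
# Bytes that are valid ISO-8859-1 ("café") but are not valid UTF-8.
raw = "caf\xE9".b

# force_encoding only relabels the bytes; it does not validate or convert them.
labelled = raw.dup.force_encoding("UTF-8")
puts labelled.valid_encoding?    # false

# If the real source encoding is known (assumed to be ISO-8859-1 here),
# relabel with that encoding and then transcode to UTF-8.
converted = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts converted.valid_encoding?   # true
puts converted =~ /caf/          # 0
```

Checking valid_encoding? right after reading the file is one way to find out whether the data really is what it is supposed to be before any regexp touches it.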
From: Brian Candler on 24 Feb 2010 16:30

Perry Smith wrote:
> I think I'm going to open a bug report -- it might not be a bug but I
> sure am confused.

It's not a bug(*), and it sure is confusing.

My own attempt to document Ruby 1.9's encoding rules, which is woefully
incomplete but covers about 200 different cases, is at
http://github.com/candlerb/string19/blob/master/string19.rb

What you've observed is described in section 3.3. Basically, a Regexp
which contains only ASCII characters is given an encoding of US-ASCII
regardless of the original string's encoding (this is different to
Strings, which might have an encoding of say UTF-8 but have the
ascii_only? property true if they contain only ASCII characters).

However there is a hidden "fixed_encoding" property you can set on a
Regexp:

>> r1 = Regexp.new("string")
=> /string/
>> r2 = Regexp.new("string", Regexp::FIXEDENCODING)
=> /string/
>> r1.encoding
=> #<Encoding:US-ASCII>
>> r2.encoding
=> #<Encoding:UTF-8>
>> r1.fixed_encoding?
=> false
>> r2.fixed_encoding?
=> true

I say it's a "hidden" property because the flag isn't revealed if you
use inspect or to_s (unlike the //m, //i and //x properties).

>> r1.to_s
=> "(?-mix:string)"
>> r2.to_s
=> "(?-mix:string)"

HTH,

Brian.

(*) Except in as much as the entire Encoding nonsense in ruby 1.9 is one
enormous bug
--
Posted via http://www.ruby-forum.com/.
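The practical consequence of the fixed_encoding property shows up when the regexp meets a non-ASCII string in a different encoding. The sketch below is not from the thread; the names latin1, loose and fixed are illustrative, and the behaviour shown is what a current Ruby does, which may differ in detail from 1.9.1.

```ruby
# A Latin-1 string containing a non-ASCII byte (0xE9 is "é" in ISO-8859-1).
latin1 = "caf\xE9".b.force_encoding("ISO-8859-1")

# The same ASCII-only pattern, built from a string explicitly tagged UTF-8.
pattern = "caf".encode("UTF-8")
loose = Regexp.new(pattern)                          # not fixed
fixed = Regexp.new(pattern, Regexp::FIXEDENCODING)   # pinned to UTF-8

puts loose.encoding     # US-ASCII
puts fixed.encoding     # UTF-8

# The non-fixed regexp adapts to the encoding of the string it is matched against.
puts loose =~ latin1    # 0

# The fixed regexp refuses a non-ASCII string in another encoding.
error = nil
begin
  fixed =~ latin1
rescue Encoding::CompatibilityError => e
  error = e
end
puts error.class        # Encoding::CompatibilityError
```

So fixing the encoding trades flexibility for predictability: the regexp always reports UTF-8, but it no longer matches against non-ASCII strings in other encodings.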
From: David Springer on 24 Feb 2010 16:43

Perry,

In 1.9 there is only one optional parameter.

You can force the encoding of the string parameter (if needed) AND also
pass the options parameter.

Try this:

#!/usr/bin/env ruby
s = "string"
puts s.encoding
r = Regexp.new(s.encode("utf-8"), Regexp::ENC_UTF8)
puts r.encoding

Here is the output:

US-ASCII
UTF-8

-David
--
Posted via http://www.ruby-forum.com/.
From: Perry Smith on 24 Feb 2010 17:34

Hi Brian and David,

Thanks. I'm doing more experimenting and I'm also looking at the source
code. I need to drag down the latest. I'm looking at 1.9.1 p243 right
now.

Regexp.new has a third optional argument -- it is sorta described in the
Pickaxe book but the code looks wrong. It can be either 'n' or 'xN'
where x can be anything. Perhaps that is gone in the latest code.

But the "fixed encoding" is a key part of the puzzle I was missing.

Also, David, I had not bumped into the ENC_UTF8 constant yet. There are
quite a few constants, and one of them (the 16 that David also pointed
out) is a flag to make the encoding "fixed".

The latest code that David posted answers exactly what my original
question was.

Thanks!
--
Posted via http://www.ruby-forum.com/.