Prev: Can't run cgi script with Apache 2.2 with Windows XP
Next: Rubygems 0.9.5 and fastthread mswin32 gem
From: Greg Willits on 30 Nov 2007 15:18

This is mostly a Ruby thing, and partly a Rails thing.

I'm expecting a validates_format_of with a regex like this

    /^[a-zA-Z\xC0-\xD6\xD9-\xF6\xF9-\xFF\.\'\-\ ]*?$/

to allow many of the normal characters like ö é å to be submitted via web form. However, the extended characters are being rejected.

This works just fine though (which is just a-zA-Z)

    /^[\x41-\x5A\x61-\x7A\.\'\-\ ]*?$/

It also seems to fail with full \x0000 numbers, is there a limit at \xFF?

Some plain Ruby tests seem to suggest unicode characters don't work at all??

    p 'abvHgtwHFuG'.scan(/[a-z]/)
    p 'abvHgtwHFuG'.scan(/[A-Z]/)
    p 'abvHgtwHFuG'.scan(/[\x41-\x5A]/)
    p 'abvHgtwHFuG'.scan(/[\x61-\x7A]/)
    p 'aébvHögtåwHÅFuG'.scan(/[\xC0-\xD6\xD9-\xF6\xF9-\xFF]/)

    ["a", "b", "v", "g", "t", "w", "u"]
    ["H", "H", "F", "G"]
    ["H", "H", "F", "G"]
    ["a", "b", "v", "g", "t", "w", "u"]
    ["\303", "\303", "\303", "\303"]

So, what's the secret to using unicode character ranges in Ruby regex (or Rails validations)?

--
def gw
  acts_as_n00b
  writes_at(www.railsdev.ws)
end
--
Posted via http://www.ruby-forum.com/.
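The last result above, four copies of "\303", is what one would expect if the test string is stored as UTF-8 and the character class is compared against individual bytes. A minimal sketch of that reading, assuming the file is saved as UTF-8 (the names `ranges`, `matched` and `octal` are made up for this illustration and are not from the post):

```ruby
# Assumes this source file is saved as UTF-8.
s = 'aébvHögtåwHÅFuG'

# The three ranges from the character class, applied to raw byte values.
ranges = [0xC0..0xD6, 0xD9..0xF6, 0xF9..0xFF]

# Keep only the bytes that fall inside one of the ranges.
matched = s.unpack('C*').select { |byte| ranges.any? { |r| r.include?(byte) } }

# Show them in octal, the notation Ruby 1.8 uses when inspecting strings.
octal = matched.map { |byte| byte.to_s(8) }
p octal
```

Only the first byte of each accented letter (0xC3, octal 303) lands inside the ranges; the second byte of each pair is below 0xC0.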
From: Dale Martenson on 30 Nov 2007 16:05

On Nov 30, 2:18 pm, Greg Willits <li...(a)gregwillits.ws> wrote:
> So, what's the secret to using unicode character ranges in Ruby regex
> (or Rails validations)?

Tim Bray gave a great talk about I18N, M17N and Unicode at the 2006 Ruby Conference. His presentation can be found at:

http://www.tbray.org/talks/rubyconf2006.pdf

He described how many member functions have trouble dealing with these character sets. He made special reference to regular expressions.

--Dale
From: Greg Willits on 30 Nov 2007 17:00

Dale Martenson wrote:
> On Nov 30, 2:18 pm, Greg Willits <li...(a)gregwillits.ws> wrote:
>
>> So, what's the secret to using unicode character ranges in Ruby regex
>> (or Rails validations)?
>
> Tim Bray gave a great talk about I18N, M17N and Unicode at the 2006
> Ruby Conference. His presentation can be found at:
>
> http://www.tbray.org/talks/rubyconf2006.pdf
>
> He described how many member functions have trouble dealing with these
> character sets. He made special reference to regular expressions.

That's just beyond sad.

I've been using Lasso for several years now, and in *2003* it provided complete support for Unicode. I know there are some esoterics it may not deal with, but for all practical purposes we can round-trip data in western and eastern languages with Lasso quite easily.

How can all these other languages be so far behind?

Pretty bad if I can't even allow Mr. Muños or Göran to enter their names in a web form with proper server side validations. Aargh.

-- gw
--
Posted via http://www.ruby-forum.com/.
From: MonkeeSage on 1 Dec 2007 00:24

On Nov 30, 4:00 pm, Greg Willits <li...(a)gregwillits.ws> wrote:
> Dale Martenson wrote:
> > On Nov 30, 2:18 pm, Greg Willits <li...(a)gregwillits.ws> wrote:
> >
> >> So, what's the secret to using unicode character ranges in Ruby regex
> >> (or Rails validations)?
> >
> > Tim Bray gave a great talk about I18N, M17N and Unicode at the 2006
> > Ruby Conference. His presentation can be found at:
> >
> > http://www.tbray.org/talks/rubyconf2006.pdf
> >
> > He described how many member functions have trouble dealing with these
> > character sets. He made special reference to regular expressions.
>
> That's just beyond sad.
>
> I've been using Lasso for several years now, and *2003* it provided
> complete support for Unicode. I know there's some esoterics it may not
> deal with, but for all practical purposes we can round-trip data in
> western and eastern languages with Lasso quite easily.
>
> How can all these other languages be so far behind?
>
> Pretty bad if I can't even allow Mr. Muños or Göran to enter their names
> in a web form with proper server side validations. Aargh.
>
> -- gw
> --
> Posted via http://www.ruby-forum.com/.

Ruby 1.8 doesn't have unicode support (1.9 is starting to get it). Everything in ruby is a bytestring.

    irb(main):001:0> 'aébvHögtåwHÅFuG'.scan(/./)
    => ["a", "\303", "\251", "b", "v", "H", "\303", "\266", "g", "t", "\303", "\245", "w", "H", "\303", "\205", "F", "u", "G"]

So your character class is matching the first byte of each multibyte character (which is \303 in octal), and skipping the next byte (since it's below the range). You probably want something like...

    reg = /[\xc0-\xd6\xd9-\xf6\xf9-\xff][\x80-\xbc]/
    'aébvHögtåwHÅFuG'.scan(reg)

    irb(main):006:0* reg = /[\xc0-\xd6\xd9-\xf6\xf9-\xff][\x80-\xbc]/
    => /[\xc0-\xd6\xd9-\xf6\xf9-\xff][\x80-\xbc]/
    irb(main):007:0> 'aébvHögtåwHÅFuG'.scan(reg)
    => ["\303\251", "\303\266", "\303\245", "\303\205"]
    irb(main):008:0> "å" == "\303\245"
    => true

Ps. I'm not entirely sure the value of the second character class is right.

Regards,
Jordan
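On the doubt about the second character class: one way to check it, offered here as a sketch rather than anything posted in the thread, is to encode every code point from the original ranges as UTF-8 and look at the resulting bytes. The names `codepoints`, `encoded`, `lead_bytes` and `trail_range` are made up for this illustration.

```ruby
# Code points covered by the original class: U+00C0-D6, U+00D9-F6, U+00F9-FF.
codepoints = (0xC0..0xD6).to_a + (0xD9..0xF6).to_a + (0xF9..0xFF).to_a

# Array#pack('U') emits the UTF-8 encoding; unpack('C*') lists its bytes.
encoded = codepoints.map { |cp| [cp].pack('U').unpack('C*') }

# Collect the distinct first bytes and the span of the second bytes.
lead_bytes  = encoded.map { |bytes| bytes.first }.uniq
trail_bytes = encoded.map { |bytes| bytes.last }
trail_range = [trail_bytes.min, trail_bytes.max]

p lead_bytes.map { |b| b.to_s(16) }
p trail_range.map { |b| b.to_s(16) }
```

Every one of these characters encodes as \xc3 followed by a byte between \x80 and \xbf. On that basis the second class would need to run to \xbf rather than \xbc (otherwise ý, þ and ÿ are missed), and the first class could be narrowed to \xc3 alone so that lead bytes belonging to other scripts do not match.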
From: Jimmy Kofler on 1 Dec 2007 05:16

> Unicode in Regex
> Posted by Greg Willits (-gw-) on 30.11.2007 21:18
> This is mostly a Ruby thing, and partly a Rails thing.
>
> I'm expecting a validates_format_of with a regex like this
>
> /^[a-zA-Z\xC0-\xD6\xD9-\xF6\xF9-\xFF\.\'\-\ ]*?$/
>
> to allow many of the normal characters like ö é å to be submitted via
> web form.

How about the utf8 validation regex here: http://snippets.dzone.com/posts/show/4527 ?

--
Posted via http://www.ruby-forum.com/.
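The linked snippet is not reproduced here. As a separate sketch, assuming Ruby 1.9 or later (where strings carry an encoding and regex literals accept \uXXXX escapes), the same character ranges can be written as code points and combined with an encoding check. `NAME_FORMAT` and `acceptable_name?` are hypothetical names, not part of Ruby or Rails. The pattern uses \A and \z because ^ and $ match at line boundaries in Ruby, not only at the ends of the string.

```ruby
# Assumes Ruby 1.9+ and a source file saved as UTF-8.
# Same ranges as the original class, written as Unicode code points.
NAME_FORMAT = /\A[a-zA-Z\u00C0-\u00D6\u00D9-\u00F6\u00F9-\u00FF.'\- ]*\z/

# Hypothetical helper: true only for well-formed UTF-8 input that
# consists entirely of the allowed characters.
def acceptable_name?(input)
  return false unless input.encoding == Encoding::UTF_8
  return false unless input.valid_encoding?   # reject malformed byte sequences first
  input =~ NAME_FORMAT ? true : false
end

p acceptable_name?('Göran')
p acceptable_name?('x÷y')
```

In a Rails model of that era the pattern could presumably be passed as `validates_format_of :name, :with => NAME_FORMAT`, where `:name` stands in for whatever attribute is being validated.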