From: Mike Pe on 19 Jul 2010 15:13

Hi,

I am having trouble parsing a document with the Ruby XML parser REXML. The problem seems to be the first line of my file, the XML declaration.

Here are two XML documents. REXML does not parse the first one, but parses the second one properly:

    error = <<EOF
    <?xml version="1.0" encoding="UTF-16"?>
    <document test="yes">
    </document>
    EOF

    noerror = <<EOF
    <document test="yes">
    </document>
    EOF

When I try to parse "error", REXML does not read any of the attributes or elements:

    require "rexml/document"

    doc = REXML::Document.new error
    puts doc.root.attributes["test"]   # --> nil

    doc = REXML::Document.new noerror
    puts doc.root.attributes["test"]   # --> yes

I considered that REXML might only accept UTF-8 encoded files, but when I convert these files from UTF-16 to UTF-8, it still does not parse properly.

Does anyone know what I am doing wrong? Thank you very much.

Mike

Attachments: http://www.ruby-forum.com/attachment/4868/error.xml

-- 
Posted via http://www.ruby-forum.com/.
From: Robert Klemme on 19 Jul 2010 15:48

On 19.07.2010 21:13, Mike Pe wrote:
> So I am having issues parsing in a document using the Ruby XML parser
> REXML. The issue seems to be with the first line of my file that
> identifies the XML file.
>
> Here are two xml files, the first is not parsed with REXML and the
> second is parsed properly:
>
> error =<<EOF
> <?xml version="1.0" encoding="UTF-16"?>
> <document test="yes">
> </document>
> EOF

The string is most likely not UTF-16 encoded, so REXML cannot parse it properly. Which Ruby version? If it is 1.9ish you'll find information about i18n here:
http://blog.grayproductions.net/articles/understanding_m17n

> noerror =<<EOF
> <document test="yes">
> </document>
> EOF
>
> When I try to parse in the information from "error", REXML does not read
> any of the attributes or elements.
>
> doc = Document.new error
> puts doc.root.attributes["test"] --> nil
>
> doc = Document.new noerror
> puts doc.root.attributes["test"] --> yes
>
> I considered the fact that REXML only takes in UTF-8 unicoded files, but
> when I convert these files from UTF-16 to UTF-8, it still does not parse
> properly.
>
> Does anyone know what I am doing wrong? Thank you very Much.

Can you show what exactly you did?

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
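The hypothesis that the string is not really UTF-16 can be checked directly by comparing the encoding named in the XML declaration with what the bytes look like. The following is only a minimal sketch, assuming Ruby 1.9 or later. The helper names `declared_xml_encoding`, `utf16_bytes?` and `encoding_mismatch?` are hypothetical and are not part of REXML.

```ruby
# Hypothetical diagnostic helpers (not part of REXML).

# Returns the encoding name written in the XML declaration, or nil.
def declared_xml_encoding(bytes)
  head = bytes.dup.force_encoding(Encoding::BINARY)[0, 200]
  # Drop NUL bytes so an ASCII-only declaration is readable even when
  # the data is UTF-16 (every ASCII character is padded with a NUL).
  ascii = head.delete("\x00")
  ascii[/<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']/, 1]
end

# Heuristic: does the data start with a UTF-16 byte order mark, or
# with "<" padded by a NUL byte?
def utf16_bytes?(bytes)
  raw = bytes.dup.force_encoding(Encoding::BINARY)
  raw.start_with?("\xFF\xFE".b, "\xFE\xFF".b, "<\x00".b, "\x00<".b)
end

# True when the declaration says UTF-16 but the bytes do not look like
# UTF-16, or the other way round.
def encoding_mismatch?(bytes)
  declared = declared_xml_encoding(bytes)
  return false if declared.nil?
  declared_utf16 = !!(declared =~ /\AUTF-16/i)
  declared_utf16 != utf16_bytes?(bytes)
end

# The heredoc from the original post is an ordinary Ruby string, so
# its bytes are not UTF-16 even though the declaration says so.
error = <<EOF
<?xml version="1.0" encoding="UTF-16"?>
<document test="yes">
</document>
EOF

puts declared_xml_encoding(error)
puts encoding_mismatch?(error)
```

If `encoding_mismatch?` reports true for the input, the declaration and the data disagree, which would be consistent with the parser misreading the document.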
From: Mike Pe on 23 Jul 2010 18:12

Robert Klemme wrote:
> On 19.07.2010 21:13, Mike Pe wrote:
>> </document>
>> EOF
>
> The string is most likely not UTF-16 encoded so REXML cannot parse it
> properly. Which Ruby version? If it is 1.9ish you'll find information
> about i18n here:
> http://blog.grayproductions.net/articles/understanding_m17n
>
>> puts doc.root.attributes["test"] --> nil
>>
>> doc = Document.new noerror
>> puts doc.root.attributes["test"] --> yes
>>
>> I considered the fact that REXML only takes in UTF-8 unicoded files, but
>> when I convert these files from UTF-16 to UTF-8, it still does not parse
>> properly.
>>
>> Does anyone know what I am doing wrong? Thank you very Much.
>
> Can you show what exactly you did?
>
> Kind regards
>
> robert

Hi Robert,

The issue is that the first line of my input file:

    <?xml version="1.0" encoding="UTF-16"?>

causes the file to be read as an "xml application". Basically, I just want to use REXML to parse this XML file, but it does not parse properly with this line at the beginning of my input file (otherwise it works fine).

I tried converting the files using the iconv commands from your link, but with both UTF-16 and UTF-8 the same error occurs, regardless of format.

Why is this line interfering with the parser, and how would I fix it? Thank you for your help.

Best,
Mike
From: Robert Klemme on 27 Jul 2010 05:01

2010/7/24 Mike Pe <mikep123(a)gmail.com>:
> Robert Klemme wrote:
>> On 19.07.2010 21:13, Mike Pe wrote:
>>> </document>
>>> EOF
>>
>> The string is most likely not UTF-16 encoded so REXML cannot parse it
>> properly. Which Ruby version? If it is 1.9ish you'll find information
>> about i18n here:
>> http://blog.grayproductions.net/articles/understanding_m17n
>>
>>> puts doc.root.attributes["test"] --> nil
>>>
>>> doc = Document.new noerror
>>> puts doc.root.attributes["test"] --> yes
>>>
>>> I considered the fact that REXML only takes in UTF-8 unicoded files, but
>>> when I convert these files from UTF-16 to UTF-8, it still does not parse
>>> properly.
>>>
>>> Does anyone know what I am doing wrong? Thank you very Much.
>>
>> Can you show what exactly you did?

> The issue is that the first line of my input file:
>
> <?xml version="1.0" encoding="UTF-16"?>
>
> Causes the file to be read as an "xml application". Basically, I just
> want to be able to use REXML to parse out this xml file, but it does not
> parse properly with this line in the beginning of my input file.
> (otherwise it works fine).

Please provide the code you are using so others can try this out themselves. I asked for this already (see above).

> I tried converting the files using iconv commands from your link, but
> with both UTF-16 and UTF-8, the same error occurs, without regard for
> format.
>
> Why is this line interfering with the parser and how would I fix it?
> Thank you for your help.

It seems there is no UTF-16 support:

    irb(main):009:0> f=File.open "x", "r:UTF-16"
    (irb):9: warning: Unsupported encoding UTF-16 ignored
    => #<File:x>

So there is no point in trying to import a UTF-16 encoded file in Ruby.

Kind regards

robert
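A note on the irb warning above: as far as I know, it concerns the bare name UTF-16, which Ruby 1.9 treats as a dummy encoding because it does not state a byte order. UTF-16LE and UTF-16BE are regular encodings, so a file that really is UTF-16 can still be transcoded by hand. The following is a minimal sketch under those assumptions (Ruby 1.9.3 or later, input starting with a byte order mark). The helper name `utf16_xml_to_utf8` is hypothetical.

```ruby
# Hypothetical helper: turn raw UTF-16 XML bytes (with a byte order
# mark) into a UTF-8 string whose declaration also says UTF-8.
def utf16_xml_to_utf8(bytes)
  raw = bytes.dup.force_encoding(Encoding::BINARY)

  # Pick the byte order from the BOM and drop the BOM itself.
  source_encoding =
    if raw.start_with?("\xFF\xFE".b)
      Encoding::UTF_16LE
    elsif raw.start_with?("\xFE\xFF".b)
      Encoding::UTF_16BE
    else
      raise ArgumentError, "no UTF-16 byte order mark found"
    end
  raw = raw.byteslice(2..-1)

  text = raw.force_encoding(source_encoding).encode(Encoding::UTF_8)

  # Keep the declaration consistent with the new encoding.
  text.sub(/\A(<\?xml[^>]*?encoding=["'])UTF-16(["'])/i, '\1UTF-8\2')
end

# Build a genuinely UTF-16LE encoded sample to try the helper on.
sample = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n" \
         "<document test=\"yes\">\n</document>\n"
utf16_bytes = sample.encode(Encoding::UTF_16LE)

puts utf16_xml_to_utf8(utf16_bytes)
```

For a file on disk, the bytes could be obtained with `File.binread` and the result passed to the parser. Whether this applies here depends on whether the input is really UTF-16, which the thread has not established.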
From: Mike Pe on 27 Jul 2010 12:47

Robert Klemme wrote:
> 2010/7/24 Mike Pe <mikep123(a)gmail.com>:
>>>> puts doc.root.attributes["test"] --> nil
>>> Can you show what exactly you did?
>> The issue is that the first line of my input file:
>>
>> <?xml version="1.0" encoding="UTF-16"?>
>>
>> Causes the file to be read as an "xml application". Basically, I just
>> want to be able to use REXML to parse out this xml file, but it does not
>> parse properly with this line in the beginning of my input file.
>> (otherwise it works fine).
>
> Please provide the code you are using so others can try this out
> themselves. I asked for this already (see above).
>
>> I tried converting the files using iconv commands from your link, but
>> with both UTF-16 and UTF-8, the same error occurs, without regard for
>> format.
>>
>> Why is this line interfering with the parser and how would I fix it?
>> Thank you for your help.
>
> It seems there is no UTF-16 support:
>
> irb(main):009:0> f=File.open "x", "r:UTF-16"
> (irb):9: warning: Unsupported encoding UTF-16 ignored
> => #<File:x>
>
> So there is no point in trying to import a UTF-16 encoded file in Ruby.
>
> Kind regards
>
> robert

Hi Robert,

As for the code that I am using, I simplified it in my original post. The first line:

    doc = REXML::Document.new error

should parse the XML document and recognize the root, elements, attributes, etc. of the input document. For example:

    puts doc.root.attributes["test"]

should print "yes", because the attribute in the error XML file (see above) is "yes". With the extra line, it prints "nil" (because the parser did not do its job).

I tried converting all of the files to UTF-8 and they still did not work. (If you remove the extra line, it does work.) I do not think the problem is with the Unicode encoding.

Thanks,
Mike
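The observation that the document parses once the extra line is removed can be turned into a small workaround. The following is a sketch only, assuming the data is actually UTF-8 (or plain ASCII) despite what its declaration says. The helper name `parse_without_declaration` is hypothetical; `REXML::Document.new` and `attributes[]` are the calls already used in this thread.

```ruby
require "rexml/document"

# Hypothetical helper: drop a leading XML declaration, then parse.
# Without a declaration an XML parser falls back to UTF-8, so this is
# only appropriate when the data really is UTF-8 or ASCII.
def parse_without_declaration(xml)
  body = xml.sub(/\A\s*<\?xml\b[^>]*\?>\s*/, "")
  REXML::Document.new(body)
end

error = <<EOF
<?xml version="1.0" encoding="UTF-16"?>
<document test="yes">
</document>
EOF

doc = parse_without_declaration(error)
puts doc.root.attributes["test"]
```

Stripping the declaration hides the mismatch rather than fixing its cause. If the file on disk is genuinely UTF-16, transcoding it first is the safer route.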