From: Spud on 3 Sep 2009 11:53

I've got some XML which has apostrophes in the text:

<text>O'Brien</text>

Technically, it's supposed to be encoded as &apos;, but it's not. I don't have control over the input XML.

So I parse the incoming XML using SAX, and bizarrely SAX converts the apostrophes to &apos; on its own at parse time. The SAX method:

public void characters(char[] ch, int start, int length) throws SAXException {}

gets called by the SAX parser with the characters "O&apos;Brien".

This seems backwards to me. The SAX parser should be *decoding* character entities, not encoding them.

Question: how do I stop this behavior?

I'm using the SAX implementation built into JDK 1.5. I can't seem to find any details on the implementation, including how to set options.

From: Donkey Hottie on 3 Sep 2009 16:31

"Spud" <fake(a)fkfkfkf.com> wrote in message news:ooadndMQ5dbjewLX4p2dnAA(a)giganews.com
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I
> can't seem to find any details on the implementation,
> including how to set options.

The implementation is Apache Xerces. Maybe their documentation helps, dunno.

From: Mike Schilling on 3 Sep 2009 17:19

Spud wrote:
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.

That's optional. The only characters that must be encoded in text are < and &.

> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.

You're right, "bizarre" is the word for it.

> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.

Can you put together an SSCCE?

From: Daniel Pitts on 3 Sep 2009 17:39

Spud wrote:
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.
>
> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.
>
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.

Perhaps an SSCCE would help us help you. I would be surprised if what you say is true.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

From: Pitch on 3 Sep 2009 19:13

In article <ooadndMQ5dbjewLX4p2dnAA(a)giganews.com>, fake(a)fkfkfkf.com says...
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.
>
> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.
>
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.

It is possible that you are parsing it the wrong way. I have encountered some situations that will do just that. For example (I'm pasting from the commented code):

    saxParser = saxParserFactory.newSAXParser();
    parser = saxParser.getParser();

Instead of

    saxParserFactory = SAXParserFactory.newInstance();
    saxParser = saxParserFactory.newSAXParser();

It has solved my problem with regional characters in input XML.

Just fiddle around, Java is full of odd things. NHF

--
de gustibus disputandum esse
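
For reference, an SSCCE of the kind requested in this thread might look roughly like the sketch below. It is not the original poster's code: the class name ApostropheSscce and the helper parseText are made up for illustration, and it assumes the default JAXP parser that ships with the JDK. It feeds both the raw and the entity-encoded form of the sample element through SAX and collects whatever characters() reports, so the two results can be compared directly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test class; the name is not from the thread.
class ApostropheSscce {

    // Parses the given XML and returns all character data reported by SAX.
    static String parseText(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called more than once per element,
                // so append each chunk instead of overwriting.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Raw apostrophe, as in the input described in the first post.
        System.out.println(parseText("<text>O'Brien</text>"));
        // Entity-encoded apostrophe, which the parser should decode.
        System.out.println(parseText("<text>O&apos;Brien</text>"));
    }
}
```

If both lines print the name with a plain apostrophe, the parser is not doing the encoding, and the place to look is whatever code handles the text after characters() returns.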