From: Spud on
I've got some XML which has apostrophes in the text:

<text>O'Brien</text>

Technically, it's supposed to be encoded as &apos;, but it's not. I don't
have control over the input XML.

So I parse the incoming XML using SAX, and bizarrely SAX converts the
apostrophes to &apos; on its own at parse time. The SAX method:

public void characters(char[] ch, int start, int length) throws SAXException {}

gets called by the SAX parser with the characters "O&apos;Brien".

This seems backwards to me. The SAX parser should be *decoding*
character entities, not encoding them.

Question: how do I stop this behavior?

I'm using the SAX implementation built into JDK 1.5. I can't seem to
find any details on the implementation, including how to set options.
From: Donkey Hottie on
"Spud" <fake(a)fkfkfkf.com> wrote in message
news:ooadndMQ5dbjewLX4p2dnAA(a)giganews.com
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I
> can't seem to find any details on the implementation,
> including how to set options.

The implementation bundled with the JDK is a repackaged Apache Xerces. Their
documentation may help, dunno.
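
For anyone wanting to check which implementation they actually have and how options are set, here is a minimal sketch using only the standard JAXP and SAX2 calls. The class and method names (WhichSaxParser, factoryClassName, namespacesEnabled) are made up for illustration. The feature URI is the standard SAX2 "namespaces" feature, used here only as an example of reading an option back; it has nothing to do with apostrophes.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class WhichSaxParser {
    // Standard SAX2 feature name for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Returns the class name of the factory that JAXP selected at runtime. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Builds a namespace-aware parser and reads the feature back from its XMLReader. */
    static boolean namespacesEnabled() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getFeature(NAMESPACES);
        } catch (Exception e) {
            // Wrapped so the sketch stays short; real code would handle these separately.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("factory: " + factoryClassName());
        System.out.println("namespaces feature: " + namespacesEnabled());
    }
}
```

The printed factory class name tells you whose documentation to read for implementation-specific features.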

From: Mike Schilling on
Spud wrote:
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.
>

Escaping the apostrophe is optional in element text. The only characters that
must always be escaped there are < and &.
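
A small sketch of that rule, assuming the JDK's default SAX parser. The class EscapingCheck and its helper textOf are hypothetical names. The helper returns the character data of a document, or null when the parser rejects it as not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class EscapingCheck {
    /** Returns all character data in xml, or null if xml is not well-formed. */
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }
                });
        } catch (SAXParseException e) {
            return null; // the parser reported a well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // A literal apostrophe and the &apos; entity yield the same text.
        System.out.println(textOf("<text>O'Brien</text>"));
        System.out.println(textOf("<text>O&apos;Brien</text>"));
        // A bare ampersand is not allowed in element text, so this prints null.
        System.out.println(textOf("<text>AT&T</text>"));
    }
}
```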

> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.

You're right, "bizarre" is the word for it.

>
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.

Can you put together an SSCCE?
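
Something along these lines would do as a starting point. It is only a sketch: the class name ApostropheSscce and the helper chunks are invented here, and the input is the one-line document from the original post fed in as a string, which may differ from how the real input arrives. It records each call to characters() exactly as delivered, so any "&apos;" handed over by the parser would show up in the output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ApostropheSscce {
    static final String XML = "<text>O'Brien</text>";

    /** Returns every chunk passed to characters(), in order and unmodified. */
    static List<String> chunks(String xml) {
        final List<String> seen = new ArrayList<String>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Copy only the slice the parser points at; the array is reused.
                        seen.add(new String(ch, start, length));
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String chunk : chunks(XML)) {
            System.out.println("characters(): [" + chunk + "]");
        }
    }
}
```

If this prints a plain apostrophe, the "&apos;" is being introduced somewhere other than the parser, for example by whatever writes or displays the text afterwards.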


From: Daniel Pitts on
Spud wrote:
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.
>
> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.
>
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.
Perhaps an SSCCE would help us help you. I would be surprised if what
you say is true.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
From: Pitch on
In article <ooadndMQ5dbjewLX4p2dnAA(a)giganews.com>, fake(a)fkfkfkf.com
says...
>
> I've got some XML which has apostrophes in the text:
>
> <text>O'Brien</text>
>
> Technically, it's supposed to be encoded as &apos;, but it's not. I don't
> have control over the input XML.
>
> So I parse the incoming XML using SAX, and bizarrely SAX converts the
> apostrophes to &apos; on its own at parse time. The SAX method:
>
> public void characters(char[] ch, int start, int length) throws
> SAXException {}
>
> gets called by the SAX parser with the characters "O&apos;Brien".
>
> This seems backwards to me. The SAX parser should be *decoding*
> character entities, not encoding them.
>
> Question: how do I stop this behavior?
>
> I'm using the SAX implementation built into JDK 1.5. I can't seem to
> find any details on the implementation, including how to set options.


It is possible that you are parsing it the wrong way. I have encountered
some situations that will do just that. For example (I'm pasting from the
commented code):

saxParser = saxParserFactory.newSAXParser();
parser = saxParser.getParser();

Instead of

saxParserFactory = SAXParserFactory.newInstance();
saxParser = saxParserFactory.newSAXParser();


It solved my problem with regional characters in the input XML.
Just fiddle around, Java is full of odd things. NHF
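
For comparison, a sketch of the SAX2 route through the same factory. Note that SAXParser.getParser() returns the deprecated SAX1 Parser interface, while getXMLReader() is its SAX2 counterpart. The post does not say which variant was the fix, so this is not a reconstruction of it. The class RegionalChars, its helpers, and the sample text are invented for illustration. The parser is given raw bytes so that it picks up the encoding from the XML declaration itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RegionalChars {
    /** A document encoded as ISO-8859-1 and declaring that encoding. */
    static byte[] sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<text>\u00C6r\u00F8sk\u00F8bing</text>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Parses raw bytes through a SAX2 XMLReader and returns the character data. */
    static String textOf(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            // A byte stream lets the parser decode using the declared encoding.
            reader.parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOf(sample()).equals("\u00C6r\u00F8sk\u00F8bing"));
    }
}
```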


--
de gustibus disputandum esse