From: Carfield Yim on
First I saw the exception message " is not legal for a JDOM character
content: 0x0 is not a legal XML character.", so I trimmed all "\0"
characters. Then I got " is not legal for a JDOM character content:
0x1 is not a legal XML character." and " is not legal for a JDOM
character content: 0x2 is not a legal XML character.".

So, how many characters are illegal for JDOM? Any easy way to parse all?
From: Mayeul on
Carfield Yim wrote:
> First I saw the exception message " is not legal for a JDOM character
> content: 0x0 is not a legal XML character.", so I trimmed all "\0"
> characters. Then I got " is not legal for a JDOM character content:
> 0x1 is not a legal XML character." and " is not legal for a JDOM
> character content: 0x2 is not a legal XML character.".
>
> So, how many characters are illegal for JDOM? Any easy way to parse all?

I am actually not sure, as I couldn't find any JDOM reference about it,
but I think it is safe to assume from the error messages that any
illegal XML character is also an illegal JDOM character.

U+0000, U+0001 and U+0002 certainly are illegal XML characters, and it
seems a good idea for JDOM to reject them.

According to the XML specification:
(the W3C server is overloaded again; search for the XML specification in
Google, then view the cached page)
http://209.85.229.132/search?q=cache:fdujgnyF_v4J:www.w3.org/TR/REC-xml/


The valid XML characters match this construction:

Character Range

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
         | [#x10000-#x10FFFF]
         /* any Unicode character, excluding the surrogate blocks,
            FFFE, and FFFF. */


It's up to you to count whatever isn't in this construction.
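
For what it's worth, the production above translates fairly directly
into a predicate, and counting the code points that fail it gives a
concrete number. This is only a sketch: the class and method names
(XmlChars, isLegal, countIllegal) are made up for illustration and are
not part of JDOM.

```java
public final class XmlChars {
    private XmlChars() {}

    /** True if codePoint matches the Char production quoted above. */
    public static boolean isLegal(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Counts the Unicode code points that the production rejects. */
    public static int countIllegal() {
        int illegal = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!isLegal(cp)) {
                illegal++;
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        // 29 control characters + 2048 surrogates + FFFE and FFFF
        System.out.println(countIllegal()); // prints 2079
    }
}
```

So, going by the production alone, there are 2079 illegal code points:
the C0 controls other than tab, line feed and carriage return, the
surrogate blocks, and FFFE/FFFF.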

> Any easy way to parse all?

Not sure. Excluding surrogate blocks while keeping non-BMP characters
should be tricky with a regexp.
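
As a rough sketch of the regexp route: java.util.regex matches by code
point and, as far as I know since Java 7, accepts \x{h...h} escapes for
supplementary characters, so a negated character class built from the
Char production may be enough. The names here (XmlRegexFilter, strip)
are hypothetical, and this is independent of JDOM's own checks.

```java
import java.util.regex.Pattern;

public final class XmlRegexFilter {
    private XmlRegexFilter() {}

    // Negation of the Char production: anything NOT in the legal ranges.
    // A well-formed surrogate pair is seen as one supplementary code point,
    // so only unpaired surrogates fall into the rejected set.
    private static final Pattern ILLEGAL = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Returns input with every illegal XML character removed. */
    public static String strip(String input) {
        return ILLEGAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a\u0000b\u0001c\u0002")); // prints abc
    }
}
```

Whether silently dropping the characters is acceptable depends on the
data; replacing them with a visible marker would be a one-word change
in replaceAll.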

To be honest, I'm kinda wondering what you are trying to build a DOM
from. It's not every day that I have to filter out illegal characters
and am not allowed to just discard the input as invalid.

--
Mayeul
From: Carfield Yim on
..
>
> To be honest, I'm kinda wondering what you are trying to build a DOM
> from. It's not every day that I have to filter out illegal characters
> and am not allowed to just discard the input as invalid.

I cannot control my source, so that is exactly what I would like to do:
discard those characters from the input source...
From: Mayeul on
Carfield Yim wrote:
> .
>> To be honest, I'm kinda wondering what you are trying to build a DOM
>> from. It's not every day that I have to filter out illegal characters
>> and am not allowed to just discard the input as invalid.
>
> I cannot control my source, so that is exactly what I would like to do:
> discard those characters from the input source...

I wish you luck, then.

Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks
whether a character is a valid XML character (this same method is
called to raise the error you got).

Note it takes an int, not a char, as parameter. This is because it
handles non-BMP characters. You might want to do that too.
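
One way to act on the int-versus-char note above is to walk the string
by code point rather than by char, so that surrogate pairs are tested as
a single character. A minimal sketch, with assumptions stated: the class
and method names (XmlTextFilter, strip) are invented, and the check is
passed in as an IntPredicate so the example runs without JDOM. With JDOM
on the classpath one would presumably pass Verifier::isXMLCharacter,
assuming it is a static method on the Verifier class of the JDOM version
in use.

```java
import java.util.function.IntPredicate;

public final class XmlTextFilter {
    private XmlTextFilter() {}

    /** Keeps only the code points of input that isLegal accepts. */
    public static String strip(String input, IntPredicate isLegal) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins a valid surrogate pair into one int
            int cp = input.codePointAt(i);
            if (isLegal.test(cp)) {
                out.appendCodePoint(cp);
            }
            // advance by 1 or 2 chars depending on the code point
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in predicate for the demo only; NOT the full XML rule.
        IntPredicate standIn = cp -> cp >= 0x20;
        System.out.println(strip("a\u0000b\u0001c", standIn)); // prints abc
    }
}
```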

--
Mayeul