From: Carfield Yim on 28 Oct 2009 11:46

First I saw the exception message " is not legal for a JDOM character content: 0x0 is not a legal XML character.", so I trimmed all "\0" characters. Then I got " is not legal for a JDOM character content: 0x1 is not a legal XML character." and " is not legal for a JDOM character content: 0x2 is not a legal XML character.".

So... how many illegal characters are there for JDOM? Is there an easy way to parse them all?
From: Mayeul on 28 Oct 2009 12:34

Carfield Yim wrote:
> First I see exception message " is not legal for a JDOM character
> content: 0x0 is not a legal XML character.", ok, then I trim all "\0"
> character. Then, I get " is not legal for a JDOM character content:
> 0x1 is not a legal XML character." and " is not legal for a JDOM
> character content: 0x2 is not a legal XML character.".
>
> So.... how many illegal character for JDOM? Any easy way to parse all?

I am actually not sure, as I couldn't find any JDOM reference about it, but I think it is safe to assume from the error messages that any illegal XML character is an illegal JDOM character. U+0, U+1 and U+2 sure are illegal XML characters, and it seems a good idea for JDOM to reject them.

According to the XML specification (the W3C server is overloaded again; search for the XML specification in Google, then view the cached page):
http://209.85.229.132/search?q=cache:fdujgnyF_v4J:www.w3.org/TR/REC-xml/

the valid XML characters match this production:

Character Range

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

It's up to you to count whatever isn't in this production.

> Any easy way to parse all?

Not sure. Excluding the surrogate blocks while keeping non-BMP characters should be tricky with a regexp.

To be honest, I'm kinda wondering what you are trying to build a DOM from. It's not every day that I have to filter out illegal characters and am not allowed to just discard the input as invalid.

--
Mayeul
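For readers who want to do the counting suggested above, the Char production can be written directly as a predicate on Unicode code points. This is only a minimal sketch: the class name `XmlCharCheck` and its method names are made up for illustration and are not part of JDOM. It covers XML 1.0 only and says nothing about how JDOM implements its own check.

```java
// Sketch only: XmlCharCheck is a hypothetical helper, not a JDOM class.
final class XmlCharCheck {

    /** True if cp matches the XML 1.0 Char production quoted above. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Counts the code points in U+0000..U+10FFFF that the production rejects. */
    static int countIllegal() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!isLegalXmlChar(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 29 control characters + 2048 surrogates + U+FFFE and U+FFFF
        System.out.println(countIllegal()); // prints 2079
    }
}
```

Most of the rejected code points are surrogates; in typical text input the ones likely to show up are the 29 control characters below U+0020 other than tab, line feed and carriage return.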
From: Carfield Yim on 28 Oct 2009 12:42

> To be honest, I'm kinda wondering what you are trying to build a DOM
> from. It's not everyday that I have to filter out illegal characters and
> am disallowed to just discard the input as invalid.

I cannot control my source, so I would like to discard exactly those characters from the input...
From: Mayeul on 28 Oct 2009 12:57
Carfield Yim wrote:
>> To be honest, I'm kinda wondering what you are trying to build a DOM
>> from. It's not everyday that I have to filter out illegal characters and
>> am disallowed to just discard the input as invalid.
>
> I cannot control my source so exactly Iwould like to discard those
> characters from the input source...

I wish you luck, then.

Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks whether a character is a valid XML character (this same method is called to raise the error you got).

Note that it takes an int, not a char, as its parameter. This is because it handles non-BMP characters. You might want to do that too.

--
Mayeul
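The advice above about working with int rather than char could be sketched roughly as follows. The filter walks the string by code point, so a valid surrogate pair is tested as one non-BMP character while a stray lone surrogate is tested (and dropped) on its own. `XmlCharFilter` and `strip` are hypothetical names chosen for this example. The predicate is passed in so that, with JDOM on the classpath, `Verifier::isXMLCharacter` could presumably be supplied instead of the hand-written check used here; that substitution is an assumption and is not exercised by this code.

```java
import java.util.function.IntPredicate;

// Sketch only: XmlCharFilter is a hypothetical helper, not a JDOM class.
final class XmlCharFilter {

    /** Returns input with every code point rejected by isLegal removed. */
    static String strip(String input, IntPredicate isLegal) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins a valid surrogate pair into one int;
            // a lone surrogate comes back as its own value (0xD800..0xDFFF).
            int cp = input.codePointAt(i);
            if (isLegal.test(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    /** Hand-written stand-in for the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String dirty = "a\u0000b\u0001c\u0002";
        System.out.println(strip(dirty, XmlCharFilter::isLegalXmlChar)); // prints abc
    }
}
```

The cleaned string can then be handed to JDOM in place of the raw input. Whether silently dropping characters is acceptable depends on the data; replacing them with a visible placeholder is a one-line change in the loop.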