From: dk on 21 Jan 2010 05:13

Hi All,

I'm trying to use some UTF-8 characters in my XML, and while parsing the XML with the JDOM parser I get the exception below:

Malformed XML, Caused by: 'Invalid byte 2 of 4-byte UTF-8 sequence.'
    at com.clarify.boss.utility.xml.SimpleXmlParser.build(SimpleXmlParser.java:236)
    at com.clarify.boss.msf.handler.RespHeaderInitiateHandler.getStandardHeader(RespHeaderInitiateHandler.java:366)
    at com.clarify.boss.msf.handler.RespHeaderInitiateHandler.execute(RespHeaderInitiateHandler.java:289)
    at com.clarify.boss.utility.appcontroller.support.AbstractHandler.execute(AbstractHandler.java:42)
    at com.clarify.boss.utility.appcontroller.support.ApplicationControllerImpl.handleRequest(ApplicationControllerImpl.java:174)
    at com.clarify.boss.utility.appcontroller.support.ApplicationControllerImpl.execute(ApplicationControllerImpl.java:311)
    at com.clarify.boss.msf.support.ServiceFaultPublisherAB.executeImpl(ServiceFaultPublisherAB.java:87)
    at com.clarify.boss.common.base.BossActionBeanBase.execute(BossActionBeanBase.java:125)
    at com.clarify.boss.sa.msf.xbean.InvokeResponseXB.executeImpl(InvokeResponseXB.java:198)
    at com.clarify.cbo.XBeanImpl.baselineExecuteImpl_(XBeanImpl.java:275)
    at com.amdocs.oss.sm.core.common.XBeanBase.baselineExecuteImpl_(XBeanBase.java:75)
    at com.clarify.cbo.XBeanImpl.execute(XBeanImpl.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:615)
    at com.clarify.sam.JavaDispatch.invokeMethodImp(JavaDispatch.java:396)
    at com.clarify.sam.JavaDispatch.invokeMethod(JavaDispatch.java:348)
    at com.clarify.sam.ActionBeanService.invokeBeanMethod(ActionBeanService.java:509)
    at com.clarify.sam.ActionBeanService.invokeAifOperation(ActionBeanService.java:128)
    at com.clarify.sam.AppFrameworkBindingHandler.executeOperation(AppFrameworkBindingHandler.java:69)
    at com.amdocs.aif.consumer.ServiceContext.executeWithRetries(ServiceContext.java:900)
    at com.amdocs.aif.consumer.ServiceContext.executeOperationImpl(ServiceContext.java:756)
    at com.amdocs.aif.consumer.ServiceContext.executeOperation(ServiceContext.java:676)
    at com.amdocs.aif.consumer.ServiceContext.executeOperation(ServiceContext.java:323)
    at com.clarify.boss.errorhandler.resolver.ResolverLauncherSynchXB.executeImpl(ResolverLauncherSynchXB.java:157)
    ... 35 more
Caused by: org.jdom.input.JDOMParseException: Error on line 72: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
    at com.clarify.boss.utility.xml.SimpleXmlParser.build(SimpleXmlParser.java:231)
    ... 60 more
Caused by: org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    ... 62 more

I have declared the encoding in my XML as UTF-8:

    <?xml version="1.0" encoding="UTF-8"?>

Initially I suspected that the XML backup itself had a problem, because the same XML worked as input when I tried it on the same application server, but it failed when it came from a friend's machine. Could this be the cause?

But now I have found something even more interesting. I tried changing the encoding to ISO-8859-1, i.e.:

    <?xml version="1.0" encoding="ISO-8859-1"?>

and to my surprise it worked. This has left me confused. I thought ISO-8859-1 was a charset that is a subset of UTF-8. Then why did ISO-8859-1 work when UTF-8 didn't?

Lastly, I can't change the encoding in my XML, because I would then have to redo all the regression testing on my application. So please let me know where I have gone wrong.

The Java code that I'm using is:

    /*
     * (non-Javadoc)
     *
     * @see com.clarify.boss.utility.xml.XmlParser#build(org.springframework.core.io.Resource)
     */
    public Document build(Resource source) {
        try {
            return (getSystemId() == null
                    ? getSaxBuilder().build(source.getInputStream())
                    : getSaxBuilder().build(source.getInputStream(), getSystemId()));
        } catch (Exception e) {
            e.printStackTrace();
            BossErrorCode bossErrorCode = new BossErrorCode(ErrorCode.BOSS_MALFORMED_XML);
            throw new BossException(bossErrorCode, new String[] { e.getCause().getMessage() }, e);
        }
    }

The SAX builder method is:

    /**
     * Getter method for the <b>saxBuilder</b> property
     *
     * @return Returns the saxBuilder.
     */
    private PropertyAwareSAXBuilder getSaxBuilder() {
        if (saxBuilder == null) {
            PropertyAwareSAXBuilder myParser = new PropertyAwareSAXBuilder(isValidate());
            myParser.setFeature("http://apache.org/xml/features/validation/schema", isValidate());
            myParser.setFeature("http://xml.org/sax/features/namespaces", true);
            //CatalogResolver myResolver = new CatalogResolver();
            CatalogResolver myResolver = getCatalogResolver();
            myParser.setEntityResolver(myResolver);
            setSaxBuilder(myParser);
            Iterator it = getProperties().keySet().iterator();
            while (it.hasNext()) {
                String name = (String) it.next();
                saxBuilder.setProperty(name, getProperties().get(name));
            }
        }
        return saxBuilder;
    }

Regards,
Dhirendra

From: bugbear on 21 Jan 2010 05:15

dk wrote:
> Hi All,
>
> While I'm trying to use some UTF-8 characters in my xml while parsing
> the xml using JDOM parser I'm getting this below exception:

Have you checked that your data IS valid UTF-8?

BugBear

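One way to run the check BugBear asks about is to push the raw bytes through a strict UTF-8 decoder. The sketch below is only an illustration: the class name `Utf8Check` and the sample data are hypothetical, and it uses `StandardCharsets`, which needs Java 7 or later (on older JVMs `Charset.forName("UTF-8")` would stand in). The sample assumes, without any confirmation from the thread, that the failing file holds ISO-8859-1 bytes. In that charset the bytes 0xF0 to 0xF4 are ordinary letters (ð, ñ, ò, ó, ô), but in UTF-8 they are lead bytes of a 4-byte sequence, so a Latin-1 "ñ" followed by plain ASCII would be consistent with a complaint about byte 2 of a 4-byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true only if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "<a>\u00F1</a>"; // hypothetical content containing "ñ"
        // Same text, two encodings: only the first is valid UTF-8.
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Running the same check over the bytes of the real file (read with something like `Files.readAllBytes`) would show whether the data or the parser setup is at fault.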
From: Roedy Green on 21 Jan 2010 08:26

On Thu, 21 Jan 2010 02:13:27 -0800 (PST), dk <dhirendraism(a)gmail.com> wrote, quoted or indirectly quoted someone who said :

>While I'm trying to use some UTF-8 characters in my xml while parsing
>the xml using JDOM parser I'm getting this below exception:

Partition your problem. Is it that the file is malformed, or is the problem getting the XML parser to understand the file is in UTF-8 encoding?

You can examine your file in a hex viewer if you are familiar with UTF-8 encoding, or you could feed it to the Sun utility native2ascii to see if it likes it.

See
http://mindprod.com/jgloss/utf.html
http://mindprod.com/jgloss/encoding.html

You could also give up and use entities (NCRs).
See http://mindprod.com/jgloss/xml.html#AWKWARD
--
Roedy Green Canadian Mind Products
http://mindprod.com
Responsible Development is the style of development I aspire to now. It can be summarized by answering the question, "How would I develop if it were my money?" I'm amazed how many theoretical arguments evaporate when faced with this question.
~ Kent Beck (born: 1961 age: 49), evangelist for extreme programming.

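The NCR fallback Roedy mentions can be sketched as below: every non-ASCII character is rewritten as a decimal numeric character reference, so the file becomes pure ASCII and reads the same under any ASCII-compatible encoding declaration. The class and method names are hypothetical, and the sketch assumes the input is XML character data in which `&` and `<` are already escaped. Note that NCRs are only usable in text and attribute values, not in element or attribute names.

```java
class NcrEscaper {
    /** Replaces each non-ASCII code point with a decimal numeric character reference. */
    static String toNcr(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so a surrogate pair becomes one reference.
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNcr("caf\u00E9"));                            // caf&#233;
        System.out.println(toNcr(new String(Character.toChars(0x10400)))); // &#66560;
    }
}
```

Iterating by code point rather than by `char` matters for characters above U+FFFF, which would otherwise be emitted as two references to surrogate values that XML does not allow.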
From: dk on 21 Jan 2010 10:03

On Jan 21, 6:26 pm, Roedy Green <see_webs...(a)mindprod.com.invalid> wrote:
> Partition your problem. Is it that the file is malformed or is the
> problem getting the XML parser to understand the file is in UTF-8
> encoding?
>
> You can examine your file in a hex viewer if you are familiar with
> UTF-8 encoding, or you could feed it to the Sun utility native2ascii
> to see if it likes it.
>
> See http://mindprod.com/jgloss/utf.html http://mindprod.com/jgloss/encoding.html
>
> You could also give up and use entities (NCRs).
> see http://mindprod.com/jgloss/xml.html#AWKWARD

@BugBear: Yes, the XML is well-formed and properly validated.

@Roedy: Right now I'm using UltraEdit and inserting the characters from its ASCII table. I have also looked at the file in hex mode, and I got the same values in both places.

Meanwhile I have found something more interesting: if I explicitly specify UTF-8 when reading the input stream from my XML, in getByteStream, it works fine. So is this a Java bug (1.5.0.12), or something else?

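dk's actual change is not shown, so the following is only a sketch of what naming the charset explicitly can look like. It uses the JDK's own JAXP DOM parser rather than JDOM so that it runs without extra libraries, and the class name and sample document are hypothetical. When the parser is handed a character stream (a `Reader`) instead of a byte stream, it reads the characters directly and disregards the encoding named in the XML declaration, so the charset given to `InputStreamReader` is the one that decides how the bytes are interpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExplicitCharsetParse {
    /** Parses XML, decoding the bytes with the given charset instead of the declared one. */
    static Document parse(InputStream in, Charset charset) {
        try {
            Reader reader = new InputStreamReader(in, charset);
            // A character stream makes the parser ignore the declaration's encoding.
            InputSource source = new InputSource(reader);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data: the declaration claims UTF-8, the bytes are ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00F1</a>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1Bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

JDOM's `SAXBuilder` also has a `build(InputSource)` overload, so the same idea should carry over to the code in the first post. The catch is that this only hides a mismatch between the declaration and the real bytes; it works when the charset passed in is the one the file was really written in.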
From: Mike Schilling on 21 Jan 2010 13:07

It may be a clue that 4-byte UTF-8 sequences only occur for characters that need surrogate pairs in UTF-16 (code points above U+FFFF), which there are two reasonable ways to encode:

1. Encode the code point as 4 bytes
2. Encode each 16-bit "char" as 3 bytes

Only 1 is correct, but I'm sure there's lots of non-surrogate-aware code that does 2.
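
The two encodings Mike describes can be compared byte for byte. This is a sketch with hypothetical names: `encodePerChar` imitates non-surrogate-aware code by encoding each UTF-16 `char` on its own (the result is often called CESU-8), and U+10400 is just an arbitrary sample code point above U+FFFF.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class SupplementaryEncoding {
    /** Way 2: encodes every UTF-16 char separately, so each surrogate becomes 3 bytes. */
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    /** Formats bytes as space-separated uppercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10400)); // one code point, two chars
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // F0 90 90 80
        System.out.println(hex(encodePerChar(s)));                   // ED A0 81 ED B0 80
    }
}
```

The 4-byte form is the only one that is valid UTF-8. Note that the per-char form never produces a lead byte in the 4-byte range, so data written that way would not by itself explain a message about a 4-byte sequence.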