From: Anna on 9 Jan 2010 06:03
I want to parse recovered, corrupt XML files and automatically repair them for forensic purposes (some elements are not properly closed, or are missing). I know the original XML schema. (When I read a corrupt XML file, an XmlException is raised which indicates the problem.) What's the best approach to this problem? I'd appreciate any advice.
Anna
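The exception Anna mentions already pinpoints the first defect; in .NET, XmlException exposes it through its LineNumber and LinePosition properties. As a language-neutral illustration (Python, not .NET code), here is a minimal sketch that reports the position of the first well-formedness error using the standard library's xml.etree.ElementTree. The helper name first_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_error(text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position   # position as reported by the expat parser
        return line, column, str(err)
    return None
```

For example, first_error("<company>\n<department>\n</company>") reports a "mismatched tag" error on line 3, where the parser first notices that department was never closed. Note that this is only where the damage becomes detectable, which is not necessarily where it happened.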
From: Anna on 10 Jan 2010 07:49
> If the markup is not well-formed then I don't think any of the XML APIs in
> the .NET framework help, they all want well-formed markup.
I was afraid of that. So, any advice on the best approach to this problem? Writing my own code?
Anna
From: Anna on 12 Jan 2010 02:02
Thx, I'll give it a try.
Anna

"Martin Honnen" <mahotrash(a)yahoo.de> wrote in message news:%23dPB1sfkKHA.5604(a)TK2MSFTNGP04.phx.gbl...
> Anna wrote:
>>> If the markup is not well-formed then I don't think any of the XML APIs
>>> in the .NET framework help, they all want well-formed markup.
>>
>> I was afraid of that.
>> So any advice on what's the best approach to solve this problem, writing
>> my own code ?
>
> You will need to find out exactly which rules the markup you have
> follows, or whether it follows any rules at all. The only other markup
> language I know of that permits this kind of thing is SGML: it allows
> omitting certain tags and not quoting certain attribute values, but there
> are clear rules for how the parser has to infer elements or find out where
> an attribute value ends. There is a .NET implementation of an SGML parser,
> SgmlReader (http://developer.mindtouch.com/SgmlReader), which can be used
> to convert "HTML tag soup" to XHTML. There is also the HTML Tidy
> application, which does the same. So studying the code of such
> applications can help.
>
> --
>
> Martin Honnen --- MVP XML
> http://msmvps.com/blogs/martin_honnen/
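The tag-soup idea Martin describes can be sketched with Python's standard-library html.parser.HTMLParser, which tokenizes markup without insisting on well-formedness. This is a minimal sketch and not SgmlReader's or HTML Tidy's actual inference rules: it assumes only the simplest rule (an end tag closes every element opened after its matching start tag, stray end tags are dropped, and anything still open at end of input is closed). SoupRepairer and repair_soup are hypothetical names. HTMLParser also applies HTML conventions: it lowercases tag and attribute names, treats script and style content as raw text, and by default ignores comments and processing instructions.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class SoupRepairer(HTMLParser):
    """Re-emit lenient markup as well-formed XML by inferring missing end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # data arrives with entities decoded
        self.out = []
        self.stack = []  # names of currently open elements, outermost first

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attributes (last wins); None marks a bare attribute.
        attr_text = "".join(f" {k}={quoteattr(v or '')}" for k, v in dict(attrs).items())
        self.out.append(f"<{tag}{attr_text}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: nothing to close, so drop it
        while self.stack:  # close everything opened after `tag`, then `tag` itself
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and > for XML


def repair_soup(text):
    parser = SoupRepairer()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    while parser.stack:  # end of input: close whatever is still open
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For instance, repair_soup("<a><b>text</a>") infers the missing </b>. The sketch knows nothing about the expected structure, so it cannot tell where an unclosed element "should" have ended; text or elements outside a single root are also passed through unchanged.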
From: Richard.Williams.20 on 21 Jan 2010 15:01
I did something like this in the past, but can't find the code. Here is what I did.

I defined a template of the form:

m:company
  m:department
    m:employee
      o:salary

This defines the hierarchy of the XML; m: marks a mandatory element and o: an optional one.

I then parsed the input XML and built a stack of elements, doing the following as I went through the file:
- completed incomplete nodes
- ensured that the elements were in the correct hierarchy
- added missing (mandatory) elements with default values

I remember there were some situations where the XML simply could not be repaired automatically, so this won't be a perfect solution, but it will be a start.

I used biterscripting for easy parsing, stack-building, etc. Check http://www.biterscripting.com/helppages_samplescripts.html to see if there are any sample scripts you can reuse.
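Richard's template-plus-stack approach can be sketched in Python as below. This is a hypothetical reconstruction, not his biterscripting code: the two-spaces-per-level indentation in TEMPLATE, the DEFAULTS table, the regex tokenizer, and the policies of dropping attributes, elements not in the template, unterminated tag fragments, and text outside any element are all assumptions made for illustration.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical template: "m:" = mandatory, "o:" = optional; two spaces of
# indentation per nesting level is an assumption of this sketch.
TEMPLATE = """\
m:company
  m:department
    m:employee
      o:salary
"""

# Hypothetical default text for mandatory elements that have to be invented.
DEFAULTS = {"employee": "UNKNOWN"}

# A start/end/empty tag (attributes are discarded), an unterminated "<..." fragment, or text.
TOKEN = re.compile(
    r"<(?P<end>/?)(?P<name>[A-Za-z_][\w.-]*)[^<>]*?(?P<empty>/?)>"
    r"|(?P<broken><[^<>]*>?)"
    r"|(?P<text>[^<]+)"
)


def parse_template(text):
    """Return {name: (parent, mandatory)} and {name: [child names in order]}."""
    info, children, path = {}, {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip())) // 2
        flag, name = line.strip().split(":", 1)
        del path[depth:]
        parent = path[-1] if path else None
        info[name] = (parent, flag == "m")
        children.setdefault(parent, []).append(name)
        children.setdefault(name, [])
        path.append(name)
    return info, children


def chain(info, name):
    """Ancestors of `name` from the root down, ending with `name` itself."""
    out = []
    while name is not None:
        out.append(name)
        name = info[name][0]
    return out[::-1]


def repair(markup, template=TEMPLATE, defaults=DEFAULTS):
    info, children = parse_template(template)
    out, stack = [], []  # stack entries: (name, set of child names seen so far)

    def open_elem(name):
        if stack:
            stack[-1][1].add(name)
        out.append(f"<{name}>")
        stack.append((name, set()))

    def close_elem():
        name, seen = stack[-1]
        for child in children[name]:  # add missing mandatory children with defaults
            if info[child][1] and child not in seen:
                open_elem(child)
                out.append(escape(defaults.get(child, "")))
                close_elem()
        stack.pop()
        out.append(f"</{name}>")

    for m in TOKEN.finditer(markup):
        if m.group("text") is not None:
            if stack:  # text outside any element is discarded
                out.append(escape(html.unescape(m.group("text"))))
            continue
        name = m.group("name")
        if name is None or name not in info:
            continue  # broken fragment or element not in the template
        open_names = [n for n, _ in stack]
        if m.group("end"):
            if name in open_names:  # complete inner nodes, then close this one
                while stack[-1][0] != name:
                    close_elem()
                close_elem()
            continue  # a stray end tag is simply dropped
        # Start tag: keep the shared part of the path, close the rest, open missing ancestors.
        wanted = chain(info, name)[:-1]
        k = 0
        while k < min(len(open_names), len(wanted)) and open_names[k] == wanted[k]:
            k += 1
        while len(stack) > k:
            close_elem()
        for ancestor in wanted[k:]:
            open_elem(ancestor)
        open_elem(name)
        if m.group("empty"):
            close_elem()
    while stack:  # end of input: close everything still open
        close_elem()
    return "".join(out)
```

With this template, a new <department> start tag while an <employee> is open closes both the employee and the previous department, an <employee> found with no open <department> gets one opened around it, and a department closed without any employee receives a default <employee>UNKNOWN</employee>. In line with the situations Richard mentions, some damage is not handled: for example, nothing prevents a second root element from being opened.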