From: barriers Zhang on
"Fabrice " <fabrice.guy(a)gmail.com> wrote in message <fpudrc$o71$1(a)fred.mathworks.com>...
> Crypto Logic <crypto(a)online.de> wrote in message
> <fpslq1$r9n$1(a)online.de>...
> > Hi,
> >
> > I am trying to read and parse a large XML-file.
> >
> > The XML-file represents the data created by a custom laboratory
> > software that writes measurement data and some setup information into
> > an ASCII-text file.
> >
> > So I have tried reading the file - approx. 20MB in size - using xmlread
> > to create the node and example 3 of the Matlab help for xmlread.
> >
> > Unfortunately, the file is so large that I got java heap errors at
> > first and had to increase memory area for java. Then what happened was
> > that xmlread could create the node but the example 3 function to parse
> > the XML file ran for about half an hour and then also quit displaying
> > an error.
> >
> > 1.) What can I do to read and parse large XML-files?
> >
> > 2.) The example-3-help function for the xmlread-command converts an XML
> > file into a Matlab structure. I have found that at least half of the
> > structure fields are useless because it creates empty fields. What can
> > I do to improve the example 3 function to successfully read in XML
> > files?
> >
> > Thanks for any hint and kind regards,
> > Crypto.
>
>
> Hi,
>
> Instead of using the xmlread function (which is recursive and
> not optimized for large files), you should try the javax.xml
> packages to parse the XML file and XPath expressions to access
> elements in your document:
>
> To parse the file and load it into memory:
> import javax.xml.parsers.*;
> domFactory = DocumentBuilderFactory.newInstance();
> builder = domFactory.newDocumentBuilder();
> % Wrapping the name in java.io.File avoids it being read as a URI
> doc = builder.parse(java.io.File(xmlFileName));
>
> Then, to access all elements named <element2> using XPath:
> import javax.xml.xpath.*;
> factory = XPathFactory.newInstance();
> xpath = factory.newXPath();
> expression = xpath.compile('element1/element2');
> result = expression.evaluate(doc, XPathConstants.NODESET);
> nbElement2 = result.getLength();
>
>
>
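A small addition to the XPath approach quoted above: the NODESET result is a Java NodeList, so the values still have to be copied into a MATLAB array by hand. Below is a minimal sketch of how that might look. The XML text and the element names are made up for illustration and are not from the original file, and the document is parsed from an in-memory string only so that the snippet runs on its own. For a real file you would pass java.io.File(xmlFileName) to builder.parse instead.

```matlab
% Sketch only: the XML content and element names are hypothetical.
import javax.xml.parsers.*;
import javax.xml.xpath.*;

xmlText = '<element1><element2>1.5</element2><element2>2.5</element2></element1>';

% Parse from a string so the example is self-contained.
domFactory = DocumentBuilderFactory.newInstance();
builder = domFactory.newDocumentBuilder();
source = org.xml.sax.InputSource(java.io.StringReader(xmlText));
doc = builder.parse(source);

% Compile the XPath expression once and evaluate it against the document.
factory = XPathFactory.newInstance();
xpath = factory.newXPath();
expression = xpath.compile('/element1/element2');
nodes = expression.evaluate(doc, XPathConstants.NODESET);

% Copy the text of each matched node into a numeric column vector.
n = nodes.getLength();
values = zeros(n, 1);
for k = 1:n
    % NodeList.item is zero-based, while MATLAB loops are one-based.
    node = nodes.item(k - 1);
    values(k) = str2double(char(node.getTextContent()));
end
```

Because only the nodes matched by the expression are touched, this avoids building a MATLAB structure for the whole tree, which is presumably where most of the time in the recursive example went.
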
Another option you may want to look into is VTD-XML. It is open source and quite a bit faster and more memory-efficient than SAX and DOM.