large XML files [Java Programming]


From: Tom Anderson on 8 Feb 2010 15:28

On Mon, 8 Feb 2010, Lew wrote:

> Lew wrote:
>
>>> I know from a recent project that it's next to useless to match XPath
>>> expressions with a SAX parser.
>
> Tom Anderson wrote:
>> In what sense? That it just builds a DOM tree behind the scenes?
>
> In the sense that for XPath to work, there has to already be a DOM for
> it to search, or else you have to forgo built-in XPath processing.

Right, yes.
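
For anyone following along, here is a minimal sketch of the usual arrangement with the JDK's own javax.xml.xpath API: the document is parsed into a tree first, and only then is the expression evaluated against it. The class name XPathOverDom and the library/book/title document shape are made up for illustration; nothing here comes from the project Lew describes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathOverDom {

    /** Returns the text of the first book's title (hypothetical document shape). */
    public static String firstTitle(String xml) throws Exception {
        // The whole document is parsed into an in-memory DOM before any query runs.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The XPath expression is then evaluated against that tree.
        return XPathFactory.newInstance()
                .newXPath()
                .evaluate("/library/book[1]/title", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book><title>SAX</title></book>"
                + "<book><title>DOM</title></book>"
                + "</library>";
        System.out.println(firstTitle(xml));
    }
}
```

The memory cost is in the parse step, not the query: for a large file the tree has to exist in full before evaluate can be called at all.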

> In that recent project they attempted to cache results from XPath
> expressions that were built by manually matching the expression with
> data from the streamed input. When that missed, they had to either
> re-read the whole input or go ahead and build a DOM regardless. The
> complexity and time cost of manual XPath handling and the frequency of
> misses presented a rather intractable barrier to the approach.

Yes, unless you know what a large fraction of your XPaths are upfront, i
can't see that being a winning strategy.

> That's only a single data point, of course. I don't rule out the
> possibility that another approach to blending SAX and XPath could work.
> Had it been up to me, I would have abandoned XPath for that application
> and just used SAX or StAX to build a domain-specific object model, not a
> DOM, and directly referenced items from that model.

Sounds sensible. Every time i've had to deal with XML and had the freedom
to do it how i liked, i've ended up doing just that - write a
ContentHandler that turns the elements into calls to some domain-space
interface, then write an implementation of that that either builds objects
or does something else useful.
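
A rough sketch of that pattern, under some loud assumptions: the order/item document, and the names OrderListener, OrderHandler and SummaryBuilder, are all invented for the example. The handler only translates SAX events into domain calls; what happens next is entirely up to the listener implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OrderSaxDemo {

    /** Hypothetical domain-space interface; it knows nothing about XML. */
    public interface OrderListener {
        void orderStarted(String id);
        void item(String sku, int quantity);
        void orderFinished();
    }

    /** Turns SAX element events into calls on the domain interface. */
    public static class OrderHandler extends DefaultHandler {
        private final OrderListener listener;

        public OrderHandler(OrderListener listener) {
            this.listener = listener;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so match on qName.
            if ("order".equals(qName)) {
                listener.orderStarted(atts.getValue("id"));
            } else if ("item".equals(qName)) {
                listener.item(atts.getValue("sku"), Integer.parseInt(atts.getValue("qty")));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("order".equals(qName)) {
                listener.orderFinished();
            }
        }
    }

    /** One possible implementation: builds a summary string per order. */
    public static class SummaryBuilder implements OrderListener {
        public final List<String> summaries = new ArrayList<>();
        private String currentId;
        private int total;

        @Override public void orderStarted(String id) { currentId = id; total = 0; }
        @Override public void item(String sku, int quantity) { total += quantity; }
        @Override public void orderFinished() { summaries.add(currentId + ":" + total); }
    }

    public static List<String> summarize(String xml) throws Exception {
        SummaryBuilder builder = new SummaryBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new OrderHandler(builder));
        return builder.summaries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id='a1'>"
                + "<item sku='x' qty='2'/><item sku='y' qty='3'/>"
                + "</order></orders>";
        System.out.println(summarize(xml));
    }
}
```

Swapping SummaryBuilder for an implementation that writes to a database or just logs requires no change to the handler, which is the point of putting the interface in between.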

tom

--
24-Hour Monkey-Vision!

From: Mike Schilling on 8 Feb 2010 15:44

Lew wrote:
> Tom Anderson wrote:
>>> Weeeellll, kinda. Some XSLTs will require the whole document to be
>>> held in memory. But it is possible to process some XSLTs in a
>>> streaming or streaming-ish manner (where elements are held in
>>> memory, but only a subset at a time). There's nothing stopping an
>>> XSLT processor compiling such XSLTs into a form which does just
>>> that. Whether any actually do, i don't know.
>
> None in common use. The usual XSLT and XPath processors assume a DOM.

Not exactly. Xalan (the default XPath and XSLT processor found in the JRE)
builds a DTM (Document Table Model), which represents the document as a set
of arrays, mostly arrays of integers but a few arrays of String as well.

From: Arne Vajhøj on 8 Feb 2010 20:13

On 08-02-2010 06:57, Roedy Green wrote:
> On Sun, 07 Feb 2010 13:14:26 -0500, "John B. Matthews"
> <nospam(a)nospam.invalid> wrote, quoted or indirectly quoted someone who
> said :
>> I thought that was a principal advantage of the Simple API For XML (SAX)
>> model, at least in principle. :-)
>
> I read a sentence about SAX that led me to believe it too read the
> whole file into RAM, it just did not create a DOM tree. I am glad that
> is not true.

Any link to that "sentence"?

There really would not be that much point in SAX if it
did read everything into memory.
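
To make that concrete, here is a small sketch that feeds a SAX parser from a Reader which generates the document on the fly, so the full text never exists in memory at once. GeneratedXml and CountingHandler are hypothetical helpers written for this example, and the r/e element names are arbitrary.

```java
import java.io.Reader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingDemo {

    /** Hypothetical Reader that synthesizes <r><e/>...<e/></r> lazily. */
    public static class GeneratedXml extends Reader {
        private final long elements;
        private long emitted = 0;           // number of <e/> fragments queued so far
        private String pending = "<r>";     // fragment currently being handed out
        private int pos = 0;                // read position within pending
        private boolean rootClosed = false; // whether </r> has been queued

        public GeneratedXml(long elements) {
            this.elements = elements;
        }

        @Override
        public int read(char[] buf, int off, int len) {
            int written = 0;
            while (written < len) {
                if (pos == pending.length()) {
                    // Current fragment is used up; queue the next one, if any.
                    if (emitted < elements) {
                        pending = "<e/>";
                        emitted++;
                    } else if (!rootClosed) {
                        pending = "</r>";
                        rootClosed = true;
                    } else {
                        break; // nothing left to generate
                    }
                    pos = 0;
                }
                int n = Math.min(len - written, pending.length() - pos);
                pending.getChars(pos, pos + n, buf, off + written);
                pos += n;
                written += n;
            }
            return (written == 0 && len > 0) ? -1 : written;
        }

        @Override
        public void close() {
            // Nothing to release.
        }
    }

    /** Keeps only a counter; no element is retained after its event. */
    public static class CountingHandler extends DefaultHandler {
        long count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    public static long countElements(Reader xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // One million child elements plus the root.
        System.out.println(countElements(new GeneratedXml(1_000_000)));
    }
}
```

The only state the application holds is the counter; how much the parser buffers internally is an implementation detail, but it is a fixed-size window rather than the whole input.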

Arne
