From: Lew on
Arne Vajhøj wrote:
> On 24-03-2010 20:10, Lew wrote:
>> Arne Vajhøj wrote:
>>> But you still need a bunch of if statements [for SAX parsing].
>>
>> I've written a handful of SAX-parser based applications, starting with
>> my first paid Java gig eleven years ago. There really weren't many 'if'
>> statements in them; mostly I just instantiated an object based on the
>> tag being processed, using a Map to look up the appropriate handler. In
>> this it was similar to MVC code for servlets where you look up the
>> handler based on a request parameter.
>
> But in the case we are discussing, then the same tag appears in
> multiple contexts. That requires if statements.

Not really.

Each tag holds a reference to its enclosing tag, so it already "knows" where
it belongs without need for 'if' statements.

>>> And the final code can easily become a bit messy.
>>
>> That's on the programmer, not the library.
>
> Using SAX to parse the type of XML documents we are talking
> about has to contain if statements and other solutions does
> not.

You are mistaken.

>>> I would prefer alternatives if they exists and are
>>> usable in the context.
>>
>> SAX is /non pareil/ for the areas where it shines. Back in 1999, using
>> Java 1.2 and then-current LAN tech (no gigabit or 100Mb/s LANs then) and
>> the relatively low-memory machines of the day we could process on the
>> order of a million hefty documents into or out of a database in about
>> four hours using SAX. We were limited pretty much by transfer speeds not
>> CPU because of the efficiency of SAX parsing.
>>
>> And there weren't a lot of 'if' statements involved, no more so than any
>> other app I've worked on.
>
> But given that we have a SAX_unfriendly structure of the XML
> document and no indication that it is a huge file, then SAX is
> not an obvious pick.


No one XML structure is more SAX-unfriendly than another.

> There are other cases where SAX do make sense. Even though StAX
> has overtaken quite a few of those.

--
Lew
From: bugbear on
Arne Vajhøj wrote:
> On 24-03-2010 13:09, bugbear wrote:
>> Arne Vajhøj wrote:
>>> On 23-03-2010 19:14, Robbo wrote:
>>>> I use SAXParserFactory to read data from XML files.
>>>>
>>>> Let's see some sample XML:
>>>>
>>>> <cyclogram>
>>>> <number>1</number>
>>>> <step>
>>>> <number>11</number>
>>>> </step>
>>>> </cyclogram>
>>>>
>>>> <cyclogram>
>>>> <number>1</number>
>>>> <step>
>>>> <number>11</number>
>>>> </step>
>>>> </cyclogram>
>>>>
>>>> Since "number" is both in "cyclogram" and "step" we
>>>> need to pursue if we are actually in "cyclogram" or in "step",
>>>> to decide if "number" is connected to "cyclogram" or to
>>>> "step".
>>>>
>>>> I wonder, if there are tools which could automatically
>>>> generate Java code for purpose of reading XML files.
>>>> For example, user of such tool could define structure
>>>> of XML file with use of some GUI (e.g. tree structure
>>>> graphically represented). After that user could press some
>>>> button and see Java code...
>>>> I hope you understand what I mean.
>>>
>>> Parsing that with SAX requires you to keep context.
>>
>> Yes - normally a simple tag stack is sufficient (push on start,
>> pop on end).
>
> But you still need a bunch of if statements.
>
> And the final code can easily become a bit messy.
>
> I would prefer alternatives if they exists and are
> usable in the context.

One of the neatest solutions I've seen is this:

http://www.devsphere.com/xml/saxdomix/

It uses a SAX parser, but (on a configurable trigger) it will
build DOM representations of sub-trees. This sub-tree
can then be handled with the DOM technique
that suits you.

This is MASSIVELY appropriate for the common case
where an XML file is actually a set (or list)
of repeated sub-elements, e.g. a catalogue
of books, list of customer orders, etc.

You get the convenience of DOM processing on the
(e.g.) book nodes without the normal DOM overhead
of having the whole XML file in RAM at once.
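
To make the idea concrete, here is a rough sketch of the general technique
using only standard JAXP classes. It is NOT saxdomix's API (I have not
reproduced that here); the class name 'SubtreeBuilder', the trigger element
"book" and the 'titles' helper are all made up for illustration. The SAX
handler ignores everything until it sees the trigger element, builds a small
detached DOM tree for that one element, hands it to a callback, and then
forgets it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SubtreeDomDemo {

    /** Hypothetical handler: builds one DOM sub-tree per trigger element. */
    static class SubtreeBuilder extends DefaultHandler {
        private final String trigger;          // element name that starts a sub-tree
        private final Consumer<Element> sink;  // receives each finished sub-tree
        private final Document doc;            // used only as a node factory
        private Node current;                  // null while outside a sub-tree

        SubtreeBuilder(String trigger, Consumer<Element> sink) throws Exception {
            this.trigger = trigger;
            this.sink = sink;
            this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (current == null && !qName.equals(trigger)) {
                return; // outside any sub-tree and not a trigger: skip
            }
            Element e = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                e.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            if (current != null) {
                current.appendChild(e);
            }
            current = e;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            Node parent = current.getParentNode();
            if (parent == null) {
                sink.accept((Element) current); // detached root: sub-tree is complete
            }
            current = parent;
        }
    }

    /** Collects the <title> text of every <book>, one small DOM tree at a time. */
    static List<String> titles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        SubtreeBuilder handler = new SubtreeBuilder("book",
                book -> titles.add(book.getElementsByTagName("title").item(0).getTextContent()));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalogue>"
                + "<book id=\"1\"><title>First</title></book>"
                + "<book id=\"2\"><title>Second</title></book>"
                + "</catalogue>";
        titles(xml).forEach(System.out::println);
    }
}
```

Only one book's worth of nodes is reachable at any moment, which is where the
memory saving comes from.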

BugBear
From: Arne Vajhøj on
On 25-03-2010 05:42, bugbear wrote:
> Arne Vajhøj wrote:
>> On 24-03-2010 13:09, bugbear wrote:
>>> Arne Vajhøj wrote:
>>>> On 23-03-2010 19:14, Robbo wrote:
>>>>> I use SAXParserFactory to read data from XML files.
>>>>>
>>>>> Let's see some sample XML:
>>>>>
>>>>> <cyclogram>
>>>>> <number>1</number>
>>>>> <step>
>>>>> <number>11</number>
>>>>> </step>
>>>>> </cyclogram>
>>>>>
>>>>> <cyclogram>
>>>>> <number>1</number>
>>>>> <step>
>>>>> <number>11</number>
>>>>> </step>
>>>>> </cyclogram>
>>>>>
>>>>> Since "number" is both in "cyclogram" and "step" we
>>>>> need to pursue if we are actually in "cyclogram" or in "step",
>>>>> to decide if "number" is connected to "cyclogram" or to
>>>>> "step".
>>>>>
>>>>> I wonder, if there are tools which could automatically
>>>>> generate Java code for purpose of reading XML files.
>>>>> For example, user of such tool could define structure
>>>>> of XML file with use of some GUI (e.g. tree structure
>>>>> graphically represented). After that user could press some
>>>>> button and see Java code...
>>>>> I hope you understand what I mean.
>>>>
>>>> Parsing that with SAX requires you to keep context.
>>>
>>> Yes - normally a simple tag stack is sufficient (push on start,
>>> pop on end).
>>
>> But you still need a bunch of if statements.
>>
>> And the final code can easily become a bit messy.
>>
>> I would prefer alternatives if they exists and are
>> usable in the context.
>
> One of the neatest solutions I've seen is this:
>
> http://www.devsphere.com/xml/saxdomix/
>
> It uses a SAX parser, but (on a configurable trigger) it will
> build DOM representations of sub-trees. This sub-tree
> can then be handled with the DOM technique
> that suits you.
>
> This is MASSIVELY appropriate for the common case
> where an XML file is actually a set (or list)
> of repeated sub-elements, e.g. a catalogue
> of books, list of customer orders, etc.
>
> You get the convenience of DOM processing on the
> (e.g.) book nodes without the normal DOM overhead
> of having the whole XML file in RAM at once.

That sounds pretty cool!

Arne
From: Arne Vajhøj on
On 24-03-2010 23:47, Lew wrote:
> Arne Vajhøj wrote:
>> On 24-03-2010 20:10, Lew wrote:
>>> Arne Vajhøj wrote:
>>>> But you still need a bunch of if statements [for SAX parsing].
>>>
>>> I've written a handful of SAX-parser based applications, starting with
>>> my first paid Java gig eleven years ago. There really weren't many 'if'
>>> statements in them; mostly I just instantiated an object based on the
>>> tag being processed, using a Map to look up the appropriate handler. In
>>> this it was similar to MVC code for servlets where you look up the
>>> handler based on a request parameter.
>>
>> But in the case we are discussing, then the same tag appears in
>> multiple contexts. That requires if statements.
>
> Not really.
>
> Each tag holds a reference to its enclosing tag, so it already "knows"
> where it belongs without need for 'if' statements.

It does?

How do you get a reference to the enclosing tags (potentially
recursively) in startElement?

>>>> And the final code can easily become a bit messy.
>>>
>>> That's on the programmer, not the library.
>>
>> Using SAX to parse the type of XML documents we are talking
>> about has to contain if statements and other solutions does
>> not.
>
> You are mistaken.

It would not be the first time.

>>>> I would prefer alternatives if they exists and are
>>>> usable in the context.
>>>
>>> SAX is /non pareil/ for the areas where it shines. Back in 1999, using
>>> Java 1.2 and then-current LAN tech (no gigabit or 100Mb/s LANs then) and
>>> the relatively low-memory machines of the day we could process on the
>>> order of a million hefty documents into or out of a database in about
>>> four hours using SAX. We were limited pretty much by transfer speeds not
>>> CPU because of the efficiency of SAX parsing.
>>>
>>> And there weren't a lot of 'if' statements involved, no more so than any
>>> other app I've worked on.
>>
>> But given that we have a SAX_unfriendly structure of the XML
>> document and no indication that it is a huge file, then SAX is
>> not an obvious pick.
>
> No one XML structure is more SAX-unfriendly than another.

Correct if you can get the enclosing element as you claim.

Arne
From: Lew on
Arne Vajhøj wrote:
>>>>> But you still need a bunch of if statements [for SAX parsing].

Lew wrote:
>>>> I've written a handful of SAX-parser based applications, starting with
>>>> my first paid Java gig eleven years ago. There really weren't many 'if'
>>>> statements in them; mostly I just instantiated an object based on the
>>>> tag being processed, using a Map to look up the appropriate handler. In
>>>> this it was similar to MVC code for servlets where you look up the
>>>> handler based on a request parameter.

Arne Vajhøj wrote:
>>> But in the case we are discussing, then the same tag appears in
>>> multiple contexts. That requires if statements.

Lew wrote:
>> Not really.
>>
>> Each tag holds a reference to its enclosing tag, so it already "knows"
>> where it belongs without need for 'if' statements.

Arne Vajhøj wrote:
> It does?
>
> How do you get a reference to the enclosing tags (potentially
> recursively) in startElement?

It's been a long time since I've done one, so I don't have code samples handy.
I apologize; this would be so much easier if I did.

I derive from the SAX helper class 'DefaultHandler' a custom implementation for
each tag (element). One member of that implementation is 'parent', which gets
set each time an enclosing element hands off to the handler for an enclosed
element. You can follow the 'parent' members in a chain right back to the
root element if you need to.

The 'endElement()' method returns control back to the enclosing handler.
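
Lacking the original code, here is a minimal reconstruction of the idea from
memory. Treat it as a sketch, not as the code from those projects. The names
'ElementHandler', 'Factory', 'FACTORIES' and 'acceptNumber' are invented for
this example, and I wrapped the sample in a made-up <cyclograms> root so that
it is well-formed. Hand-off is done with 'XMLReader.setContentHandler()',
which SAX allows in the middle of a parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ChainedHandlerDemo {

    /** Creates the handler for an enclosed element (hypothetical helper type). */
    interface Factory {
        ElementHandler create(XMLReader reader, ElementHandler parent, List<String> out);
    }

    /** Tag name -> handler factory; the Map lookup stands in for a chain of 'if's. */
    static final Map<String, Factory> FACTORIES = Map.of(
            "cyclogram", CyclogramHandler::new,
            "step", StepHandler::new,
            "number", NumberHandler::new);

    /** Handles one element; 'parent' is the handler of the enclosing element. */
    static class ElementHandler extends DefaultHandler {
        final XMLReader reader;
        final ElementHandler parent; // null only for the document-level handler
        final List<String> out;

        ElementHandler(XMLReader reader, ElementHandler parent, List<String> out) {
            this.reader = reader;
            this.parent = parent;
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // An enclosed element starts: hand off, passing ourselves as its parent.
            Factory factory = FACTORIES.getOrDefault(qName, ElementHandler::new);
            reader.setContentHandler(factory.create(reader, this, out));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Our own element ends: return control to the enclosing handler.
            reader.setContentHandler(parent);
        }

        /** Called by an enclosed <number>; each context decides what it means. */
        void acceptNumber(String value) { }
    }

    static class CyclogramHandler extends ElementHandler {
        CyclogramHandler(XMLReader r, ElementHandler p, List<String> out) { super(r, p, out); }
        @Override void acceptNumber(String value) { out.add("cyclogram " + value); }
    }

    static class StepHandler extends ElementHandler {
        StepHandler(XMLReader r, ElementHandler p, List<String> out) { super(r, p, out); }
        @Override void acceptNumber(String value) { out.add("step " + value); }
    }

    static class NumberHandler extends ElementHandler {
        private final StringBuilder text = new StringBuilder();

        NumberHandler(XMLReader r, ElementHandler p, List<String> out) { super(r, p, out); }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            parent.acceptNumber(text.toString().trim()); // the parent "knows" the context
            super.endElement(uri, localName, qName);
        }
    }

    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        List<String> out = new ArrayList<>();
        reader.setContentHandler(new ElementHandler(reader, null, out));
        reader.parse(new InputSource(new StringReader(xml)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cyclograms><cyclogram><number>1</number>"
                + "<step><number>11</number></step></cyclogram></cyclograms>";
        parse(xml).forEach(System.out::println);
    }
}
```

The <number> handler never asks where it is. It just calls its parent, and the
parent's type determines what happens to the value.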

This is not so very different from bugbear's suggestion that
> normally a simple tag stack is sufficient
> (push on start, pop on end).

It worked beautifully every time I've used it, including back in 1999 on that
first parsing project that used a pretty decent-sized DTD for each document
type and had quite ambitious performance goals, which we exceeded handily.
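
For comparison, the tag-stack variant could look roughly like the following.
Again this is only an illustration under my own assumptions: the class names,
the "parent/child" key format and the made-up <cyclograms> wrapper element are
not from anyone's production code. The context check becomes a Map lookup on
the two innermost tags instead of a series of 'if' statements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagStackDemo {

    static class PathHandler extends DefaultHandler {
        private final Deque<String> stack = new ArrayDeque<>(); // open tags, innermost first
        private final StringBuilder text = new StringBuilder();
        private final List<String> out = new ArrayList<>();

        // Actions keyed by "enclosing tag/tag" (hypothetical key format).
        private final Map<String, Consumer<String>> actions = Map.of(
                "cyclogram/number", v -> out.add("cyclogram " + v),
                "step/number", v -> out.add("step " + v));

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            stack.push(qName); // push on start
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String tag = stack.pop(); // pop on end
            String key = stack.peek() + "/" + tag; // peek() is null at the root: no match
            actions.getOrDefault(key, v -> { }).accept(text.toString().trim());
            text.setLength(0);
        }
    }

    static List<String> parse(String xml) throws Exception {
        PathHandler handler = new PathHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cyclograms><cyclogram><number>1</number>"
                + "<step><number>11</number></step></cyclogram></cyclograms>";
        parse(xml).forEach(System.out::println);
    }
}
```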

--
Lew