From: Arne Vajhøj on
On 24-03-2010 13:09, bugbear wrote:
> Arne Vajhøj wrote:
>> On 23-03-2010 19:14, Robbo wrote:
>>> I use SAXParserFactory to read data from XML files.
>>>
>>> Let's see some sample XML:
>>>
>>> <cyclogram>
>>> <number>1</number>
>>> <step>
>>> <number>11</number>
>>> </step>
>>> </cyclogram>
>>>
>>> <cyclogram>
>>> <number>1</number>
>>> <step>
>>> <number>11</number>
>>> </step>
>>> </cyclogram>
>>>
>>> Since "number" appears in both "cyclogram" and "step", we
>>> need to track whether we are currently inside "cyclogram" or
>>> "step" to decide which of the two a given "number" belongs to.
>>>
>>> I wonder if there are tools that could automatically
>>> generate Java code for reading XML files.
>>> For example, the user of such a tool could define the structure
>>> of the XML file in some GUI (e.g. a tree structure
>>> represented graphically). The user could then press a
>>> button and see the Java code...
>>> I hope you understand what I mean.
>>
>> Parsing that with SAX requires you to keep context.
>
> Yes - normally a simple tag stack is sufficient (push on start,
> pop on end).

But you still need a bunch of if statements.

And the final code can easily become a bit messy.

I would prefer alternatives if they exist and are
usable in the context.
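
For illustration, here is a minimal sketch of the tag-stack approach applied to the sample XML. The class name `CyclogramHandler`, the `found` list, and the `parent=value` output format are made up for this example; only the `cyclogram`, `step`, and `number` element names come from the post above.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each <number> together with the element enclosing it.
class CyclogramHandler extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>(); // open elements, innermost on top
    private final StringBuilder text = new StringBuilder(); // text of the current element
    final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(qName);   // push on start
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();         // pop on end
        if (qName.equals("number")) {
            // After the pop, the top of the stack is the parent of <number>.
            found.add(stack.peek() + "=" + text.toString().trim());
        }
        text.setLength(0);
    }

    // Convenience wrapper for the sketch; checked exceptions are rethrown unchecked.
    static List<String> parse(String xml) {
        try {
            CyclogramHandler handler = new CyclogramHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Calling `CyclogramHandler.parse(...)` on one `<cyclogram>` block from the sample should yield `[cyclogram=1, step=11]`. The stack supplies the context, but the handler still has to test the tag name, and real code would grow one such test per element of interest.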

Arne
From: Lew on
Arne Vajhøj wrote:
> But you still need a bunch of if statements [for SAX parsing].

I've written a handful of SAX-parser based applications, starting with my
first paid Java gig eleven years ago. There really weren't many 'if'
statements in them; mostly I just instantiated an object based on the tag
being processed, using a Map to look up the appropriate handler. In this it
was similar to MVC code for servlets where you look up the handler based on a
request parameter.
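
A rough sketch of what such a Map-based lookup could look like follows. It is not the code from those applications; `TagDispatcher`, its `on` method, and the `order`/`id`/`customer` tags in the usage note are invented for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical dispatcher: routes the text of each element to a callback
// registered under the element's tag name.
class TagDispatcher extends DefaultHandler {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    // Registers a callback for one tag name; returns this for chaining.
    TagDispatcher on(String tag, Consumer<String> handler) {
        handlers.put(tag, handler);
        return this;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Look up the callback by tag name; unregistered tags fall through to a no-op.
        handlers.getOrDefault(qName, ignored -> { }).accept(text.toString().trim());
        text.setLength(0);
    }

    // Convenience wrapper for the sketch; checked exceptions are rethrown unchecked.
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Usage would be along the lines of `new TagDispatcher().on("id", ids::add).on("customer", names::add).parse(xml)`. Note that a map keyed on the tag name alone cannot tell apart a tag that occurs in two contexts; for that the key would have to include the enclosing path, or the callback would have to consult a stack.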

> And the final code can easily become a bit messy.

That's on the programmer, not the library.

> I would prefer alternatives if they exist and are
> usable in the context.

SAX is /non pareil/ for the areas where it shines. Back in 1999, using Java
1.2 and then-current LAN tech (no gigabit or 100Mb/s LANs then) and the
relatively low-memory machines of the day we could process on the order of a
million hefty documents into or out of a database in about four hours using
SAX. We were limited pretty much by transfer speeds not CPU because of the
efficiency of SAX parsing.

And there weren't a lot of 'if' statements involved, no more so than any other
app I've worked on.

--
Lew
From: Lew on
Robbo wrote:
>> I would be glad if you could tell me why SAXParserFactory
>> exists, given that there are better (faster to code)
>> solutions. Does anybody use SAXParserFactory,
>> and if so, for what purposes?
>> I use SAXParserFactory and it takes quite a lot of work,
>> with a bunch of "if" statements, boolean variables...

Arne Vajhøj wrote:
> SAX is great for parsing huge XML files where you
> only need some of the information.

Or all the information but not in the document's structure, or you have memory
constraints, or you need very high speed.

--
Lew
From: Arne Vajhøj on
On 24-03-2010 20:11, Lew wrote:
> Robbo wrote:
>>> I would be glad if you could tell me why SAXParserFactory
>>> exists, given that there are better (faster to code)
>>> solutions. Does anybody use SAXParserFactory,
>>> and if so, for what purposes?
>>> I use SAXParserFactory and it takes quite a lot of work,
>>> with a bunch of "if" statements, boolean variables...
>
> Arne Vajhøj wrote:
>> SAX is great for parsing huge XML files where you
>> only need some of the information.
>
> Or all the information but not in the document's structure, or you have
> memory constraints, or you need very high speed.

SAX and StAX are for huge XML files.

If I actually need all the data, then I would tend to
prefer StAX, but if I only need some of the data, then
SAX may be better.
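
As a sketch of the StAX style, assuming the `cyclogram`/`step`/`number` sample from earlier in the thread (the class name `StaxNumbers` and the `parent=value` output format are invented for this example):

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader: pulls events one at a time and records each <number>
// together with the element that encloses it.
class StaxNumbers {
    static List<String> read(String xml) {
        List<String> out = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // enclosing elements, innermost on top
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (r.getLocalName().equals("number")) {
                        // getElementText() consumes the text and the matching end tag,
                        // so <number> itself never goes on the stack.
                        out.add(open.peek() + "=" + r.getElementText().trim());
                    } else {
                        open.push(r.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                }
            }
            r.close();
        } catch (Exception e) {
            // Sketch only: rethrow checked exceptions unchecked.
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }
}
```

For one `<cyclogram>` block of the sample this should return `[cyclogram=1, step=11]`. Because the application pulls the events, the loop can be split into methods that mirror the document structure, which is harder to do with SAX callbacks.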

For a small document, even if I do not need the
XML structure, I would use DOM and XPath to pick out the values.
The code is more readable.
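
A minimal sketch of the DOM plus XPath variant, again assuming the sample XML from earlier in the thread (the class name `XPathPick` and the method `pick` are invented for this example):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: loads the whole document into a DOM tree and
// evaluates one XPath expression against it.
class XPathPick {
    static String pick(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Returns the string value of the first matching node, or "" if none matches.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Sketch only: rethrow checked exceptions unchecked.
            throw new IllegalStateException("lookup failed", e);
        }
    }
}
```

With the sample document, `pick(xml, "/cyclogram/number")` should give `1` and `pick(xml, "/cyclogram/step/number")` should give `11`. The path expression carries the context, so no stack and no if statements are needed, at the cost of holding the whole document in memory.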

Arne
From: Arne Vajhøj on
On 24-03-2010 20:10, Lew wrote:
> Arne Vajhøj wrote:
>> But you still need a bunch of if statements [for SAX parsing].
>
> I've written a handful of SAX-parser based applications, starting with
> my first paid Java gig eleven years ago. There really weren't many 'if'
> statements in them; mostly I just instantiated an object based on the
> tag being processed, using a Map to look up the appropriate handler. In
> this it was similar to MVC code for servlets where you look up the
> handler based on a request parameter.

But in the case we are discussing, the same tag appears in
multiple contexts. That requires if statements.

>> And the final code can easily become a bit messy.
>
> That's on the programmer, not the library.

Code that uses SAX to parse the type of XML document we are talking
about has to contain if statements, and other solutions do
not.

>> I would prefer alternatives if they exist and are
>> usable in the context.
>
> SAX is /non pareil/ for the areas where it shines. Back in 1999, using
> Java 1.2 and then-current LAN tech (no gigabit or 100Mb/s LANs then) and
> the relatively low-memory machines of the day we could process on the
> order of a million hefty documents into or out of a database in about
> four hours using SAX. We were limited pretty much by transfer speeds not
> CPU because of the efficiency of SAX parsing.
>
> And there weren't a lot of 'if' statements involved, no more so than any
> other app I've worked on.

But given that we have a SAX-unfriendly structure in the XML
document and no indication that it is a huge file, SAX is
not an obvious pick.

There are other cases where SAX does make sense, even though StAX
has taken over quite a few of those.

Arne