From: Arne Vajhøj on
On 07-02-2010 12:59, Roedy Green wrote:
> It seems to me the usual XML tools in Java load the entire XML file
> into RAM.

????

W3C DOM and JAXB do load all data into memory.

SAX and StAX do not load all data into memory.
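
To make the StAX half of that concrete, here is a minimal sketch of a pull parser that counts start tags one event at a time instead of building a tree. The class name StaxElementCounter, the method count and the sample element names are made up for illustration; only the javax.xml.stream calls are the real API. It wraps XMLStreamException in an unchecked exception purely to keep the sketch short.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; the class and method names are illustrative only.
final class StaxElementCounter {

    // Counts start tags with the given local name, pulling one event at a time.
    static int count(Reader in, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                int n = 0;
                while (reader.hasNext()) {
                    // next() advances to the following event and returns its type.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        n++;
                    }
                }
                return n;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumption for brevity: surface parse errors as unchecked.
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for the demo.
        String xml = "<records><record/><record/><other/></records>";
        System.out.println(count(new StringReader(xml), "record"));
    }
}
```

For a really fat file you would pass a buffered file reader or an InputStream instead of the StringReader; the loop itself stays the same.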

Arne
From: Lew on
On 2/7/2010 1:20 PM, Donkey Hottie wrote:
> On 7.2.2010 20:14, Peter Duniho wrote:
>> Roedy Green wrote:
>>> It seems to me the usual XML tools in Java load the entire XML file
>>> into RAM. Are there any tools that process sequentially, bringing in
>>> only a chunk at a time so you could handle really fat files.
>>
>> Sounds like you want the XMLStreamReader interface:
>> http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html
>>
>> I haven't used the Java version myself (there's a similar type in .NET),
>> and haven't looked closely enough to determine the specifics. But I presume
>> there's a way to get an implementation of the interface (looks like
>> XMLInputFactory is the way to go).
>>
>> Of course, if per a previous discussion you're stuck on Java 1.5, this
>> is unavailable to you. But otherwise, you should find it exactly what
>> you're asking for.
>>
>> Pete
>
> SAX interface works fine even with Java 1.4, and it does what Roedy wants.

It's been around since Java 1.2; it better work with 1.4.

--
Lew

From: Lew on
Roedy Green wrote:
>> It seems to me the usual XML tools in Java load the entire XML file
>> into RAM. Are there any tools that process sequentially, bringing in
>> only a chunk at a time so you could handle really fat files.

Donkey Hottie wrote:
> Java has tools for such XML files. SAX processes XML so that it does not
> need to load it all to memory.

I first used SAX for XML parsing in early 1999. There's nothing new
about it.

SAX, and its equally handy StAX sibling, are perfect for single-pass,
very-high-speed, memory-parsimonious handling of XML documents.
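
As a rough illustration of the single-pass style, the sketch below uses a SAX handler to tally element names as the parser pushes events, without ever holding the document. The class name SaxTagTally and the method tally are hypothetical; the parser and handler calls are the standard javax.xml.parsers and org.xml.sax API. It is written with Java 8 syntax for brevity, so it would need reworking for the older JDKs discussed in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
final class SaxTagTally {

    // Returns how many times each element name occurs, in a single pass.
    static Map<String, Integer> tally(InputSource source) {
        final Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so use qName.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption for brevity: surface all failures as unchecked.
            throw new IllegalStateException("parse failed", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample document invented for the demo.
        String xml = "<orders><order/><order/><note/></orders>";
        System.out.println(tally(new InputSource(new StringReader(xml))));
    }
}
```

Memory use here grows with the number of distinct element names, not with the size of the file, which is the point of the push model.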

Roedy has an interesting definition of "usual XML tools", since he's
ignoring two out of three interfaces, including one that's been around
nearly forever.

--
Lew
From: Arne Vajhøj on
On 07-02-2010 15:31, Lew wrote:
> On 2/7/2010 1:20 PM, Donkey Hottie wrote:
>> On 7.2.2010 20:14, Peter Duniho wrote:
>>> Roedy Green wrote:
>>>> It seems to me the usual XML tools in Java load the entire XML file
>>>> into RAM. Are there any tools that process sequentially, bringing in
>>>> only a chunk at a time so you could handle really fat files.
>>>
>>> Sounds like you want the XMLStreamReader interface:
>>> http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html
>>>
>>>
>>> I haven't used the Java version myself (there's a similar type in .NET),
>>> and haven't looked closely enough to determine the specifics. But I presume
>>> there's a way to get an implementation of the interface (looks like
>>> XMLInputFactory is the way to go).
>>>
>>> Of course, if per a previous discussion you're stuck on Java 1.5, this
>>> is unavailable to you. But otherwise, you should find it exactly what
>>> you're asking for.
>>>
>>> Pete
>>
>> SAX interface works fine even with Java 1.4, and it does what Roedy
>> wants.
>
> It's been around since Java 1.2; it better work with 1.4.

Yes and no.

SAX was added to Java API in 1.4.

JAXP API including SAX existed earlier than Java 1.4 and
libraries implementing it could be separately downloaded.

I have done the latter for Java 1.3 and it may have
existed already for 1.2.

Arne



From: Mike Schilling on
Arne Vajhøj wrote:
> On 07-02-2010 12:59, Roedy Green wrote:
>> It seems to me the usual XML tools in Java load the entire XML file
>> into RAM.
>
> ????
>
> W3C DOM and JAXB do load all data into memory.
>
> SAX and StAX do not load all data into memory.

If you use XSLT to process an XML file, it has to keep a complete
representation of the resulting XML document in memory, since an XSLT
transformation can include XPath expressions, and XPath can in principle
access anything in the document. This is true even if the input to XSLT is
a SAXSource.
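
A small sketch of that situation, assuming the JDK's built-in XSLT 1.0 processor: the input is handed over as a SAXSource, yet the stylesheet's XPath, count(//item), can only be answered once the whole document has been seen. The class name XsltOverSax, the method transform and the sample stylesheet are invented for illustration; the javax.xml.transform calls are the real API. The code cannot show the memory use itself, only the kind of whole-document query that forces it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

// Hypothetical helper; the class, method and stylesheet are illustrative only.
final class XsltOverSax {

    // Stylesheet whose XPath needs a view of the entire input document.
    static final String COUNT_ITEMS_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='count(//item)'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over input delivered as SAX events.
    static String transform(String xslt, InputSource input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new SAXSource(input), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Assumption for brevity: surface failures as unchecked.
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for the demo.
        String xml = "<list><item/><item/><other/></list>";
        System.out.println(
                transform(COUNT_ITEMS_XSLT, new InputSource(new StringReader(xml))));
    }
}
```

If all you need is something like this count, a plain SAX or StAX pass avoids the tree entirely; XSLT earns its memory cost only when the transformation really needs random access.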