java - StAX out-of-memory error

Question

I am using the following simple StAX code to iterate over all the tags in an XML file. The size of input.xml is over 100 MB.

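import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory xif = XMLInputFactory.newInstance();
try (FileInputStream in = new FileInputStream("input.xml")) {
    XMLStreamReader xsr = xif.createXMLStreamReader(in);
    // Advance the cursor one event at a time instead of loading the whole document
    while (xsr.hasNext()) {
        xsr.next();
        if (xsr.isStartElement() || xsr.isEndElement()) {
            System.out.println(xsr.getLocalName());
        }
    }
    xsr.close();
}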

I am getting this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

How can I avoid this? I read that StAX handles huge XML files well, but I get the same error as with the DOM parser.

score 1 · Accepted Answer

Define the heap size when you start the JVM:

-Xms    initial java heap size
-Xmx    maximum java heap size
-Xmn    the size of the heap for the young generation

Example:

bin/java.exe -Xmn100M -Xms500M -Xmx500M
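To confirm that the flags actually took effect, a small check can print what the running JVM reports. This is a minimal sketch; the class name HeapCheck is hypothetical. Runtime.maxMemory() reflects the -Xmx limit (it may report a value slightly below the -Xmx setting, depending on the garbage collector), and totalMemory() is the heap currently committed, which starts around -Xms.

```java
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use (governed by -Xmx)
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently committed by the JVM (starts around -Xms, can grow up to the maximum)
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
    }
}
```

For example, running `java -Xms500M -Xmx500M HeapCheck` should report a maximum close to 500 MB.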

score 1 · Accepted Answer

Increase the VM's maximum heap size with the -Xmx parameter.

java -Xmx512m ....

score 0 · Accepted Answer

From Wikipedia: Traditionally, XML APIs are either:

tree based - the entire document is read into memory as a tree structure for random access by the calling application.
event based - the application registers to receive events as entities are encountered within the source document.

StAX was designed as a median between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward - 'pulling' the information from the parser as it needs. This is different from an event based API - such as SAX - which 'pushes' data to the application - requiring the application to maintain state between events as necessary to keep track of location within the document.
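To make the cursor ('pull') model concrete: with XMLStreamReader the application drives the loop itself, so state such as the current nesting depth can live in ordinary local variables. A minimal sketch using the JDK's javax.xml.stream API; the class StaxDepth and method maxDepth are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDepth {

    // Pulls events one by one; depth is a plain local variable, not handler state
    static int maxDepth(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int max = 0;
        try {
            while (xsr.hasNext()) {
                int event = xsr.next(); // move the cursor forward: the "pull"
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    max = Math.max(max, depth);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            xsr.close();
        }
        return max;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(maxDepth("<root><a/><b><c><d/></c></b></root>"));
    }
}
```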

So for files over 100 MB I would prefer SAX; if possible, use it instead of StAX.
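For comparison, a minimal SAX (push) sketch using the JDK's built-in javax.xml.parsers.SAXParser. The parser calls the handler for every element, and any state (the element count here) has to be kept in handler fields between callbacks. The names SaxTagPrinter, CountingHandler and countElements are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTagPrinter {

    // The parser pushes events into this handler; state lives in fields between callbacks
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elements++;
            System.out.println("start " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            System.out.println("end   " + qName);
        }
    }

    // Parses the stream without building a tree and returns how many elements were seen
    static int countElements(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a/><b><c/></b></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("elements: " + countElements(in));
    }
}
```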

But I tried your code on a 64-bit JVM with a 2.6 GB file, and it ran without problems. So I suspect the problem is not the file size but possibly something in the data.
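Whether file size alone exhausts the heap can be checked without a multi-gigabyte file on disk. A minimal sketch, assuming the JDK's built-in StAX implementation: a background thread writes a synthetic document (a hypothetical root/item structure) into a pipe while the main thread counts elements with XMLStreamReader, so neither side holds the whole document. If a run like this stays within a small heap but the real input.xml does not, that would point at something in the data rather than its size.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StreamingCheck {

    // Writes <root> with n <item> children into the pipe, never holding the whole document
    static void writeDocument(PipedOutputStream out, int n) throws XMLStreamException, IOException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("root");
        for (int i = 0; i < n; i++) {
            w.writeStartElement("item");
            w.writeCharacters("value " + i);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close();   // closes the writer only, not the underlying stream
        out.close(); // signals end of data to the reading side
    }

    // Produces n items on a background thread and counts them with a StAX cursor on this thread
    static long countItems(int n) throws Exception {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            try {
                writeDocument(out, n);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();

        long count = 0;
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamReader.START_ELEMENT && "item".equals(xsr.getLocalName())) {
                count++;
            }
        }
        xsr.close();
        in.close();
        producer.join();
        return count;
    }

    public static void main(String[] args) throws Exception {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        System.out.println("items: " + countItems(n));
        Runtime rt = Runtime.getRuntime();
        System.out.println("used heap (MB): " + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024));
    }
}
```

Running it with a deliberately small heap and a large item count, for example `java -Xmx64m StreamingCheck 10000000`, is one way to see whether memory use stays flat as the document grows.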
