
Good morning,

I need to parse a huge XML file (2 GB) in Java. It has many tags, but I only need to write the content of two of them, <title> and <subtext>, to a common file each time, so I am using a SAX parser.

So far I have managed to write 1M95 texts to the output file, but at that point the following exception is thrown:

org.xml.sax.SAXParseException; systemId: filePath; lineNumber: x; columnNumber: y; JAXP00010004 : La taille cumulée des entités est "50 000 001" et dépasse la limite de "50 000 000" définie par "FEATURE_SECURE_PROCESSING".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1465)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.checkEntityLimit(XMLScanner.java:1544)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.handleCharacter(XMLDocumentFragmentScannerImpl.java:1940)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1866)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3058)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:328)
    at Parsing.main(Class.java:38)

Translated into English, the exception message reads:

The cumulative size of the entities is "50 000 001", which exceeds the limit of "50 000 000" defined by "FEATURE_SECURE_PROCESSING".
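For context, the limit named in this message is, as far as I understand it, the cumulative entity size limit of the JDK's built-in JAXP parser, which can be adjusted through the system property `jdk.xml.totalEntitySizeLimit` (a value of 0 or less means no limit). Below is a minimal sketch of setting it before the parser is created. The class and method names (`EntityLimitSketch`, `newParserWithoutEntitySizeLimit`, `parseText`) are made up for illustration, and it assumes the default JDK parser rather than a separate Xerces jar. The same property could presumably also be passed on the command line as `-Djdk.xml.totalEntitySizeLimit=0`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

public class EntityLimitSketch {

    // System property read by the JDK's built-in JAXP implementation
    // (assumption: the default JDK parser is in use, not a third-party one).
    static final String TOTAL_ENTITY_SIZE_LIMIT = "jdk.xml.totalEntitySizeLimit";

    /** Lifts the cumulative entity size limit, then creates a SAX parser. */
    static SAXParser newParserWithoutEntitySizeLimit() throws Exception {
        // Set the property before the factory and parser are created.
        System.setProperty(TOTAL_ENTITY_SIZE_LIMIT, "0");
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.newSAXParser();
    }

    /** Parses a small XML string and returns all of its character data. */
    static String parseText(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        newParserWithoutEntitySizeLimit().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        out.append(ch, start, length);
                    }
                });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Entity references such as &amp; are what the limit counts.
        System.out.println(parseText("<page><title>A &amp; B</title></page>"));
    }
}
```

Removing the limit is only reasonable for input that is trusted, since the limit exists to guard against entity expansion attacks.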

Here is the code I wrote:

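import java.io.File;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Parsing {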

public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {

    try {
        File inputFile = new File(System.getProperty("user.dir") + "/input.xml");
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        UserHandler userhandler = new UserHandler();
        saxParser.parse(inputFile, userhandler);
    } catch (Exception e) {
        e.printStackTrace();
    }

}

public static void doThingOne(String text, String title) throws IOException {
    // Write the text and the title to a file
}

public static void doThingTwo(String text, String title) throws IOException {
    // Write the text and the title to another file
}

} // end of class Parsing

class UserHandler extends DefaultHandler {

boolean bText = false;
boolean bTitle = false;
StringBuffer tagTextBuffer; 
StringBuffer tagTitleBuffer; 
String text = null;
String title = null;

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {

    if (qName.equals("title")) {
        tagTitleBuffer = new StringBuffer();
        bTitle = true;
    } else if (qName.equals("text")) {
        tagTextBuffer = new StringBuffer();
        bText = true;
    }
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if (qName.equals("title")) {
        bTitle = false;
        // The title is collected in tagTitleBuffer, not tagTextBuffer.
        title = tagTitleBuffer.toString();

    } else if (qName.equals("text")) {
        text = tagTextBuffer.toString();
        bText = false;
        // Compare string contents with equals(); == only compares references.
        if (text != null && "One".equals(title)) {
            try {
                Parsing.doThingOne(text, title);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else if (text != null) {
            try {
                Parsing.doThingTwo(text, title);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    // Append straight from the parser's array; no intermediate String is needed.
    if (bTitle) {
        tagTitleBuffer.append(ch, start, length);
    } else if (bText) {
        tagTextBuffer.append(ch, start, length);
    }
}
}
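The bodies of `doThingOne` and `doThingTwo` are left out above. As a rough sketch of what appending each title/text pair to a common file could look like, assuming one tab-separated pair per line and a writer that stays open for the whole parse: the class name `PairWriter`, the line format, and the temporary file are all hypothetical and not part of my actual code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that appends title/text pairs to one output file. */
public class PairWriter implements AutoCloseable {

    private final BufferedWriter writer;

    public PairWriter(Path target) throws IOException {
        // Open once in append mode so millions of pairs reuse the same buffered stream.
        writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Writes one pair as "title<TAB>text" followed by a line break. */
    public void write(String title, String text) throws IOException {
        writer.write(title);
        writer.write('\t');
        writer.write(text);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("pairs", ".txt");
        try (PairWriter w = new PairWriter(out)) {
            w.write("One", "first text");
            w.write("Two", "second text");
        }
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

Keeping a single writer open avoids reopening the file for every element, which may matter with a 2 GB input.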

Thank you for your time.
