java - How do I load an org.w3c.dom.Document from XML in a String?

Question

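I have a complete XML document in a String and need a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)

Solution: Thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know whether an error came from malformed XML, which raises SAXException, or from plain bad I/O, which raises IOException.)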

public static org.w3c.dom.Document loadXMLFrom(String xml)
    throws org.xml.sax.SAXException, java.io.IOException {
    return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
}

public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is) 
    throws org.xml.sax.SAXException, java.io.IOException {
    javax.xml.parsers.DocumentBuilderFactory factory =
        javax.xml.parsers.DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
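    javax.xml.parsers.DocumentBuilder builder;
    try {
        builder = factory.newDocumentBuilder();
    }
    catch (javax.xml.parsers.ParserConfigurationException ex) {
        // Fail fast rather than continue with a null builder.
        throw new IllegalStateException("Could not create DocumentBuilder", ex);
    }
    try {
        return builder.parse(is);
    }
    finally {
        // Close the stream even when parsing fails.
        is.close();
    }
}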
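
The exception granularity mentioned above could be used by a caller roughly as follows. This is only a sketch: the class name `XmlLoadDemo` and the `describe` helper are hypothetical, and the input is encoded explicitly as UTF-8 (an assumption, chosen because XML without a declaration is read as UTF-8).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class XmlLoadDemo {

    // Same shape as loadXMLFrom(InputStream), repeated so the sketch compiles on its own.
    static Document loadXMLFrom(InputStream is) throws SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            return factory.newDocumentBuilder().parse(is);
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("Could not create DocumentBuilder", ex);
        } finally {
            is.close();
        }
    }

    // Hypothetical helper: reports which kind of failure occurred, if any.
    static String describe(String xml) {
        try {
            Document doc = loadXMLFrom(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return "ok: " + doc.getDocumentElement().getTagName();
        } catch (SAXException ex) {
            return "malformed XML";   // the parser rejected the document
        } catch (IOException ex) {
            return "I/O failure";     // reading or decoding the input failed
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<root><child/></root>"));  // well-formed
        System.out.println(describe("<root><child></root>"));   // mismatched tag
    }
}
```

Keeping the two checked exceptions separate lets the caller decide, for example, to report bad input to the user but retry after an I/O error.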

score 154 · Accepted Answer

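Whoa there!

There's a potentially serious problem in this code, because it ignores the character encoding declared in the XML held in the String (UTF-8 by default). When you call String.getBytes(), the platform default encoding is used to encode the Unicode characters to bytes. So the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something else… not pretty!

Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this: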

import java.io.StringReader;
import org.xml.sax.InputSource;
…
        return builder.parse(new InputSource(new StringReader(xml)));

It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to y2k.
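
To make the encoding problem concrete, here is a minimal sketch. The class `EncodingDemo` and its helpers are hypothetical, and an explicit charset name stands in for "whatever the platform default happens to be" so the behavior does not depend on the machine running it.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingDemo {

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    // Reader-based parsing: the parser receives characters, so no byte encoding is involved.
    static String textViaReader(String xml) throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Byte-based parsing: charsetName simulates the platform default used by getBytes().
    // Returns true only if the root element's text comes back unchanged.
    static boolean survivesBytes(String xml, String expectedText, String charsetName) {
        try {
            Document doc = newBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(charsetName)));
            return expectedText.equals(doc.getDocumentElement().getTextContent());
        } catch (Exception ex) {
            return false; // the parser could not decode or parse the bytes
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9";                 // contains a non-ASCII character
        String xml = "<name>" + text + "</name>";  // no declaration, so the parser assumes UTF-8

        System.out.println(text.equals(textViaReader(xml)));          // Reader route
        System.out.println(survivesBytes(xml, text, "UTF-8"));        // bytes match the assumption
        System.out.println(survivesBytes(xml, text, "ISO-8859-1"));   // bytes do not match
    }
}
```

With pure ASCII content the mismatch often goes unnoticed, which is why this kind of bug tends to surface late.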

score 82 · Accepted Answer

This works for me in Java 1.5. I stripped out specific exceptions for readability.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;

public Document loadXMLFromString(String xml) throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    return builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

score 9 · Accepted Answer

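I had a similar problem, except that I needed a NodeList rather than a Document. Here's what I came up with. It's mostly the same solution as before, extended to get the root element as a NodeList, and it uses erickson's suggestion of an InputSource to avoid character encoding issues.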

private String DOC_ROOT="root";
String xml=getXmlString();
Document xmlDoc=loadXMLFrom(xml);
Element template=xmlDoc.getDocumentElement();
NodeList nodes=xmlDoc.getElementsByTagName(DOC_ROOT);

public static Document loadXMLFrom(String xml) throws Exception {
    InputSource is = new InputSource(new StringReader(xml));
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(is);
}
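
Once a NodeList is in hand, it might be walked along these lines. This is a sketch only: `NodeListDemo` and `childElementNames` are hypothetical names, and it lists the element children of the root rather than calling getElementsByTagName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListDemo {

    // Same approach as loadXMLFrom(String), repeated so the sketch compiles on its own.
    static Document loadXMLFrom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: names of the element children of the document root.
    static List<String> childElementNames(String xml) throws Exception {
        Document doc = loadXMLFrom(xml);
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Skip text and comment nodes; a NodeList can mix node types.
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childElementNames("<root><a/> text <b/></root>"));
    }
}
```

NodeList has no iterator, so an index loop over getLength() and item(i) is the usual way to traverse it.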
