java - Extract only the text of all SOAP XML nodes in Java

Question

I have the following SOAP XML, and I want to extract the text content of all of its nodes.

<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope" soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding"
    xmlns:m="http://www.example.org/stock">
    <soap:Body>
        <m:GetStockName>
            <m:StockName>ABC</m:StockName>
        </m:GetStockName>
        <!--some comment-->
        <m:GetStockPrice>
            <m:StockPrice>10 \n </m:StockPrice>
            <m:StockPrice>\t20</m:StockPrice>
        </m:GetStockPrice>
    </soap:Body>
</soap:Envelope>

The expected output is:

'ABC10 \n \t20'

With DOM I did the following:

public static String parseXmlDom() throws ParserConfigurationException,
        SAXException, IOException, FileNotFoundException {

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Configure the factory before creating the builder; settings applied
    // afterwards do not affect a builder that already exists
    factory.setNamespaceAware(true);
    factory.setIgnoringComments(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Read XML File
    String xml = IOUtils.toString(new FileInputStream(new File(
            "./files/request2.xml")), "UTF-8");
    InputSource is = new InputSource(new StringReader(xml));
    // Parse XML String to DOM
    Document doc = builder.parse(is);
    // Extract nodes text
    NodeList nodeList = doc.getElementsByTagNameNS("*", "*");
    Node node = nodeList.item(0);
    return node.getTextContent();
}
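
As a rough sketch of how the DOM attempt might be adapted: getTextContent() on the root element concatenates every descendant text node, which here includes the whitespace-only nodes produced by indentation. The example below instead walks the tree and keeps only text nodes that contain at least one non-whitespace character, leaving their content untouched. The names DomTextExtractor and extractText are made up for illustration, and the method takes the XML as a string rather than reading ./files/request2.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original question
public class DomTextExtractor {

    /** Returns the concatenated content of all text nodes that are not pure whitespace. */
    public static String extractText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true);
            factory.setCoalescing(true); // turn CDATA sections into ordinary text nodes
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            StringBuilder out = new StringBuilder();
            collectText(doc.getDocumentElement(), out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Depth-first walk: append non-blank text nodes, recurse into child elements. */
    private static void collectText(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (!text.trim().isEmpty()) {
                    out.append(text); // keep the text exactly as written
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectText(child, out);
            }
        }
    }
}
```

This treats any whitespace-only text node as formatting, so an element whose content is deliberately just spaces would be dropped as well.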

And with SAX:

public static String parseXmlSax() throws SAXException, IOException, ParserConfigurationException {

    final StringBuilder sb = new StringBuilder();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();
    // Declare Handler
    DefaultHandler handler = new DefaultHandler() {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Called for every run of character data, including indentation
            sb.append(ch, start, length);
        }
    };
    // Parse XML
    saxParser.parse("./files/request2.xml", handler);
    return sb.toString();
}
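
A possible variation on the SAX handler, offered as a sketch rather than a definitive fix: since characters() may be called several times for one run of text, the handler below only buffers there, and decides at each start or end tag whether the buffered run is pure whitespace (dropped) or real text (appended unchanged). Comments never reach a plain DefaultHandler, so they are skipped without extra work. SaxTextExtractor, TextHandler and extractText are illustrative names, and the XML is passed in as a string instead of a file path.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question
public class SaxTextExtractor {

    /** Buffers text between tags and keeps only runs that are not pure whitespace. */
    private static final class TextHandler extends DefaultHandler {
        private final StringBuilder result = new StringBuilder();
        private final StringBuilder run = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text run can arrive in several chunks, so only buffer here
            run.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
        }

        /** Appends the buffered run if it holds any non-whitespace character, then resets it. */
        private void flush() {
            if (!run.toString().trim().isEmpty()) {
                result.append(run);
            }
            run.setLength(0);
        }
    }

    public static String extractText(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            TextHandler handler = new TextHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.result.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because the check happens per run rather than per chunk, text that the parser happens to split in the middle is still kept whole.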

With both approaches, the result I get still contains the line breaks and indentation from between the elements.

I know that simply returning sb.toString().replaceAll("\n", "").replaceAll("\t", "") gives the expected result, but if the XML file is badly formatted, for example indented with extra spaces, the result contains those extra spaces as well.

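I also tried reading the XML as a single line before parsing it with SAX or DOM, but that does not work for my SOAP XML example: the soap:Envelope start tag has a line break between two attributes (before xmlns:m), and removing the break also removes the only whitespace separating them: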

<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope" soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding"xmlns:m="http://www.example.org/stock"><soap:Body><m:GetStockName><m:StockName>ABC</m:StockName></m:GetStockName><m:GetStockPrice><m:StockPrice>10 \n  </m:StockPrice><m:StockPrice>\t20</m:StockPrice></m:GetStockPrice></soap:Body></soap:Envelope>
[Fatal Error] :1:129: Element type "soap:Envelope" must be followed by either attribute specifications, ">" or "/>".
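
The error above comes from the two attributes being glued together once the line break is deleted. If flattening to one line is still wanted, a minimal sketch is to join the lines with a single space instead of nothing. The class and method names (SingleLineXml, toSingleLine) are hypothetical, and the approach assumes no element text spans several lines, since such text would be altered.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original question
public class SingleLineXml {

    /** Trims each line and joins the non-empty ones with one space. */
    public static String toSingleLine(String xml) {
        return Arrays.stream(xml.split("\\R"))   // split on any line break
                .map(String::trim)                // drop indentation
                .filter(line -> !line.isEmpty())  // skip blank lines
                .collect(Collectors.joining(" "));
    }
}
```

XML parsers already accept a line break as the whitespace between attributes, so the original multi-line file can also be parsed without this step. The flattened form still has a single space between elements, which remains a whitespace-only text node.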

How can I read only the text content of all nodes in a SOAP XML document, ignoring comments, regardless of whether the file is a single line or multiple lines and whether it is well or badly formatted?
