apache - XPath search in .docx

Question

I want to read specific text from a sub-table in a .docx file. Is there an efficient way to do this, such as XPath traversal or a similar API supported in Java?

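I am currently trying to read the .docx with Java and Apache POI (code snippet below), but with this approach I have to iterate over every node matching the tag 'w:tr' and read each node's text value. Is there a way to fetch the required data quickly from a search pattern, as XPath does? Any input is welcome.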

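        import java.io.File;
        import java.io.InputStream;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipFile;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.NodeList;

        // A .docx file is a ZIP archive; the main body lives in word/document.xml.
        File myFile = new File( "D:\\XLS-Pages\\TestSherwin.docx" );
        ZipFile docxFile = new ZipFile( myFile );
        ZipEntry documentXML = docxFile.getEntry( "word/document.xml" );
        InputStream documentXMLIS = docxFile.getInputStream( documentXML );
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        org.w3c.dom.Document doc = dbf.newDocumentBuilder().parse( documentXMLIS );

        // Collect every table row; each row's text still has to be read in a loop.
        org.w3c.dom.Element tElement = doc.getDocumentElement();
        NodeList n = tElement.getElementsByTagName( "w:tr" );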

score 1 · Accepted Answer

You can use XPath with docx4j. Its support is built on JAXB's XPath support, so it inherits the various limitations that come with that.
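
For comparison, here is a minimal sketch of the same idea using only the JDK's `javax.xml.xpath` API on the DOM that the question already builds. This is not docx4j's JAXB-based XPath, just a plain DOM query under a few assumptions: the class name `DocxXPathSketch`, the method `select`, and the search string `Total` are hypothetical and only there for illustration. The parser must be namespace-aware, and the `w` prefix has to be bound to the WordprocessingML namespace, otherwise steps such as `w:tr` will not match anything.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of POI or docx4j.
class DocxXPathSketch {
    // Main WordprocessingML namespace, bound to the "w" prefix in document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text content of every node matched by the XPath expression. */
    static List<String> select(InputStream documentXml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for prefixed steps such as w:tr
            Document doc = dbf.newDocumentBuilder().parse(documentXml);

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return W_NS.equals(uri) ? "w" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    List<String> prefixes = new ArrayList<>();
                    if (W_NS.equals(uri)) {
                        prefixes.add("w");
                    }
                    return prefixes.iterator();
                }
            });

            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Sketch-level error handling: wrap parser and XPath failures.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

With the `documentXMLIS` stream from the question, a call such as `DocxXPathSketch.select(documentXMLIS, "//w:tbl//w:tbl//w:tr[contains(., 'Total')]")` would return the text of each row that sits in a table nested inside another table and contains the word `Total`. The expression is only an example; adjust it to the actual layout of the document.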
