java - Java XPath umlaut/vowel parsing

Question

I want to parse the following XML structure:

<?xml version="1.0" encoding="utf-8"?>
<documents>
  <document>
    <element name="title">
      <value><![CDATA[Personnel changes: Müller]]></value>
    </element>
  </document>
</documents>

To parse this element name="????? structure, I use XPath as follows:

XPath xPath = XPathFactory.newInstance().newXPath();

String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);

The parsing itself works fine, but there are some problems with German umlauts and special characters such as "Ü" and "ß". When I print currentString, the string looks like this:

Personnel changes: MÃ¼ller

But I want the string exactly as it appears in the XML:

Personnel changes: Müller

Just to add: I can't change the content of the XML file. I have to parse it as I receive it, so every string has to be decoded correctly.

score 2 · Accepted Answer

This sounds like an encoding problem. The XML is Unicode encoded as UTF-8, but it looks as if it is being decoded and printed as ISO-8859-1. Check the encoding settings of your Java source.
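The symptom in the question fits this diagnosis. A minimal sketch (plain JDK; the class name `MojibakeDemo` and method `misdecode` are hypothetical) that encodes "Müller" as UTF-8 and then decodes those bytes as ISO-8859-1 reproduces exactly the broken string from the question:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00FC (u with umlaut) written as an escape, so the source file encoding does not matter
        String original = "M\u00fcller";
        String broken = misdecode(original);
        // U+00FC is the two UTF-8 bytes 0xC3 0xBC, which ISO-8859-1 reads as U+00C3 U+00BC
        System.out.println(broken.equals("M\u00c3\u00bcller")); // prints true
    }
}
```

Each non-ASCII character turns into two Latin-1 characters, which is why "ü" becomes the two-character sequence seen in the question's output.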

Edit: See "Setting the default Java character encoding?" for how to set file.encoding.
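A minimal sketch of what to check (assumes Java 10+ for the `PrintStream` constructor that takes a `Charset`; class and method names are hypothetical): print the `file.encoding` property and the default charset, and for console output wrap stdout in a `PrintStream` with an explicit UTF-8 encoding instead of relying on the default. The terminal itself still has to display UTF-8 for the result to look right.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Writes text through a PrintStream with an explicit charset and returns the bytes it produced.
    static byte[] encodeViaPrintStream(String text, Charset charset) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buf, true, charset);
        ps.print(text);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // What the JVM falls back to when no charset is given explicitly
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // A stdout stream that always encodes as UTF-8, whatever the default is
        PrintStream utf8Out = new PrintStream(
                new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8);
        utf8Out.println("M\u00fcller");

        // The 6-char string is 7 bytes in UTF-8 (U+00FC takes 2 bytes) and 6 in ISO-8859-1
        System.out.println(encodeViaPrintStream("M\u00fcller", StandardCharsets.UTF_8).length);      // prints 7
        System.out.println(encodeViaPrintStream("M\u00fcller", StandardCharsets.ISO_8859_1).length); // prints 6
    }
}
```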

score 1 · Accepted Answer

I've now found a good, quick solution:

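import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;

public static String convertXMLToString(File pCurrentXML) {
    // Read the raw bytes and decode them explicitly as UTF-8
    // instead of relying on the platform default charset.
    // try-with-resources closes the stream; a missing file no longer causes a NullPointerException.
    try (InputStream is = new FileInputStream(pCurrentXML)) {
        return IOUtils.toString(is, StandardCharsets.UTF_8);
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}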
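If you'd rather avoid the Apache Commons IO dependency, the same "read the bytes, decode explicitly as UTF-8" step can be done with the JDK alone. A minimal sketch, assuming Java 8+ (`ReadUtf8`, `readUtf8` and `writeTempUtf8` are hypothetical names; the temp file only stands in for the real XML file):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadUtf8 {
    // Reads the whole file and decodes the bytes explicitly as UTF-8,
    // independent of the platform default charset.
    static String readUtf8(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper for the demo: writes text to a temp file as UTF-8 bytes.
    static Path writeTempUtf8(String text) {
        try {
            Path tmp = Files.createTempFile("documents", ".xml");
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            tmp.toFile().deleteOnExit();
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00FC (u with umlaut) as an escape, so the source file encoding does not matter
        String text = "Personnel changes: M\u00fcller";
        System.out.println(readUtf8(writeTempUtf8(text)).equals(text)); // prints true
    }
}
```

On Java 11+, `Files.readString(path)` does the same in one call, since it decodes as UTF-8 by default.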

Then I convert the String into a DOM object:

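import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

static Document convertStringToXMLDocumentObject(String string) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The String already holds decoded characters, so parsing from a Reader
        // means the parser does not have to decode any bytes itself.
        return builder.parse(new InputSource(new StringReader(string)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();
        return null;
    }
}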
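A variant (not what this answer does): DocumentBuilder can also be given the raw bytes, e.g. a FileInputStream or the File itself. In that case the parser reads the encoding="utf-8" declaration and decodes the bytes on its own, so no intermediate String is needed. A minimal sketch with hypothetical names (`ParseFromBytes`, `parseBytes`, `title`, `SAMPLE_XML`), using an in-memory byte array in place of the file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseFromBytes {
    static final String SAMPLE_XML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<documents><document><element name=\"title\">"
            + "<value><![CDATA[Personnel changes: M\u00fcller]]></value>"
            + "</element></document></documents>";

    // Parses XML from a byte stream; the parser detects the encoding from the XML declaration.
    static Document parseBytes(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Same XPath expression as in the question.
    static String title(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (String) xPath.evaluate(
                    "/documents/document/element[@name='title']/value", doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates the file on disk: UTF-8 bytes, as the declaration says
        byte[] fileBytes = SAMPLE_XML.getBytes(StandardCharsets.UTF_8);
        Document doc = parseBytes(new ByteArrayInputStream(fileBytes));
        System.out.println(title(doc).equals("Personnel changes: M\u00fcller")); // prints true
    }
}
```

With the real file, `builder.parse(pCurrentXML)` (the `File` overload) should behave the same way.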

Then just parse the DOM, for example with XPath, and all element values come out correctly decoded!! Demonstration:

currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value",pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);

Output:

Personnel changes: Müller

:)

score 0 · Accepted Answer

If you know the file is UTF-8 encoded, try something like this:

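    // Decode the file's bytes explicitly as UTF-8 instead of the platform default
    FileInputStream fis = new FileInputStream("yourfile.xml");
    InputStreamReader in = new InputStreamReader(fis, "UTF-8");

    // Wrap the Reader in an InputSource, e.g. for DocumentBuilder.parse(...) or XPath.evaluate(...)
    InputSource pCurrentXMLAsDOM = new InputSource(in);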
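Because `XPath.evaluate` also has an overload that takes an `InputSource`, an InputSource built this way can be passed straight to the question's evaluate call, without creating a Document first. A minimal sketch under those assumptions (class, constant and helper names are hypothetical; a temp file stands in for yourfile.xml):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class EvaluateInputSource {
    static final String EXPR = "/documents/document/element[@name='title']/value";
    static final String SAMPLE_XML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<documents><document><element name=\"title\">"
            + "<value><![CDATA[Personnel changes: M\u00fcller]]></value>"
            + "</element></document></documents>";

    // Opens the file through a Reader that decodes UTF-8 explicitly, then lets
    // XPath parse directly from the InputSource (no separate DocumentBuilder step).
    static String title(Path xmlFile) {
        try (Reader in = new InputStreamReader(
                new FileInputStream(xmlFile.toFile()), StandardCharsets.UTF_8)) {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(EXPR, new InputSource(in), XPathConstants.STRING);
        } catch (IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: writes sample XML to a temp file as UTF-8 bytes.
    static Path writeSample(String xml) {
        try {
            Path tmp = Files.createTempFile("documents", ".xml");
            Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
            tmp.toFile().deleteOnExit();
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = title(writeSample(SAMPLE_XML));
        System.out.println(value.equals("Personnel changes: M\u00fcller")); // prints true
    }
}
```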
