java - Reading Unicode characters from XML in Java/Android

Question

I was trying to read XML output that contains some Unicode characters. Instead of the complete string inside a tag, I could only read a single character.

This is my XML output:

    <item>
        <id>1</id>
        <name>&#x0DBD;&#x0DDC;&#x0DBD;&#x0DCA;</name>
        <cost>155</cost>
        <description>&#x0DBD;&#x0DDC;</description>
    </item>

This is the Java code I use to parse the XML string:

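    import android.util.Log;
    import java.io.IOException;
    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public Document getDomElement(String xml) {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource();
            is.setEncoding("UTF-16");
            is.setCharacterStream(new StringReader(xml));
            doc = db.parse(is);
        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }
        // return DOM
        return doc;
    }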

When I use ordinary English characters, I get the complete string.
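A quick way to check whether the character references themselves are the problem (a hedged sketch, not part of the original post): an XML parser resolves a reference such as `&#x0DBD;` to the single character U+0DBD, so the referenced form and the literal form should parse to identical strings. The class `EntityCheck` and the helper `parseText` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityCheck {
    // Hypothetical helper: parses a small XML string and returns the
    // full text content of its root element.
    static String parseText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same four characters, written as references and as literals.
        String fromRefs = parseText("<name>&#x0DBD;&#x0DDC;&#x0DBD;&#x0DCA;</name>");
        String literal = parseText("<name>\u0DBD\u0DDC\u0DBD\u0DCA</name>");
        System.out.println(fromRefs.length());        // 4
        System.out.println(fromRefs.equals(literal)); // true
    }
}
```

If this prints 4 and true on the target platform, the parser is decoding the references correctly and the problem is more likely in how the text is read back from the DOM.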

score 1 · Accepted Answer

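I tried your code and it works fine. When I evaluate the nodes that contain non-English characters, the characters are there and their count is correct. They do not print because the font in use has no glyphs for them, but value.codePointAt(j) returns the correct code points: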

    NodeList list = doc.getDocumentElement().getChildNodes();
    for (int i=0; i<list.getLength(); i++)
    {
        String value = list.item(i).getTextContent();
        for (int j=0; j<value.length(); j++)
            System.out.print(" " + value.codePointAt(j));
        System.out.println();
    }

Output:

 49
 3517 3548 3517 3530
 49 53 53
 3517 3548

These are the code points in decimal (for example, 0x0DBD = 3517).

I built the XML string by hand. You already have yours in memory as a String, right?
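To verify the mapping between the hexadecimal references in the XML and the decimal numbers in the output, a minimal sketch (the class `CodePoints` and its helpers are hypothetical names) converts each reference with `Integer.parseInt(hex, 16)`. It also uses `String.codePoints()` (Java 8+), which, unlike indexing by `char` as the loop above does, yields one value per code point even for characters outside the Basic Multilingual Plane. For the Sinhala characters here both approaches give the same result.

```java
import java.util.Arrays;

public class CodePoints {
    // Converts the hex digits of a character reference such as "0DBD" to decimal.
    static int hexToDecimal(String hex) {
        return Integer.parseInt(hex, 16);
    }

    // Returns the code points of a string; a surrogate pair yields one value.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // The four references from the <name> element.
        for (String hex : new String[] {"0DBD", "0DDC", "0DBD", "0DCA"}) {
            System.out.print(" " + hexToDecimal(hex));
        }
        System.out.println(); // line printed: " 3517 3548 3517 3530"

        System.out.println(Arrays.toString(codePointsOf("\u0DBD\u0DDC"))); // [3517, 3548]
    }
}
```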

score 0 · Accepted Answer

This is the code I used to solve my problem:

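    // KEY_ID, KEY_NAME, KEY_COST and KEY_DESC are tag-name constants, and Item and
    // itemarray are an app class and list; all are defined elsewhere (not shown).
    NodeList idlist = doc.getElementsByTagName(KEY_ID);
    NodeList namelist = doc.getElementsByTagName(KEY_NAME);
    NodeList costlist = doc.getElementsByTagName(KEY_COST);
    NodeList desclist = doc.getElementsByTagName(KEY_DESC);
    for (int i = 0; i < idlist.getLength(); i++)
    {
        Item item = new Item();
        // getTextContent() returns all of the element's text, not just its first text node
        item.setCost(costlist.item(i).getTextContent());
        item.setDescription(desclist.item(i).getTextContent());
        item.setName(namelist.item(i).getTextContent());
        itemarray.add(item);
    }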
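The post does not say why this works, so the following is a hedged explanation. `Node.getTextContent()` returns the concatenated text of all descendant text nodes, whereas `getFirstChild().getNodeValue()` returns only the first child `Text` node. If a parser delivers an element's text as several adjacent `Text` nodes (some parsers may split text at character references, which would match the "only one character" symptom; this is an assumption, since the original reading code is not shown), the two calls differ. The sketch builds such a split element by hand so the effect does not depend on any particular parser, and shows `Node.normalize()`, which merges adjacent text nodes, as an alternative. `SplitTextDemo`, `newDocument` and `splitName` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SplitTextDemo {
    // Creates an empty DOM document, wrapping the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <name> with one Text node per character, mimicking a parser
    // that (by assumption) splits text at each character reference.
    static Element splitName(Document doc) {
        Element name = doc.createElement("name");
        for (String ch : new String[] {"\u0DBD", "\u0DDC", "\u0DBD", "\u0DCA"}) {
            name.appendChild(doc.createTextNode(ch));
        }
        return name;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element name = splitName(doc);
        doc.appendChild(name);

        // Value of the first Text node only: a single character.
        System.out.println(name.getFirstChild().getNodeValue().length()); // 1
        // Text of all descendant Text nodes, concatenated: four characters.
        System.out.println(name.getTextContent().length()); // 4

        // normalize() merges the adjacent Text nodes into one.
        name.normalize();
        System.out.println(name.getChildNodes().getLength()); // 1
        System.out.println(name.getFirstChild().getNodeValue().length()); // 4
    }
}
```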

score 0 · Accepted Answer

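When people say "Unicode" they usually mean UTF-8, but you are setting UTF-16, which is a bad idea.
XML declares its own encoding in the XML declaration at the top of the document, so you do not need to override it.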
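Two details support this answer (a hedged sketch, not from the original posts). First, according to the `InputSource.setEncoding` documentation, the encoding has no effect when a character stream is supplied, as in the question's code, because the text has already been decoded into a Java `String`. Second, when the parser is given raw bytes instead, it determines the encoding from the XML declaration (e.g. `<?xml version="1.0" encoding="UTF-8"?>`). The sketch assumes the data arrives as UTF-8 bytes, for example from a network response, and lets the parser decode them. `DeclaredEncoding` and `parseBytes` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DeclaredEncoding {
    // Parses raw bytes; the parser picks the charset from the XML declaration
    // (UTF-8 is assumed when there is no declaration and no byte-order mark).
    static Document parseBytes(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<item><id>1</id><name>\u0DBD\u0DDC\u0DBD\u0DCA</name></item>";
        // Assumed source: the document as UTF-8 bytes, e.g. an HTTP response body.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = parseBytes(bytes);
        String name = doc.getElementsByTagName("name").item(0).getTextContent();
        System.out.println(name.length()); // 4
    }
}
```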
