python - Problem with empty text nodes in Python's xml.dom.minidom

Question

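Environment: Python 2.6.5, Eclipse SDK 3.7.1, Pydev 2.3

I am trying to parse and modify values in XML data in Python using xml.dom.minidom, and I am running into a problem with empty text nodes.

When I parse the XML file into a DOM object and then convert it back to a string with toxml(), the closing "Description" tag is messed up for every element that has an empty text node.

Does anyone know what the problem is?

Contents of issue.py: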

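from xml.dom import minidom

# Parse the file into a DOM tree and take the root <NewsShows> element.
xml_dom_object = minidom.parse('news_shows.xml')
main_node = xml_dom_object.getElementsByTagName('NewsShows')[0]

# Serialize the element back to a string.
xml_string = main_node.toxml()
print(xml_string)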

Contents of news_shows.xml (note the two empty text nodes):

<NewsShows Planet="Earth" Language="English" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"></Description>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"></Description>
</NewsShow>
</NewsShows>

Output of the script (note the two "Description" tags that are messed up):

<NewsShows Language="English" Planet="Earth" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"/>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"/>
</NewsShow>
</NewsShows>

score 1 · Accepted Answer

The following is a code snippet from the source file "python-3.2.3.amd64\Lib\xml\dom\minidom.py":

def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)

    attrs = self._get_attributes()
    a_names = sorted(attrs.keys())

    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        _write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if self.childNodes:
        writer.write(">")
        if (len(self.childNodes) == 1 and
            self.childNodes[0].nodeType == Node.TEXT_NODE):
            self.childNodes[0].writexml(writer, '', '', '')
        else:
            writer.write(newl)
            for node in self.childNodes:
                node.writexml(writer, indent+addindent, addindent, newl)
            writer.write(indent)
        writer.write("</%s>%s" % (self.tagName, newl))
    else:
        writer.write("/>%s"%(newl))

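According to this function, when the "self" variable (the node being written out as XML) has no "childNodes", the writer emits a self-closing tag. An element parsed from <Description Detail="..."></Description> has no child nodes at all, not even an empty text node, so it is written as <Description Detail="..."/>.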
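
A minimal sketch of that behaviour and of one possible workaround, written for Python 3 to match the snippet above. The helper name force_explicit_close is hypothetical and not part of minidom. It relies on the assumption, taken from the writexml() code shown above, that an empty text child makes self.childNodes non-empty and therefore sends the writer down the "</tag>" branch.

```python
from xml.dom import minidom

XML = ('<NewsShow>'
       '<Description Detail="The_only_source_of_truth"></Description>'
       '</NewsShow>')


def force_explicit_close(doc, tag_name):
    """Hypothetical helper: give every childless <tag_name> an empty text child."""
    for element in doc.getElementsByTagName(tag_name):
        if not element.childNodes:
            # An empty text node writes no characters, but it makes
            # childNodes non-empty, so writexml() emits "></tag>" not "/>".
            element.appendChild(doc.createTextNode(''))


doc = minidom.parseString(XML)
description = doc.getElementsByTagName('Description')[0]

# The parser creates no text node for empty element content.
print(len(description.childNodes))

before = doc.documentElement.toxml()
print(before)

force_explicit_close(doc, 'Description')
after = doc.documentElement.toxml()
print(after)
```

Calling normalize() on the tree discards empty text nodes again, so a helper like this would have to run after any normalization and right before serializing.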

score 0 · Accepted Answer

Is this actually causing a problem somewhere? From everything I know about XML, the strings <tag></tag> and <tag/> are equivalent.
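
A quick way to check this with minidom itself, as a sketch rather than a proof: both spellings are parsed into an element with no child nodes, and both are serialized identically. The variable names are only illustrative.

```python
from xml.dom import minidom

# Parse both spellings of an empty element.
explicit = minidom.parseString('<tag></tag>').documentElement
self_closing = minidom.parseString('<tag/>').documentElement

# Neither element has any child nodes after parsing.
print(explicit.hasChildNodes(), self_closing.hasChildNodes())

# Both elements serialize to the same string.
print(explicit.toxml() == self_closing.toxml())
```

This matches the XML 1.0 specification, which allows an empty element to be written either as a start-tag immediately followed by an end-tag or as an empty-element tag.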
