xml - Python problem with an xml.dom.minidom Document: extra blank lines between child elements when using toprettyxml()

Question

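I am very new to Python (and to the larger programming community), so please bear with me. I open a file, pull out specific parts of the data, edit some of the variable values, and then rebuild the XML. The problem we are running into is how the data is formatted when it is written back to a new document with toprettyxml().

Basically, the top half of the file contains a lot of elements that do not need to change at all, so I grab those elements whole and append them back to the root when rebuilding. Some elements at the same level further down the page are broken down into smaller items in memory and rebuilt from the lowest child level up. The ones that are rebuilt and appended by hand work fine.

So, roughly, the relevant bits of code are: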

import xml.dom.minidom
from xml.dom.minidom import Document

def __handleElemsWithAtrributes(elem):
    # returns empty element with all attributes of source element
    tmpDoc = Document()
    result = tmpDoc.createElement(elem.item(0).tagName)
    attr_map = elem.item(0).attributes
    for i in range(attr_map.length):
        result.setAttribute(attr_map.item(i).name,attr_map.item(i).value)
    return result

def __getWholeElement(elems):
    #returns element with all attributes of source element and all contents
    if len(elems) == 0:
        return 0
    temp = Document()
    for e in elems:
        result = temp.createElement(e.tagName)
        attr_map = e.attributes
        for i in range(attr_map.length):
            result.setAttribute(attr_map.item(i).name,attr_map.item(i).value)
        result = e
    return result


def __init__():
    ## A bunch of other stuff I'm leaving out...
    f = xml.dom.minidom.parse(pathToFile)
    doc = Document()

    modules = f.getElementsByTagName("Module")
    descriptions = f.getElementsByTagName("Description")
    steptree = f.getElementsByTagName("StepTree")
    reference = f.getElementsByTagName("LessonReference")

    mod_val = __handleElemsWithAtrributes(modules)
    des_val = __getWholeElement(descriptions)
    step_val = __getWholeElement(steptree)
    ref_val = __getWholeElement(reference)

    if des_val != 0 and mod_val != 0 and step_val != 0 and ref_val != 0:
        mod_val.appendChild(des_val)
        mod_val.appendChild(step_val)
        mod_val.appendChild(ref_val)
        doc.appendChild(mod_val)
    o.write(doc.toprettyxml())

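Note that the indentation is not preserved exactly here, because I copied this from a few different places, but I think you get the gist.

Basically, the input I am using looks like this: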

<Module aatribute="" attribte2="" attribute3="" >
<Description>
    <Title>SomeTitle</Title>
    <Objective>An objective</Objective>
    <Action>
        <Familiarize>familiarize text</Familiarize>
    </Action>
    <Condition>
        <Familiarize>Condition text</Familiarize>
    </Condition>
    <Standard>
        <Familiarize>Standard text</Familiarize>
    </Standard>
    <PerformanceMeasures>
        <Measure>COL text</Measure>
    </PerformanceMeasures>
    <TMReferences>
        <Reference>Reference text</Reference> 
    </TMReferences>
</Description>

And once it is reassembled, it comes out like this:

<Module aatribute="" attribte2="" attribute3="" >
<Description>


    <Title>SomeTitle</Title>


    <Objective>An objective</Objective>


    <Action>


        <Familiarize>familiarize text</Familiarize>


    </Action>


    <Condition>


        <Familiarize>Condition text</Familiarize>


    </Condition>


    <Standard>


        <Familiarize>Standard text</Familiarize>


    </Standard>


    <PerformanceMeasures>


        <Measure>COL text</Measure>


    </PerformanceMeasures>


    <TMReferences>


        <Reference>Reference text</Reference> 


    </TMReferences>


</Description>

How can I keep it from creating all those extra blank lines? Any ideas?

score 2 · Accepted Answer

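I had the same problem. The issue is that every line break (and its indentation) in the source file is kept as a text node in the parsed tree. toprettyxml() then writes its own indentation and newline around every node, including those whitespace-only text nodes, so the output picks up blank lines without you noticing, and more of them each time the file is round-tripped. That makes toprettyxml() a rather treacherous function.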
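To see where the extra lines come from, here is a minimal sketch that parses a small fragment modeled on the question's input and inspects the children of the root element. The `SOURCE` string and the variable names are made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical fragment modeled on the input in the question.
SOURCE = """<Description>
    <Title>SomeTitle</Title>
    <Objective>An objective</Objective>
</Description>"""

doc = parseString(SOURCE)
root = doc.documentElement

# Label each direct child: "text" for text nodes, the tag name for elements.
kinds = ["text" if n.nodeType == Node.TEXT_NODE else n.tagName
         for n in root.childNodes]
print(kinds)

# The whitespace between tags survives parsing as whitespace-only text nodes.
whitespace_nodes = [n for n in root.childNodes
                    if n.nodeType == Node.TEXT_NODE and not n.data.strip()]
print(len(whitespace_nodes))

# toprettyxml() indents those nodes too, which yields whitespace-only lines.
pretty = doc.toprettyxml()
blank_lines = [line for line in pretty.splitlines() if not line.strip()]
print(len(blank_lines))
```

The elements themselves are untouched; it is only the whitespace-only text nodes sitting between them that turn into blank lines.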

One solution is to find a way to strip out all the useless text nodes when you first parse the file (I am still looking, but I have not found a "pretty" solution yet).

Removing them node by node:

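from xml.dom import Node

def cleanUpNodes(nodes):
    # Blank out whitespace-only text nodes among the direct children only
    for node in nodes.childNodes:
        if node.nodeType == Node.TEXT_NODE and not node.data.strip():
            node.data = ''
    # normalize() discards the text nodes that are now empty
    nodes.normalize()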

From http://mail.python.org/pipermail/xml-sig/2004-March/010191.html
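As a rough usage sketch of the per-node approach, the snippet below cleans one element right after parsing and then checks what is left. The function name `clean_direct_children` and the sample XML are hypothetical, and the whitespace-only test (`not node.data.strip()`) is an assumption about which text nodes count as useless. It illustrates that a single pass only reaches direct children.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def clean_direct_children(element):
    """Drop whitespace-only text nodes that are direct children of element."""
    for node in element.childNodes:
        if node.nodeType == Node.TEXT_NODE and not node.data.strip():
            node.data = ""
    # normalize() removes empty text nodes; it does not strip whitespace itself.
    element.normalize()

# Hypothetical sample with one nested level.
doc = parseString(
    "<Description>\n"
    "    <Title>SomeTitle</Title>\n"
    "    <Action>\n"
    "        <Familiarize>familiarize text</Familiarize>\n"
    "    </Action>\n"
    "</Description>"
)
root = doc.documentElement
clean_direct_children(root)

# Direct children of the root are now elements only.
top_level = [n.nodeName for n in root.childNodes]
print(top_level)

# The nested <Action> element still carries its whitespace text nodes.
action = root.getElementsByTagName("Action")[0]
nested = [n.nodeName for n in action.childNodes]
print(nested)
```

Because the nested element keeps its whitespace nodes, the function has to be called on every element, or applied recursively, before the whole tree is clean.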

score -1 · Accepted Answer

Thanks, this works recursively!!

def cleanUpNodes(self, nodes):
    for node in nodes.childNodes:
        # Blank whitespace-only text nodes; leave text with real content alone
        if node.nodeType == node.TEXT_NODE and not node.data.strip():
            node.data = ''
        if node.nodeType == node.ELEMENT_NODE:
            self.cleanUpNodes(node)
    # normalize() drops the text nodes that were emptied above
    nodes.normalize()
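For reference, here is a standalone sketch of the same recursive idea as a plain function, run end to end on a small document. It swaps in removeChild() instead of blanking the data and calling normalize(). The name `strip_whitespace_nodes` and the sample XML are assumptions for illustration, and it assumes whitespace-only text is never significant in the document.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only text nodes below node."""
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)

# Hypothetical sample modeled on the input in the question.
SOURCE = """<Module>
<Description>
    <Title>SomeTitle</Title>
    <Action>
        <Familiarize>familiarize text</Familiarize>
    </Action>
</Description>
</Module>"""

doc = parseString(SOURCE)
strip_whitespace_nodes(doc.documentElement)

# With the whitespace nodes gone, toprettyxml() only adds its own indentation.
pretty = doc.toprettyxml(indent="    ")
print(pretty)
```

On recent Python 3 releases this prints one element per line with no blank lines in between. Older versions lay out text content differently, so the exact output may vary.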
