python - Sorting strings by XML attribute; .text mangles the XML data

Question

#!/usr/bin/env python
import os, sys, os.path
import string
import xml.etree.ElementTree as ET

def sort_strings_file(xmlfile,typee):
    """sort all strings within given strings.xml file"""

    all_strings = {}
    orig_type=typee

    # read original file
    tree = ET.ElementTree()
    tree.parse(xmlfile)

    # iter over all strings, stick them into dictionary
    for element in list(tree.getroot()):
        all_strings[element.attrib['name']] = element.text

    # create new root element and add all strings sorted below
    newroot = ET.Element("resources")
    for key in sorted(all_strings.keys()):
        # Check for IDs
        if typee == "id":
            typee="item"

        # set main node type
        newstring = ET.SubElement(newroot, typee)

        #add id attrib
        if orig_type == "id":
            newstring.attrib['type']="id"

        # continue on
        newstring.attrib['name'] = key
        newstring.text = all_strings[key]


    # write new root element back to xml file
    newtree = ET.ElementTree(newroot)
    newtree.write(xmlfile, encoding="UTF-8")

This works well, but it breaks badly when a string starts with a tag such as <b>. The original

<string name="uploading_to"><b>%s</b> Odovzdávanie do</string>

becomes

<string name="uploading_to" />

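I looked at the xml.etree Element class, but it only seems to offer the .text attribute. I just need a way to grab everything between the XML tags. And no, I cannot change the input data: it comes straight from an Android APK that is ready to be translated. Beyond the fact that it must be valid Android XML, I cannot predict what data I will receive or how it will be structured.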

score 1 · Accepted Answer

I think you are looking for the itertext() method instead. .text only returns the text contained directly at the start of the element:

>>> test = ET.fromstring('<elem>Sometext <subelem>more text</subelem> rest</elem>')
>>> test.text
'Sometext '
>>> ''.join(test.itertext())
'Sometext more text rest'

The .itertext() iterator, on the other hand, lets you find all the text contained in the element, including text inside nested elements.

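If, however, you only want the text contained directly in the element, skipping the contained children, you need the combination of .text and the .tail value of each child: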

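>>> test = ET.fromstring('<elem>Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest</elem>')
>>> (test.text or '') + ''.join(child.tail or '' for child in test)
'Sometext  middle  rest'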

If you need to capture everything the element contains, you have to do a little more work: take .text, then convert each child to text with ElementTree.tostring():

>>> (test.text or '') + ''.join(ET.tostring(child, encoding='unicode') for child in test)
'Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest'

ET.tostring() takes the element's tail into account. I use (test.text or '') because the .text attribute can be None as well.
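
To see why the string from the question comes out empty, here is a small sketch that inspects that exact element. It assumes Python 3 (where encoding="unicode" makes ET.tostring() return str); the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    '<string name="uploading_to"><b>%s</b> Odovzdávanie do</string>'
)

# No text precedes the first child, so .text is None. Copying only .text
# therefore produces an empty <string/> element.
print(elem.text)

# The visible text lives on the child: "%s" is its .text, and the words
# after the closing </b> are its .tail.
child = elem[0]
print(child.tag, child.text, repr(child.tail))

# ET.tostring() serializes the child together with its tail.
print(ET.tostring(child, encoding="unicode"))
```

The text that follows </b> belongs to the <b> child as its tail, not to the parent <string>, which is why reading the parent's .text alone loses it.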

You can wrap that last approach in a function:

def innerxml(elem):
    # encoding='unicode' makes tostring() return str rather than bytes (Python 3)
    return (elem.text or '') + ''.join(ET.tostring(child, encoding='unicode') for child in elem)
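
For the sorting task in the question, one possible alternative is to avoid extracting text at all and move each child element as a whole subtree. The following is only a sketch under assumptions: it targets Python 3, the function name sort_resources is hypothetical, and it leaves out the tag renaming for the "id" type that the original function performs.

```python
import xml.etree.ElementTree as ET


def sort_resources(xml_text):
    """Return the XML with the root's children sorted by their 'name' attribute.

    Children are re-attached as complete elements, so inline markup such as
    <b>...</b> inside a <string> survives unchanged.
    """
    root = ET.fromstring(xml_text)

    # Sort the child elements themselves, not their text.
    children = sorted(root, key=lambda child: child.get("name", ""))

    # Build a fresh root with the same tag and attributes, then attach
    # the sorted children to it.
    newroot = ET.Element(root.tag, dict(root.attrib))
    newroot.extend(children)

    return ET.tostring(newroot, encoding="unicode")


source = (
    '<resources>'
    '<string name="uploading_to"><b>%s</b> Odovzdávanie do</string>'
    '<string name="app_name">Example</string>'
    '</resources>'
)
print(sort_resources(source))
```

Because the elements keep their own .text, children and .tail values, nothing inside them has to be rebuilt. Whitespace between elements travels with each child's tail, so the indentation of a pretty-printed file may shift slightly after sorting.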
