python - ExpatError: junk after document element

Question

I really don't know what the problem is. I get the following error:

File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
ExpatError: junk after document element: line 5, column 0

I don't see any junk! Any help? I'm going crazy...

text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
<question>
    <questiontext>Question3</questiontext>
    <answer>Your Answer: 46</answer>
</question>
<question>
    <questiontext>Bitte geben</questiontext>
    <answer>Your Answer: 544</answer>
    <answer>Your Answer: 943</answer>
</question>
</questionaire>"""

cleandata = text.split('<questionaire>')
cleandatastring= "".join(cleandata)
stripped = cleandatastring.strip()
planhtml = stripped.split('</questionaire>')[0]
clean= planhtml.strip()


from xml.dom import minidom

doc = minidom.parseString(clean)
for question in doc.getElementsByTagName('question'):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)

print doc.toxml()

Thanks!

score 7 · Accepted Answer

Your original text string is well-formed XML. You then do several things to it that break it. If you parse the original text instead, it works fine.

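An XML document must have exactly one top-level element. Because the split/join code strips the <questionaire> wrapper, by the time you parse the string there are several top-level <question> tags. The XML parser treats the first <question> as the root element and then reports the next top-level element, which starts on line 5, column 0 of the stripped string, as "junk after document element".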
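As a minimal sketch of the point above (using a shortened two-question version of the sample data, and a helper variable `inner` that is not in the original code), the following shows that the fragment without its root element fails to parse, while the original string parses and can be filtered as intended. The `break` and the `list(...)` copy are defensive additions of mine, not part of the question's code.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Shortened version of the questionnaire from the question.
text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
</questionaire>"""

# Removing the wrapper leaves two sibling <question> elements at the
# top level, which is no longer a well-formed XML document.
inner = text.replace('<questionaire>', '').replace('</questionaire>', '').strip()
try:
    minidom.parseString(inner)
    fragment_parsed = True
    message = ''
except ExpatError as exc:
    fragment_parsed = False
    message = str(exc)
print(message)

# Parsing the original string, root element included, works.
doc = minidom.parseString(text)

# Iterate over a copy so that removing nodes cannot affect the loop.
for question in list(doc.getElementsByTagName('question')):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)
            break  # the question is already removed; skip its other answers

# Collect the question texts that survived the filter.
remaining = [
    q.getElementsByTagName('questiontext')[0].childNodes[0].nodeValue
    for q in doc.getElementsByTagName('question')
]
print(doc.toxml())
```

If the wrapper really has to go from the final output, one option is to parse the full document first and then serialize each remaining <question> node individually with `toxml()`, rather than cutting the string before parsing.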
