python - Maintain special characters when parsing XML in Python?

Question

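I have an XML file that I parse with Python and write out to a file as Python code.

Some of the XML contains regular-expression values and strings that are shown on screen as dialogs, so there are a few special characters that need to be preserved. The code is below, but how can this be done?

The XML looks a bit like this: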

<variable id="passportnumber" value="" type="String">
    <validate>
        <regularExpression fieldID="passportnumber" errorID="3007162"><![CDATA[^[a-zA-Z+:?<>;*()%="!0-9./',&\s-]{1,35}$]]></regularExpression>
    </validate>
</variable>

And for the dialog:

<if>
    <condition><![CDATA[$taxcode$ == $previousemergencytaxcode$ and $previousemergencytaxcode$ != $emergencytaxcode$]]></condition>
    <then>
        <dialog id="taxCodeOutdatedDialog" text="Are you sure this is the correct tax
        code? &#10; &#10;The emergency code for the tax year 2011-12 was
        '$previousemergencytaxcode$'. &#10;The emergency code for the tax
        year 2012-13 is '$emergencytaxcode$'. &#10; &#10;Proceed?" type="YES|NO|CANCEL" />
    </then>
</if>

The full Python script is here, and the parts that parse these two elements are:

def parse_regularExpression(self, elem):
    self.out('')
    self.out("item_regularExpression(fieldID='{0}', value='{1}')".format(elem.attrib['fieldID'],elem.text))

def parse_dialog(self, elem):
    self.out('')
    self.out("item_dialog(id='{0}', text='{1}', type='{2}')".format(elem.attrib['id'], elem.attrib['text'],elem.attrib['type']))

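The newline (&#10;) is the main one I am not sure how to deal with. etree seems to output it as an actual line break, even when the string is triple quoted. It writes the text value out like this: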

item_dialog(id='taxCodeOutdatedDialog', text='Are you sure this is the correct tax code? 

The emergency code for the tax year 2011-12 was '$previousemergencytaxcode$'. 
The emergency code for the tax year 2012-13 is '$emergencytaxcode$'. 

Proceed?', type='YES|NO|CANCEL')

score 1 · Accepted Answer

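I think this is doing exactly what you are telling it to do. The XML contains &#10;, which is the character reference for a newline, so the parsed attribute value contains a real newline. You are then printing that string out.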
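To see this behaviour in isolation, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree (the question only says "etree"), and the XML string is a shortened, hypothetical version of the dialog element from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the <dialog> element from the question.
xml_source = '<dialog id="d1" text="Line one&#10;Line two" type="YES|NO" />'

elem = ET.fromstring(xml_source)
text = elem.attrib['text']

# By the time the value reaches Python, the character reference &#10;
# has already been decoded into a real newline character.
print(repr(text))
```

Printing with repr() shows the newline as an escape sequence instead of an actual line break, which is a quick way to check what the parser really returned.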

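If you want to replace the newlines with something else in the printed output, it is probably best to do that after reading the value and before writing it out (rather than trying to change it in the XML).

The code would look something like this: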

def parse_dialog(self, elem):
    self.out('')
    self.out("item_dialog(id='{0}', text='{1}', type='{2}')".format(
        escape_string(elem.attrib['id']),
        escape_string(elem.attrib['text']),
        escape_string(elem.attrib['type'])))

def escape_string(s):
    ...

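This is also much more robust, since the underlying problem is essentially a script injection issue/vulnerability: any quote or backslash in an attribute value ends up verbatim inside the generated Python string literal. The single quotes already present in the dialog text above, for example, would terminate the literal early.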
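The answer leaves the body of escape_string open. One possible sketch, assuming Python 3, is to let repr() build the whole string literal. Note that this differs from the snippet above in one respect: the quotes are dropped from the format string, because repr() supplies its own. The format_dialog helper and the sample values are hypothetical and only serve to illustrate the idea.

```python
def escape_string(s):
    """Return s as a Python string literal, quotes included (hypothetical helper)."""
    # repr() escapes newlines and backslashes, and picks a quote style
    # that does not clash with quotes inside the string.
    return repr(s)


def format_dialog(dialog_id, text, dialog_type):
    # No quotes around the placeholders: repr() already adds them.
    return "item_dialog(id={0}, text={1}, type={2})".format(
        escape_string(dialog_id),
        escape_string(text),
        escape_string(dialog_type))


# Sample values standing in for elem.attrib[...] from the parsed XML.
line = format_dialog(
    'taxCodeOutdatedDialog',
    "Is this correct?\n\nCode was '$previousemergencytaxcode$'.",
    'YES|NO|CANCEL')
print(line)
```

The generated line stays on a single line and remains valid Python even when the text contains newlines or quote characters. On Python 2, repr() of a unicode value would add a u prefix, so the output would look slightly different there.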
