
Okay, I feel a bit lost right now. I have some problems with Unicode (or UTF-8?).

I am using Python 3.3 on Linux (but I have the same problem on Windows).

I am trying to create an XML file with ElementTree.

    import xml.etree.ElementTree as ET

    item = ET.Element("item")
    item_title = ET.SubElement(item, "title")

That is of course not everything, just an example. Now I want the 'title' tag to contain text like this (replace ##CONTENT## with any content, it doesn't matter much):

    # That's how I create the text for the tag
    item_title.text = u'<![CDATA[##CONTENT##]]>'

    # This is how I want it to look
    <title><![CDATA[##CONTENT##]]></title>

    # That's what I get
    <title>&lt;![CDATA[##CONTENT##]]&gt;</title>

    # These are some of the things I tried for writing it to an XML file
    ET.ElementTree(item).write(myOutputFile, encoding="unicode")
    myOutputFile.write(ET.tostring(item, encoding='unicode', method='xml'))
    myOutputFile.write(str(ET.tostring(item, encoding='utf-8', method='xml')))
    myOutputFile.write(str(ET.tostring(item)))

    # Oh, and that's how I open the file for writing
    myOutputFile = codecs.open(HereIsMyFile, 'w', encoding='utf-8')

I searched and found some similar-sounding problems (some of the things I tried already come from SO), but none of the solutions worked. They changed the output somewhat, but never produced a literal < or >. I also noticed that if I use utf-8, I have to call str() when writing to the file. That left me confused about the difference between Unicode and UTF-8; I tried to read up on it, but that didn't really help with my actual problem.
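To make the str() observation concrete, here is a minimal sketch of what I am seeing. The element content is made up for illustration, and I am assuming that `ET.tostring` behaves the same way outside my real script:

```python
import xml.etree.ElementTree as ET

item = ET.Element("item")
title = ET.SubElement(item, "title")
title.text = "<![CDATA[caf\u00e9]]>"  # illustrative content only

# With a real encoding name, tostring() returns bytes.
as_bytes = ET.tostring(item, encoding="utf-8")
# With the special value "unicode", it returns str.
as_text = ET.tostring(item, encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
# str() on bytes gives the repr of the bytes object, not decoded text.
print(str(as_bytes)[:2])
```

So the file opened in text mode accepts `as_text` directly, while `as_bytes` only goes through after `str()`, which writes the repr rather than the XML itself.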

At this point I don't really know where to look for my error, and I would love a hint. Is it the way I write to the file? How I open it? Or is ElementTree causing the problem? (I haven't tried anything else, like lxml, because that would mean rewriting a lot of code, I guess.)

I hope you can help me and if something isn't clear I will try to explain it a bit better!

Edit: Oh, and I also tried opening the file without codecs, because I read somewhere that it is no longer needed in Python 3.x. I wasn't sure about that, so I tried it both ways.


2 Answers

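  1. The correct way to write an XML document with ElementTree is shown below. Here root is an ElementTree, and the file is opened in binary mode because write() emits encoded bytes when given a real encoding:

    with open(HereIsMyFile, 'wb') as myOutputFile:
        root.write(myOutputFile, encoding='utf-8', xml_declaration=True)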

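  2. If you specify an encoding for write(), you must use one defined by the XML standard, such as UTF-8. Unicode is not an encoding; it is a standard. (Python 3's ElementTree does accept encoding="unicode", but only as a special value that makes it produce str instead of bytes.)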

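  3. ElementTree does not support CDATA. The effect you are seeing is ElementTree noticing special characters in the text of a node and escaping them. There is no way to prevent that.

    This answer contains an implementation of a CDATA element: How to output CDATA using ElementTree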
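As a rough illustration of point 3, the sketch below shows that the escaping is lossless, and how a literal CDATA section could be produced with the standard library's `xml.dom.minidom` instead. This is a swapped-in alternative, not the implementation from the linked answer, and the element names and text are assumptions chosen for the example:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# ElementTree escapes markup characters in text; parsing restores them.
el = ET.Element("title")
el.text = "a < b & c"
escaped = ET.tostring(el, encoding="unicode")
round_trip = ET.fromstring(escaped).text

# minidom (not ElementTree) can emit a literal CDATA section.
doc = minidom.Document()
item = doc.createElement("item")
doc.appendChild(item)
title = doc.createElement("title")
item.appendChild(title)
title.appendChild(doc.createCDATASection("a < b & c"))
cdata_xml = item.toxml()

# A parser reports the same text for both forms.
parsed_cdata = ET.fromstring(cdata_xml).find("title").text

print(escaped)
print(cdata_xml)
print(round_trip == parsed_cdata)
```

Since both forms parse back to the same text, the escaped output is usually fine unless a consumer insists on CDATA. If you do go the minidom route, `doc.toxml(encoding="utf-8")` returns bytes, which belong in a file opened in binary mode.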

answered 2013-11-07T15:40:22.177