python - Reading non-ASCII characters from a text file

Question

I am using Python 2.7. I have tried many things, such as the codecs module, but nothing worked. How can I fix this?

myfile.txt

sözcük

My code

f = open('myfile.txt','r')
for line in f:
    print line
f.close()

Output

s\xc3\xb6zc\xc3\xbck

The output is the same in Eclipse and in the command window. I am using Win7. The characters display fine when they are not read from a file.

score 15 · Accepted Answer

import codecs
# open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the whole file into a unicode string
sfile = f.read()
f.close()

# check the type of the decoded text
print type(sfile)  # <type 'unicode'>

# encode the unicode string back to a byte string (str) to display it properly
print sfile.encode('utf-8')

# check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
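
For reading line by line, as the question does, the same idea can be sketched with io.open, which also takes an encoding argument and exists in both Python 2.7 and Python 3. This is only a sketch under assumptions: the sample file is created in a temporary directory so the snippet is self-contained, and the file is assumed to be UTF-8.

```python
import io
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u's\xf6zc\xfck\n')  # u'\xf6' is o-umlaut, u'\xfc' is u-umlaut

# io.open decodes while reading, so each line is already a unicode string.
with io.open(path, 'r', encoding='utf-8') as f:
    lines = [line.rstrip(u'\n') for line in f]

print(len(lines[0]))  # counts characters, not bytes
```

The with block closes the file even if decoding fails partway through, so no explicit f.close() is needed.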

score 7 · Accepted Answer

First of all, detect the file's encoding:


# chardet is a third-party package; `line` is a raw byte string read from the file
from chardet import detect
encoding = lambda x: detect(x)['encoding']
print encoding(line)

Then convert it to unicode, or to a str in your default encoding:


n_line = unicode(line, encoding(line), errors='ignore')
print n_line
print n_line.encode('utf8')
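
If chardet is not installed, a cruder substitute is to try a short list of candidate encodings in order. This is not what chardet does (it guesses statistically). It is only a minimal sketch, and the helper name decode_with_fallback and the candidate list are assumptions made for illustration.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'cp1252')):
    """Decode raw bytes with the first candidate encoding that accepts them."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep whatever is valid UTF-8 and drop the rest.
    return raw.decode('utf-8', 'ignore')

utf8_line = b's\xc3\xb6zc\xc3\xbck'  # the word encoded as UTF-8
cp1252_line = b's\xf6zc\xfck'        # the same word in a single-byte encoding

text_a = decode_with_fallback(utf8_line)
text_b = decode_with_fallback(cp1252_line)
print(text_a == text_b)
```

UTF-8 goes first because it is strict: bytes written in a single-byte encoding rarely form valid UTF-8, whereas cp1252 accepts almost any byte sequence and would hide the mistake if tried first.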

score 1 · Accepted Answer

This is a terminal encoding issue. Try configuring your terminal to use the same encoding as your file. I recommend UTF-8.
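
To see why the terminal matters, here is a small sketch that decodes the bytes from the question's output in two ways. The choice of cp1252 as the "wrong" encoding is an assumption for illustration; the code page actually used by the asker's Win7 console is not stated.

```python
raw = b's\xc3\xb6zc\xc3\xbck'  # the bytes shown in the question's output

as_utf8 = raw.decode('utf-8')     # matching codec: 6 characters
as_cp1252 = raw.decode('cp1252')  # mismatched codec: every byte becomes its own character

print(len(as_utf8))
print(len(as_cp1252))
```

With the mismatched codec, each two-byte UTF-8 sequence turns into two unrelated characters, which is the garbling you see when the terminal and the file disagree.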

By the way, to avoid problems, I recommend decoding all input and encoding all output.

f = open('test.txt', 'r')
for line in f:
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
