python - Swedish characters error in Python

Question

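I'm writing a program that takes words containing Swedish characters and stores them in a list. I can print the Swedish characters before they go into the list, but once they are in the list they no longer display correctly and the characters come out garbled.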

Here is my code:

# coding=UTF-8 

def get_word(lines, eng=0):
    if eng == 1: #function to get word in english
        word_start = lines[1]

def do_format(word, lang):
    if lang == "sv":
        first_word = word
        second_word = translate(word, lang)
        element = first_word + " - " + second_word
    elif lang == "en":
        first_word = translate(word, lang)
        second_word = word
        element = first_word + " - " + second_word
    return element

def translate(word, lang):
    if lang == "sv":
        return "ENGLISH"
    if lang == "en":
        return "SWEDISH"

translated = []
path = r"C:\Users\LK\Desktop\Dropbox\Dokumentai\School\Swedish\V47.txt"  #raw string keeps the backslashes literal

doc = open(path, 'r')           #opens the document
doc_list = []                   #the variable that will contain the list of words
for lines in doc.readlines():   #repeat once for every line
    if len(lines) > 1:          #skip empty lines
        lines = lines.rstrip()  #don't keep the "\n" at the end
        doc_list.append(lines)  #add to the list
for i in doc_list:
    print i

for i in doc_list:
    if "-" in i:
        if i[0] == "-":
            element = do_format(i[2:], "en")
            translated.append(element)
        else:
            translated.append(i)
    else:
        element = do_format(i, "sv")
        translated.append(element)


print translated
raw_input()

I can reduce the problem to this simple code:

# -*- coding: utf-8 -*-

test_string = "ö"
test_list = ["å"]

print test_string, test_list

When I run it, I get this:

√ ['\xc3\xa5']

score 1 · Accepted Answer

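There are a few things to note:

Broken characters. This seems to happen because Python appears to be outputting UTF-8 while your terminal appears to be set to an ISO-8859-X mode (hence the two characters). I would use proper Unicode strings in Python 2! (Always u"ö" instead of "ö".) Also check your locale settings (the locale command on Linux).
The strange string in the list. In Python, print e outputs str(e). For a list (such as ["å"]), the implementation of __str__ is the same as __repr__, and since repr(some_list) calls repr on every element of the list, that is the string you end up seeing.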

Example of repr(string):

>>> print u"ö"
ö
>>> print repr(u"ö")
u'\xf6'
>>> print repr("ö")
'\xc3\xb6'
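To illustrate the "two characters" point, here is a minimal sketch, assuming Python 3 rather than the Python 2 used in the question: the single character ö becomes two bytes in UTF-8, and a terminal that reads those bytes as ISO-8859-1 (Latin-1) shows two characters. The last two lines print the encodings Python itself assumes, which is a quick complement to running locale. The variable names are illustrative only.

```python
# Python 3 sketch (assumption: Python 3, not the question's Python 2).
import locale
import sys

o_umlaut = "\u00f6"                       # the single character "ö"
utf8_bytes = o_umlaut.encode("utf-8")     # b'\xc3\xb6': two bytes

# A terminal interpreting these bytes as ISO-8859-1 (Latin-1) displays two characters.
misread = utf8_bytes.decode("latin-1")
print(len(o_umlaut), len(utf8_bytes), len(misread))   # 1 2 2
print(ascii(misread))                                  # '\xc3\xb6'

# Encodings Python assumes for this process (stdout may be replaced, hence getattr).
print("stdout:", getattr(sys.stdout, "encoding", None))
print("locale:", locale.getpreferredencoding())
```

The exact encodings printed at the end depend on your system and terminal settings.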

score 1 · Accepted Answer

When you print a list, it is printed as a structure (its repr). You need to convert it to a string, for example with the join() string method. With your test code it looks like this:

print test_string, test_list
print('%s, %s, %s' % (test_string, test_list[0], ','.join(test_list)))

And the output:

ö ['\xc3\xa5']
ö, å, å

In your main program, I think you can do:

print('%s' % (', '.join(translated)))
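The difference between printing a list and joining its elements can be reproduced in Python 3 if bytes objects stand in for Python 2's plain (non-u) strings. This is a sketch under that assumption, not the answerer's code; raw, items and decoded are hypothetical names.

```python
# Python 3 sketch: bytes objects stand in for Python 2's plain (non-u) strings.
raw = "\u00e5".encode("utf-8")    # b'\xc3\xa5', the UTF-8 bytes of "å"
items = [raw]

# str() of a list is built from repr() of each element, so the bytes show as escapes.
list_text = str(items)
print(list_text)                  # [b'\xc3\xa5']

# Decoding the elements and joining them produces readable text instead.
decoded = [b.decode("utf-8") for b in items]
joined = ", ".join(decoded)
print(joined)                     # å
```

Joining only works on text, which is why the elements are decoded first here; in Python 2 the join in the answer works directly on the byte strings.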

score 0 · Accepted Answer

You can use the codecs module to specify the encoding of the bytes being read:

import codecs

doc = codecs.open(path, 'r', encoding='utf-8')           #opens the document

A file opened with codecs.open gives you Unicode strings, after decoding the raw bytes with the specified encoding.
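A minimal end-to-end sketch of that, assuming Python 3 and a temporary file in place of the question's V47.txt: the sample words and the temporary path are hypothetical, and the filtering mirrors the question's loop (skip lines of length 1, strip the trailing newline).

```python
# Python 3 sketch; the sample words and temp file are hypothetical stand-ins for V47.txt.
import codecs
import os
import tempfile

sample = "hej\n\n- thank you\nsm\u00f6rg\u00e5s\n"   # last word is "smörgås"

# Save the sample as UTF-8 bytes, the way a text editor would.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

# codecs.open decodes the raw bytes, so every line is already a text (Unicode) string.
with codecs.open(path, "r", encoding="utf-8") as doc:
    doc_list = [line.rstrip() for line in doc if len(line) > 1]

os.remove(path)
print(doc_list)
```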

In your code, prefix string literals with u to make them Unicode strings:

# -*- coding: utf-8 -*-

test_string = u"ö"
test_list = [u"å"]

print test_string, test_list[0]
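For comparison, a sketch of the same test in Python 3 (an assumption; the thread itself is about Python 2): there every str literal is already Unicode, the u prefix is accepted but changes nothing (allowed again since Python 3.3), and repr keeps printable non-ASCII characters, so even printing the whole list is readable.

```python
# Python 3 sketch: str literals are Unicode already; the u prefix is optional.
test_string = "\u00f6"        # "ö"
test_list = [u"\u00e5"]       # "å"; the u prefix is allowed but has no effect

print(test_string, test_list[0])   # ö å
print(test_list)                   # ['å']: Python 3's repr keeps printable non-ASCII
```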
