python-3.x - How do I find the integer representing a special character's code point? TypeError: ord() expected a character, but string of length 2 found

Question

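I want to compute the integer that represents the code point of some letters from my national alphabet in several encodings (I'm sure all of these codecs contain those letters). My program looks like this: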

characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

for letter in characters:
    for code in codecs:
        print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))

Output:

Č iso8859_2 200
Č cp1250 200
Č mac_latin2 137
Traceback (most recent call last):
  File "C:/Users/Miha/Documents/2Semester/IK/Vaja2/chrEncode.py", line 7, in <module>
    print(letter + ' ' + code + ' ' + str(ord(letter.encode(code))))
TypeError: ord() expected a character, but string of length 2 found
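The traceback is raised at the `utf-8` step, not at `mac_latin2`. `str.encode()` returns a `bytes` object, and `ord()` only accepts one of length 1. The three single-byte codecs produce one byte per letter, while UTF-8 and both UTF-16 variants produce two, which `ord()` rejects. A small diagnostic sketch that reuses the question's lists; `encoded_lengths` is a hypothetical helper, not part of the original code:

```python
characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']


def encoded_lengths(letter):
    """Map each codec name to the number of bytes it uses to encode `letter`."""
    return {codec: len(letter.encode(codec)) for codec in codecs}


for letter in characters:
    # Any codec reporting more than 1 byte will make ord() raise TypeError
    print(letter, encoded_lengths(letter))
```

For 'Č' this reports 1 byte for the three single-byte codecs and 2 bytes for the other three.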

Accepted Answer

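Instead of ord(), I found the class method int.from_bytes(bytes, byteorder, *, signed=False), which does the job. It reads the whole encoded byte sequence as a single integer, so it also works when a codec produces more than one byte. Code: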

characters = ['Č', 'č', 'Š', 'š', 'Ž', 'ž']
codecs = ['cp852', 'iso8859_2', 'cp1250', 'mac_latin2', 'utf-8', 'utf_16_le', 'utf_16_be']

for letter in characters:
    for codec in codecs:
        # Read the encoded bytes as one unsigned big-endian integer
        decCodePoint = int.from_bytes(letter.encode(codec), byteorder='big')
        # Also show the integer in hexadecimal and octal (hex() and oct() already return str)
        print(letter + ' ' + codec + ' ' + str(decCodePoint) + ' ' + hex(decCodePoint) + ' ' + oct(decCodePoint))
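What `int.from_bytes(..., byteorder='big')` computes can be spelled out by hand. Each byte multiplies the running value by 256 (a shift of 8 bits) before being added, so the first byte is the most significant. A minimal sketch with a hypothetical helper `bytes_to_int_big` (not part of the answer) that reproduces the UTF-8 value for 'Č', plus a check that for a single byte the result matches `ord()`:

```python
from functools import reduce


def bytes_to_int_big(data: bytes) -> int:
    """Hypothetical helper: fold bytes into an integer, most significant byte first.

    Intended to match int.from_bytes(data, byteorder='big') for unsigned input.
    """
    # Iterating over a bytes object yields ints in the range 0..255
    return reduce(lambda acc, byte: acc * 256 + byte, data, 0)


encoded = 'Č'.encode('utf-8')                     # b'\xc4\x8c'
print(bytes_to_int_big(encoded))                  # 0xC4 * 256 + 0x8C = 50316
print(int.from_bytes(encoded, byteorder='big'))   # 50316

# With exactly one byte, int.from_bytes() and ord() agree
single = 'Č'.encode('cp1250')                     # b'\xc8'
print(int.from_bytes(single, byteorder='big'), ord(single))  # 200 200
```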

Output of the loop above, for 'Č' only:

Č cp852 172 0xac 0o254
Č iso8859_2 200 0xc8 0o310
Č cp1250 200 0xc8 0o310
Č mac_latin2 137 0x89 0o211
Č utf-8 50316 0xc48c 0o142214
Č utf_16_le 3073 0xc01 0o6001
Č utf_16_be 268 0x10c 0o414
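Only some of these integers are Unicode code points. For the single-byte codecs the value is the letter's position in that code page. For UTF-8 it is the two encoded bytes (0xC4, 0x8C) read as one number, not U+010C, and the `utf_16_le` value 3073 is the UTF-16 bytes read in the wrong byte order. Calling `ord()` on the `str` itself gives the code point, and `utf_16_be` happens to match it because 'Č' lies in the Basic Multilingual Plane, where UTF-16 uses a single 16-bit unit. A short sketch, assuming you also want to turn a table entry back into the letter (the `value, codec, length` triple is illustrative):

```python
letter = 'Č'

# ord() on the str (not on encoded bytes) returns the Unicode code point, U+010C
code_point = ord(letter)
print(code_point, hex(code_point))  # 268 0x10c

# UTF-16-LE stores the low byte first, so read it back as little-endian
le_value = int.from_bytes(letter.encode('utf_16_le'), byteorder='little')
print(le_value)  # 268

# Turning a table entry back into the letter needs the codec and the byte count
value, codec, length = 50316, 'utf-8', 2
restored = value.to_bytes(length, byteorder='big').decode(codec)
print(restored)  # Č
```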
