python - Convert a character to 16 bits of unicode encoding

Question

I have a UTF-8 character and I want to convert it into its 16-bit Unicode encoding. How can I do this?

The Unicode code points of a character can be obtained by reading the file that contains it and calling repr() on each line, like:

import codecs
f = codecs.open("a.txt",mode='rb',encoding='utf-8')
r = f.readlines()
for i in r:
    print i,repr(i)

Output:

پٹ u'\ufeff\u067e\u0679'

Now how can I get the 16 bits of unicode encoding for u'\ufeff\u067e\u0679'?

score 3 · Accepted Answer

To get the Unicode code points, call ord like this:

import io
f = io.open("a.txt", mode='r', encoding='utf-8')
for line in f:
    print (line, repr(line), ' '.join(str(ord(c)) for c in line),
                  ' '.join('{0:b}'.format(ord(c)) for c in line))
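
As a minimal Python 3 sketch of the same idea, the code point from ord can be zero-padded to 16 binary digits. The helper name code_point_bits is hypothetical, and the result matches the UTF-16 representation only for code points up to U+FFFF.

```python
def code_point_bits(text):
    """Return (U+XXXX label, 16-digit binary string) for each character.

    Hypothetical helper for illustration. Code points above U+FFFF need
    more than 16 bits, so their binary strings will be longer.
    """
    result = []
    for ch in text:
        cp = ord(ch)  # integer Unicode code point
        result.append(("U+{:04X}".format(cp), format(cp, "016b")))
    return result


for label, bits in code_point_bits("\u067e\u0679"):
    print(label, bits)
```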

There is no single Unicode encoding. If you are looking for the UTF-16 representation of the code points (which can take more than 16 bits), simply call:

u'\ufeff\u067e\u0679'.encode('utf-16')
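
To illustrate why UTF-16 can take more than 16 bits, here is a small Python 3 sketch that splits the encoded bytes into 16-bit code units. The function name utf16_units is an assumption made for this example, and "utf-16-be" is used so that no byte order mark is prepended.

```python
def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for the given text.

    Hypothetical helper for illustration.
    """
    data = ch.encode("utf-16-be")  # explicit byte order, so no BOM is added
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_units("\u067e")])      # one code unit
print([hex(u) for u in utf16_units("\U0001F600")])  # two code units (a surrogate pair)
```

Characters outside the Basic Multilingual Plane, such as U+1F600, come out as two code units, so a fixed 16 bits per character cannot be assumed.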

score 0 · Accepted Answer

So, if the string is in s:

s_enc = s.encode("utf-16")
hex_string = "".join([format(i, "X").rjust(2,"0") for i in s_enc])
bin_string = "".join([format(i, "b").rjust(8,"0") for i in s_enc])

I think this is what you are asking for? (Tested on py3k, but it should work on 2 as well.)

Edit: a slight change is needed for Python 2.x:

s_enc = s.encode("utf-16")
hex_string = "".join([format(ord(i), "X").rjust(2,"0") for i in s_enc])
bin_string = "".join([format(ord(i), "b").rjust(8,"0") for i in s_enc])

Either way, the important thing is to call encode() first to convert to your chosen encoding (the question does not say so explicitly, but reading between the lines it is UTF-16).
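
One detail worth checking in the output is the byte order and the byte order mark (BOM). The Python 3 sketch below, which assumes the two characters from the question with the leading U+FEFF already removed, compares the explicit-endian codecs with plain "utf-16".

```python
import codecs

s = "\u067e\u0679"  # the question's characters, without the leading U+FEFF

le = s.encode("utf-16-le")  # little-endian, no BOM
be = s.encode("utf-16-be")  # big-endian, no BOM

print(le.hex())  # 7e067906
print(be.hex())  # 067e0679

# Plain "utf-16" prepends a BOM in the platform's native byte order,
# so the encoded result is two bytes longer.
with_bom = s.encode("utf-16")
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```

The u'\ufeff' at the start of the question's string is itself a BOM left over from the file. Opening the file with encoding='utf-8-sig' instead of 'utf-8' strips it on read.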

score 0 · Accepted Answer

>>> a=u'\ufeff\u067e\u0679'
>>> a
u'\ufeff\u067e\u0679'
>>> a.encode("utf-16")
'\xff\xfe\xff\xfe~\x06y\x06'

The last line is the string you need.
