python-3.x - Python 3 byte string subscripting

Question

In Python 3, I'm trying to switch from strings to bytes to handle 8-bit strings. I found that bytes don't give me the string-like behavior I need: subscripting returns a number instead of a byte string of length 1.

In [243]: s=b'hello'

In [244]: s[1]
Out[244]: 101

In [245]: s[1:2]
Out[245]: b'e'
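A minimal sketch of the behavior shown above, assuming Python 3: indexing a `bytes` object yields an `int`, a slice yields `bytes` even when it covers a single byte, and `bytes([n])` turns one integer back into a one-byte `bytes` object.

```python
s = b'hello'

# Indexing returns the integer value of the byte
assert s[1] == 101
assert isinstance(s[1], int)

# Slicing returns a bytes object, even for a single byte
assert s[1:2] == b'e'

# Wrapping a single int in a list converts it back to a 1-byte bytes object
assert bytes([s[1]]) == b'e'
```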

This gets really awkward when I iterate over it. For example, this code works with strings but fails with byte strings:

In [260]: d = {b'e': b'E', b'h': b'H', b'l': b'L', b'o': b'O'}

In [261]: list(map(d.get, s))
Out[261]: [None, None, None, None, None]
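The `None` results arise because the dictionary keys are `bytes` while iterating over `s` yields `int`s, so no lookup matches. A minimal sketch of two workarounds that keep the question's `d` unchanged: look up one-byte slices, or rebuild the dictionary with integer keys (`d_int` is a hypothetical name used only for illustration).

```python
s = b'hello'
d = {b'e': b'E', b'h': b'H', b'l': b'L', b'o': b'O'}

# Iterating over bytes yields ints, so no key matches
assert list(map(d.get, s)) == [None] * 5

# Workaround 1: look up one-byte slices instead of ints
by_slice = [d.get(s[i:i + 1]) for i in range(len(s))]

# Workaround 2: re-key the dict by the integer value of each one-byte key
d_int = {k[0]: v for k, v in d.items()}
by_int = [d_int.get(b) for b in s]

print(b''.join(by_slice))  # b'HELLO'
```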

This breaks some of my Python 2 code, and I also find the irregularity very inconvenient. Does anyone have insight into what is going on with byte strings?

score 0 · Accepted Answer

A byte string stores byte values in the range 0 to 255. Showing them as ASCII characters in the repr is just a display convenience; bytes hold data, not text. Observe:

>>> x=bytes([104,101,108,108,111])
>>> x
b'hello'
>>> x[0]
104
>>> x[1]
101
>>> list(x)
[104, 101, 108, 108, 111]
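To connect the integers with the characters in the repr: for ASCII data each element equals `ord()` of the matching character, and converting to a list of ints and back is lossless. A small check, assuming ASCII-only content:

```python
x = bytes([104, 101, 108, 108, 111])

# Each element is the code point of the matching ASCII character
assert [ord(c) for c in 'hello'] == list(x)

# Converting between bytes and a list of ints round-trips exactly
assert bytes(list(x)) == x

# chr() recovers the characters here only because these values are ASCII
assert ''.join(chr(b) for b in x) == 'hello'
```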

Use strings for text. If you start with bytes, decode them with the appropriate encoding:

>>> s=b'hello'.decode('ascii')
>>> d = dict(zip('hello','HELLO'))
>>> list(map(d.get,s))
['H', 'E', 'L', 'L', 'O']
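If the result is needed as bytes again, the decoded text can be mapped and then re-encoded. A sketch assuming ASCII input, as in the answer; `d.get(c, c)` is used here so that unmapped characters pass through unchanged rather than becoming `None`.

```python
data = b'hello'
d = dict(zip('hello', 'HELLO'))

# Decode to text, map each character, then encode back to bytes
text = data.decode('ascii')
result = ''.join(d.get(c, c) for c in text).encode('ascii')
print(result)  # b'HELLO'
```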

But if you want to work with bytes:

>>> d=dict(zip(b'hello',b'HELLO'))
>>> d
{104: 72, 108: 76, 101: 69, 111: 79}
>>> list(map(d.get,b'hello'))
[72, 69, 76, 76, 79]
>>> bytes(map(d.get,b'hello'))
b'HELLO'
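For a fixed byte-to-byte mapping like this one, the built-in `bytes.maketrans` and `bytes.translate` methods do the same job without a dictionary. This is an alternative technique, not part of the original answer:

```python
# Build a 256-byte translation table mapping b'helo' to b'HELO'
table = bytes.maketrans(b'helo', b'HELO')

# Apply it; bytes outside the source set pass through unchanged
result = b'hello'.translate(table)
print(result)  # b'HELLO'
```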

score 0 · Accepted Answer

You can simply decode to get a string, take the element you need, and encode it again:

s = b'hello'
t = s.decode()          # decodes using UTF-8 by default
print(t[1])             # a str of length 1: e
print(t[1].encode())    # a bytes object of length 1: b'e'
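One caveat with this approach: `decode()` with no argument uses UTF-8, so for non-ASCII data string indices and byte indices no longer line up, and `t[i].encode()` can yield more than one byte. A short illustration using an arbitrary example word:

```python
s = 'café'.encode()  # UTF-8 encoding: b'caf\xc3\xa9'
assert len(s) == 5

t = s.decode()       # back to text: 'café'
assert len(t) == 4

# The last character is one str element but two bytes once encoded
assert t[3].encode() == b'\xc3\xa9'

# A one-byte slice at position 3 holds only the first byte of 'é'
assert s[3:4] == b'\xc3'
```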
