python - Python: Regular expressions and string length (in bytes)

Question

I'm writing a program in Python and have a couple of questions (I'm not 100% familiar with Python yet):

import re

rawData = '7I+8I-7I-9I-8I-'

print(len(rawData))

rawData = re.sub(r"[0-9]I\+", "", rawData)
rawData = re.sub(r"[0-9]I\-", "", rawData)

print(rawData)

How can I merge the two regular expressions into one using |? That is, remove both 9I+ and 9I- with a single regex operation.
Does len(rawData) return the length of rawData in bytes?

Thank you.

score 5 · Accepted Answer

See the difference:

$ python3
Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # Unicode text
3
>>> 

$ python
Python 2.7.1 (r271:86832, May 20 2011, 17:19:04) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # bytes
4
>>> len(u'día')  # Unicode text
3
>>>


Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len(b'día')
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> len(b'dia')
3
>>>
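The sessions above show that Python 2's plain '...' literal is a byte string (the 4 reported there assumes the terminal sends UTF-8), while a Python 3 str counts text. A minimal Python 3 sketch of getting both numbers, assuming UTF-8 is the encoding you care about (Latin-1 is shown only to illustrate that the byte count depends on the encoding):

```python
# Assumes Python 3, where str is Unicode text and bytes is raw bytes.
text = 'día'

# len() on a str counts code points, not bytes.
print(len(text))                    # 3

# To get a size in bytes, encode first; the byte count depends on the encoding.
print(len(text.encode('utf-8')))    # 4 ('í' takes 2 bytes in UTF-8)
print(len(text.encode('latin-1')))  # 3 ('í' fits in 1 byte in Latin-1)

# For the question's ASCII-only data, characters and UTF-8 bytes coincide.
raw_data = '7I+8I-7I-9I-8I-'
print(len(raw_data), len(raw_data.encode('utf-8')))  # 15 15
```

So for the data in the question, both answers are 15; the distinction only matters once non-ASCII text appears.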

score 0 · Accepted Answer

len returns the number of characters (strictly, Unicode code points) when applied to a unicode string (this is nuanced; other answers flesh it out more), the number of bytes in an encoded byte string, the number of items in a list (or set, or keys in a dictionary), and so on.
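A short illustration of that nuance and of len on containers, in Python 3. The variable names and sample values are made up for the example; the point is that "character" and "code point" can differ when combining marks are involved:

```python
import unicodedata

# str: len() counts Unicode code points, which is not always what a reader sees
# as one "character". 'í' can be one precomposed code point or 'i' + an accent.
composed = 'd\u00eda'      # 'día' with precomposed U+00ED
decomposed = 'di\u0301a'   # 'día' as 'i' + U+0301 COMBINING ACUTE ACCENT
print(len(composed), len(decomposed))                  # 3 4
print(len(unicodedata.normalize('NFC', decomposed)))   # 3 after NFC recombines it

# Containers: len() counts elements.
digits = [7, 8, 7, 9, 8]
print(len(digits))                # 5 items in the list
print(len(set(digits)))           # 3 distinct items in the set
print(len({'7I+': 1, '8I-': 2}))  # 2 keys in the dict
```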

rawData = re.sub(r"[0-9]I(\+|-)", "", rawData)
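A self-contained sketch of the merged pattern, assuming Python 3. It uses a non-capturing group (?:...) since the group's contents are never used, and shows an equivalent character-class form; the `sample` string is a hypothetical input chosen to show what is left untouched:

```python
import re

raw_data = '7I+8I-7I-9I-8I-'   # the question's input

# One pattern instead of two: a digit, then 'I', then either '+' or '-'.
# Alternation with |, written as a raw string; (?:...) groups without capturing.
with_alternation = re.sub(r'[0-9]I(?:\+|-)', '', raw_data)

# Equivalent character-class form: [+-] matches a literal '+' or '-'
# ('-' is literal when it is the last character in the class).
with_class = re.sub(r'[0-9]I[+-]', '', raw_data)
print(repr(with_alternation), repr(with_class))  # '' ''

# A made-up input showing that only digit + 'I' + sign triples are removed.
sample = '7I+x8I-y9I*'
print(re.sub(r'[0-9]I[+-]', '', sample))  # xy9I*
```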

score 0 · Accepted Answer


Why not take a different approach, using the replace method?
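The answer doesn't show what a replace-based version would look like. A minimal sketch, assuming every token is a single digit 0-9, then 'I', then '+' or '-', as in the question's data. Since str.replace has no wildcards, each literal token is replaced separately:

```python
raw_data = '7I+8I-7I-9I-8I-'

# str.replace matches literal text only, so loop over every concrete token:
# 10 digits x 2 signs = 20 plain-text replacements.
result = raw_data
for digit in '0123456789':
    for sign in '+-':
        result = result.replace(digit + 'I' + sign, '')

print(repr(result))  # ''
```

Because the replacements run one after another, a removal can join neighbouring text into a new token that a later pass then removes (e.g. '9I7I-+'), whereas a single re.sub pass would not; for well-formed input like the question's, the two give the same result.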

answered 2011-07-01T15:58:36.580
