python - How to split a line on a non-printable ASCII character in Python

Question

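How can I split a line in Python on a non-printable ASCII character (such as the long minus sign, hex 0x97, octal 227)? I don't need the character itself. The information after it will be saved as a variable.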

score 5 · Accepted Answer

You can use re.split.

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

Adjust the pattern so that it includes only the characters you want to keep.

See also: striping-non-printable-characters-from-a-string-in-python

Example (with a long minus):
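As a rough illustration of adjusting the pattern, the sketch below splits on any run of characters outside the printable ASCII range (space through tilde). It assumes Python 3; the name `NON_PRINTABLE` and the sample string are hypothetical and not part of the original answer.

```python
import re

# Hypothetical pattern: one or more characters outside printable ASCII
# (0x20 space through 0x7e tilde) act as the separator.
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]+')

line = 'key\x97value'  # sample input containing the 0x97 character
parts = NON_PRINTABLE.split(line)
print(parts)  # ['key', 'value']
```

Because the separator is matched but not captured, it is dropped from the result, which is what the question asks for.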

>>> # \xe2\x80\x93 represents a long dash (or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']

Or the same with Unicode:

>>> # \u2013 represents a long dash, long minus or so called en-dash
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']

score 2 · Accepted Answer

_, _, your_result= your_input_string.partition('\x97')

or

your_result= your_input_string.partition('\x97')[2]

If your_input_string does not contain '\x97', your_result will be empty. If your_input_string contains more than one '\x97', your_result will contain everything after the first '\x97', including the other '\x97' characters.
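The behaviour of partition in these cases can be checked with a short sketch. It assumes Python 3; the helper name `after_separator` and the sample strings are made up for illustration.

```python
SEP = '\x97'

def after_separator(line):
    # Hypothetical helper: text following the first SEP, or '' if SEP is absent.
    return line.partition(SEP)[2]

print(repr(after_separator('name\x97value')))  # 'value'
print(repr(after_separator('no separator')))   # ''
print(repr(after_separator('a\x97b\x97c')))    # 'b\x97c'
```

If everything after the last separator is wanted instead, `rpartition` works the same way from the right.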

score 1 · Accepted Answer

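Just use the string/unicode split method (it doesn't really care what string you split on, as long as it is a constant; if you want to split on a regular expression, use re.split).

To get the split string, either escape it as others have shown: "\x97"

or

use chr(0x97) for strings (0-255) and unichr(0x97) for unicode.

So an example would be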

'will not be split'.split(chr(0x97))

'will be split here:\x97 and this is the second string'.split(chr(0x97))
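The answers above are written for Python 2. In Python 3, unichr no longer exists and chr covers all code points, so a rough equivalent might look like the sketch below. It assumes the input arrives as bytes; the Windows-1252 decoding step is an additional assumption (in that encoding, byte 0x97 is an em dash), so substitute whatever encoding the data actually uses.

```python
raw = b'will be split here:\x97 and this is the second string'

# Option 1: split the raw bytes directly on the single byte 0x97.
left, right = raw.split(bytes([0x97]))
print(left)   # b'will be split here:'
print(right)  # b' and this is the second string'

# Option 2 (assumes Windows-1252 input): decode first, then split the
# text on the em dash that byte 0x97 decodes to.
text = raw.decode('cp1252')
parts = text.split('\u2014')
print(parts)  # ['will be split here:', ' and this is the second string']
```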
