python - Splitting a string on multiple delimiters in Python without regular expressions

Question

I have a string that I need to split on multiple characters without using regular expressions. For example, I would like something like the following:

>>> string = "hello there[my]friend"
>>> string.split(' []')
['hello', 'there', 'my', 'friend']

Is there anything like this in Python?

score 8 · Accepted Answer

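If you need multiple delimiters, re.split is the best way to go.

Without regular expressions, it is not possible unless you write a custom function.

Here is such a function. It may or may not do what you want (consecutive delimiters produce empty elements):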

>>> def multisplit(s, delims):
...     pos = 0
...     for i, c in enumerate(s):
...         if c in delims:
...             yield s[pos:i]
...             pos = i + 1
...     yield s[pos:]
...
>>> list(multisplit('hello there[my]friend', ' []'))
['hello', 'there', 'my', 'friend']
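
If the empty strings produced by consecutive delimiters are unwanted, one possible variation only yields non-empty pieces. This is a minimal sketch; the name multisplit_nonempty is hypothetical and not part of the answer above.

```python
def multisplit_nonempty(s, delims):
    """Split s on any character in delims, skipping empty pieces (hypothetical variant)."""
    pos = 0
    for i, c in enumerate(s):
        if c in delims:
            if i > pos:        # skip the empty piece between adjacent delimiters
                yield s[pos:i]
            pos = i + 1
    if pos < len(s):           # skip an empty trailing piece
        yield s[pos:]

result = list(multisplit_nonempty('hello  there[[my]]friend', ' []'))
print(result)  # ['hello', 'there', 'my', 'friend']
```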

score 1 · Accepted Answer

A solution without regular expressions:

from itertools import groupby
sep = ' []'
s = 'hello there[my]friend'
print([''.join(g) for k, g in groupby(s, sep.__contains__) if not k])

I posted an explanation here: https://stackoverflow.com/a/19211729/2468006
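
To see why this works, it may help to look at the runs that groupby produces before the delimiter runs are filtered out. The sketch below reuses the same sep and s; the variable name runs is only illustrative.

```python
from itertools import groupby

sep = ' []'
s = 'hello there[my]friend'

# groupby yields (key, group) pairs for runs of consecutive characters
# that share the same key. The key is True for delimiter characters.
runs = [(k, ''.join(g)) for k, g in groupby(s, sep.__contains__)]
print(runs)
# [(False, 'hello'), (True, ' '), (False, 'there'), (True, '['),
#  (False, 'my'), (True, ']'), (False, 'friend')]
```

Because adjacent delimiters fall into a single True run, this approach does not produce empty strings for consecutive delimiters.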

score 1 · Accepted Answer

A recursive solution that does not use regular expressions. In contrast to the other answers, it uses only base Python.

def split_on_multiple_chars(string_to_split, set_of_chars_as_string):
    # Recursive splitting
    # Returns a list of strings

    s = string_to_split
    chars = set_of_chars_as_string

    # If no more characters to split on, return input
    if len(chars) == 0:
        return([s])

    # Split on the first of the delimiter characters
    ss = s.split(chars[0])

    # Recursive call without the first splitting character
    bb = []
    for e in ss:
        aa = split_on_multiple_chars(e, chars[1:])
        bb.extend(aa)
    return(bb)

It behaves very much like Python's regular string.split(...), but accepts several delimiters.

Example usage:

print(split_on_multiple_chars('my"example_string.with:funny?delimiters', '_.:;'))

Output:

['my"example', 'string', 'with', 'funny?delimiters']
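
For comparison, a non-recursive sketch that should give the same pieces: replace every delimiter with the first one, then split once. The helper name split_by_replacing is hypothetical, and like the recursive version it keeps empty strings between consecutive delimiters.

```python
def split_by_replacing(text, delims):
    """Hypothetical helper: map every delimiter to the first one, then split once."""
    if not delims:
        return [text]          # nothing to split on
    first = delims[0]
    for d in delims[1:]:
        text = text.replace(d, first)
    return text.split(first)

pieces = split_by_replacing('my"example_string.with:funny?delimiters', '_.:;')
print(pieces)  # ['my"example', 'string', 'with', 'funny?delimiters']
```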

score -3 · Accepted Answer

re.split is the right tool here.

>>> string="hello there[my]friend"
>>> import re
>>> re.split('[] []', string)
['hello', 'there', 'my', 'friend']

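In a regular expression, [...] defines a character class: any single character listed inside the brackets matches. The way I ordered the brackets avoids having to escape them, because a ] placed first in a character class is treated as a literal, but the pattern [\[\] ] also works.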

>>> re.split(r'[\[\] ]', string)
['hello', 'there', 'my', 'friend']
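
When the delimiters are not known in advance, one way to avoid hand-escaping is to build the character class with re.escape. This is a sketch under the assumption that delims is a non-empty string of single-character delimiters; the helper name regex_multisplit is hypothetical.

```python
import re

def regex_multisplit(s, delims):
    """Hypothetical helper: split s on any single character in delims (non-empty)."""
    # re.escape backslash-escapes characters such as [ and ] so they are
    # treated literally inside the character class.
    pattern = '[' + re.escape(delims) + ']'
    return re.split(pattern, s)

parts = regex_multisplit('hello there[my]friend', ' []')
print(parts)  # ['hello', 'there', 'my', 'friend']
```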

The re.DEBUG flag for re.compile is also useful, since it prints out what the pattern will match.

>>> re.compile('[] []', re.DEBUG)
in 
  literal 93
  literal 32
  literal 91
<_sre.SRE_Pattern object at 0x16b0850>

(Here 32, 91, and 93 are the ASCII values of the space, [, and ] characters.)
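
These code points can be checked with the built-in ord. The dictionary name codes is only illustrative.

```python
# Map each delimiter character to its code point.
codes = {ch: ord(ch) for ch in ' []'}
print(codes)  # {' ': 32, '[': 91, ']': 93}
```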
