python - How to split a string in Python without redundant output

Question

I tried to split a string using a regular expression as the separator, but the output of re.split seems to contain redundant results.

import re
replaceArray = '((Replace the string)|((in|inside|within) the string)|(with the string))'
stringToSplit = 'Replace the string arr1 in the string arr2 with the string arr3'
print(re.split(replaceArray, stringToSplit))

I expected the split string to look like this, with no duplicated results:

['Replace the string', ' arr1 ', 'in the string', ' arr2 ', 'with the string', ' arr3']

Instead, the list of split strings contains redundant results that appear to duplicate the other matched strings:

['', 'Replace the string', 'Replace the string', None, None, None, ' arr1 ', 'in the string', None, 'in the string', 'in', None, ' arr2 ', 'with the string', None, None, None, 'with the string', ' arr3']

Is there a way to keep these redundant, duplicated results out of the output of re.split?

score 2 · Accepted Answer

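If the regular expression contains capture groups, the result of re.split() includes the text matched by those groups. Add ?: to the start of each group to make it non-capturing. Some of these groups aren't actually needed at all. Try the following: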

replaceArray = 'Replace the string|(?:in|inside|within) the string|with the string'
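To see what this pattern returns, here is a minimal sketch. It assumes the input uses the same "the string" wording as the pattern; the snake_case names are only illustrative. Note that with no capturing groups left, the separators themselves are dropped from the result, and a match at position 0 still produces a leading empty string.

```python
import re

# Assumed sample input, worded to match the pattern ("the string").
string_to_split = 'Replace the string arr1 in the string arr2 with the string arr3'

# The pattern from this answer: the only remaining group is non-capturing.
pattern = 'Replace the string|(?:in|inside|within) the string|with the string'

# With no capturing groups, re.split returns only the text between matches.
parts = re.split(pattern, string_to_split)
print(parts)  # ['', ' arr1 ', ' arr2 ', ' arr3']
```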

score 1 · Accepted Answer

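From the documentation for re.split:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

You want to use non-capturing groups in your regular expression; that is, use (?:...) instead of (...).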
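If the separators should still appear once each, as in the expected list from the question, one possible approach is to keep a single outer capturing group and make the inner alternation non-capturing. This is a sketch under the assumption that the input uses the "the string" wording; filtering out empty pieces is an extra step of mine, not something re.split does on its own.

```python
import re

# Assumed sample input, worded to match the pattern ("the string").
string_to_split = 'Replace the string arr1 in the string arr2 with the string arr3'

# One outer capturing group returns each separator exactly once;
# the inner (?:...) group contributes nothing to the result list.
pattern = r'(Replace the string|(?:in|inside|within) the string|with the string)'

# re.split emits '' before a match at position 0, so drop empty pieces.
parts = [piece for piece in re.split(pattern, string_to_split) if piece]
print(parts)
# ['Replace the string', ' arr1 ', 'in the string', ' arr2 ', 'with the string', ' arr3']
```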

score 1 · Accepted Answer

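Groups that start with ?: are non-capturing groups and do not appear in the output. Also, you probably don't want re.split here but re.match instead, since you aren't really interested in splitting the string and want to extract those groups.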

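>>> import re
>>> stringToSplit = '(Replace the array arr1 in the array arr2 with the array arr3)'  # assumed input, "the array" wording
>>> expr = r'\((Replace the array (.*?)) ((?:in|inside|within) the array (.*?)) (with the array (.*?))\)'
>>> re.match(expr, stringToSplit).groups()
('Replace the array arr1', 'arr1', 'in the array arr2', 'arr2', 'with the array arr3', 'arr3')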

Or:

>>> expr = r'\((Replace the array) (.*?) ((?:in|inside|within) the array) (.*?) (with the array) (.*?)\)'
>>> re.match(expr, stringToSplit).groups()
('Replace the array', 'arr1', 'in the array', 'arr2', 'with the array', 'arr3')
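A variation on the same idea, sketched with named groups so the extracted values are easier to tell apart. The group names (target, container, replacement) are hypothetical, and the input is assumed to be wrapped in parentheses and to use the "the array" wording. Since re.match returns None when nothing matches, the sketch checks for that before reading the groups.

```python
import re

# Assumed sample input: wrapped in parentheses, "the array" wording.
sentence = '(Replace the array arr1 in the array arr2 with the array arr3)'

# Named groups (hypothetical names) label each extracted value.
expr = re.compile(
    r'\(Replace the array (?P<target>.*?) '
    r'(?:in|inside|within) the array (?P<container>.*?) '
    r'with the array (?P<replacement>.*?)\)'
)

match = expr.match(sentence)
# re.match returns None if the pattern does not match at the start.
fields = match.groupdict() if match is not None else {}
print(fields)
# {'target': 'arr1', 'container': 'arr2', 'replacement': 'arr3'}
```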
