python - Remove text between () and []

Question

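I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and the square brackets, but I can't figure out how.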

The string looks something like this:

x = "This is a sentence. (once a day) [twice a day]"

This isn't the actual string I'm working with, but it is very similar and much shorter.

score 121 · Accepted Answer

You can use the re.sub function.

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'

If you also want to remove the [] and () themselves, you can use this code:

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '

Important: this code does not work with nested brackets.
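To see why nesting is a problem, here is a small sketch (the sample string is made up for illustration). The lazy `.*?` stops at the first closing bracket it meets, so the outer pair is left half-removed:

```python
import re

# Same pattern as the removal example above, written as a raw string.
pattern = re.compile(r"[\(\[].*?[\)\]]")

nested = "a (b (c) d) e"  # made-up sample with one nested pair
result = pattern.sub("", nested)

# The match runs from the first "(" to the first ")", i.e. "(b (c)",
# so the tail of the outer pair survives.
print(repr(result))  # 'a  d) e'
```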

Explanation

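The first regex captures ( or [ as group 1 (by wrapping it in parentheses) and ) or ] as group 2, and matches those two groups together with every character between them. After matching, the matched portion is replaced with groups 1 and 2 only, so the resulting string has nothing left inside the brackets. The second regex follows from this: it matches the brackets and everything between them, and replaces the whole match with the empty string.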
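As a rough illustration of what the two groups capture, the sketch below walks over the matches with re.finditer (the variable names are only for this example):

```python
import re

# Group 1 is the opening bracket, group 2 the closing bracket.
pattern = re.compile(r"([\(\[]).*?([\)\]])")
x = "This is a sentence. (once a day) [twice a day]"

# Collect (opener, closer, whole match) for every match in the string.
pairs = [(m.group(1), m.group(2), m.group(0)) for m in pattern.finditer(x)]
for opener, closer, whole in pairs:
    print(opener, closer, repr(whole))
# ( ) '(once a day)'
# [ ] '[twice a day]'
```

Replacing each match with `\g<1>\g<2>` therefore keeps only the two bracket characters and drops what was between them.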

-- Modified following a comment by AjayThomas

score 22 · Accepted Answer

Run this script; it works even with nested brackets.
It uses basic logical tests.

def a(test_str):
    ret = ''
    skip1c = 0  # depth of currently open [ ]
    skip2c = 0  # depth of currently open ( )
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            # outside every bracket: keep the character
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))

In case you don't run it, here is the output:

>>> 
ewq This is a sentence.  
'ewq This is a sentence.  '
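One thing worth knowing about this counter approach is how it treats unbalanced input. The sketch below repeats the same logic under a hypothetical name (strip_bracketed, not from the answer) so it runs on its own; the sample strings are made up:

```python
def strip_bracketed(text):
    """Same counter logic as the function above; the name is illustrative."""
    kept = []
    square = 0  # depth of open [ ]
    round_ = 0  # depth of open ( )
    for ch in text:
        if ch == '[':
            square += 1
        elif ch == '(':
            round_ += 1
        elif ch == ']' and square > 0:
            square -= 1
        elif ch == ')' and round_ > 0:
            round_ -= 1
        elif square == 0 and round_ == 0:
            kept.append(ch)
    return ''.join(kept)

# A closing bracket with no opener before it is kept as ordinary text.
print(repr(strip_bracketed("a) b] c (d) e")))    # 'a) b] c  e'
# An opener that is never closed swallows the rest of the string.
print(repr(strip_bracketed("keep (lost forever")))  # 'keep '
```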

score 16 · Accepted Answer

Here's a solution similar to @pradyunsg's answer (it works with arbitrarily nested brackets):

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '
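The brackets argument lists opener/closer pairs back to back, so other delimiters can be passed in the same way. A minimal usage sketch, assuming the function above is used unchanged (it is repeated here only so the snippet runs on its own; the sample strings are made up):

```python
def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2)  # one depth counter per bracket kind
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b:
                kind, is_close = divmod(i, 2)
                count[kind] += (-1) ** is_close  # +1 for open, -1 for close
                if count[kind] < 0:  # unbalanced closer: treat as plain text
                    count[kind] = 0
                else:
                    break
        else:
            if not any(count):  # outside all brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

# Curly and angle brackets instead of the defaults.
print(repr(remove_text_inside_brackets(
    "keep {drop <nested> drop} keep <x>", brackets="{}<>")))
# -> 'keep  keep '

# A closer with no matching opener is kept.
print(repr(remove_text_inside_brackets(") stray (gone)")))
# -> ') stray '
```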

score 14 · Accepted Answer

This should work for parentheses. A regular expression "consumes" the text it has matched, so this won't work for nested parentheses.

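import re
mystring = "This is a sentence. (once a day) [twice a day]"
regex = re.compile(r".*?\((.*?)\)")
# result holds the text found inside each pair of parentheses
result = re.findall(regex, mystring)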
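If nesting does matter, one possible workaround (not part of this answer, and only a sketch) is to strip innermost pairs repeatedly until nothing changes. The names INNERMOST and remove_nested are made up for this example, and the approach assumes the brackets are properly paired:

```python
import re

# One innermost pair: an opener, no bracket characters inside, the matching closer.
INNERMOST = re.compile(r"\([^()\[\]]*\)|\[[^()\[\]]*\]")

def remove_nested(text):
    """Strip innermost bracket pairs until a pass changes nothing."""
    previous = None
    while previous != text:
        previous = text
        text = INNERMOST.sub("", text)
    return text

print(repr(remove_nested("a (b [c] d) e")))  # 'a  e'
```

Each pass peels off one level of nesting, so the loop runs about as many times as the deepest nesting level.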

Or this finds a single set of parentheses; loop over it to find more:

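mystring = "This is a sentence. (once a day) [twice a day]"
start = mystring.find("(")
# look for the closing parenthesis only after the opening one
end = mystring.find(")", start + 1)
if start != -1 and end != -1:
    result = mystring[start+1:end]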
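The loop mentioned above might look like the following sketch. The function name find_parenthesized is hypothetical, and like the snippet above it does not handle nested parentheses:

```python
def find_parenthesized(text):
    """Collect the text inside each non-nested pair of parentheses."""
    results = []
    pos = 0
    while True:
        start = text.find("(", pos)
        if start == -1:
            break  # no more opening parentheses
        end = text.find(")", start + 1)
        if end == -1:
            break  # opener without a closer: stop
        results.append(text[start + 1:end])
        pos = end + 1  # continue searching after this pair
    return results

print(find_parenthesized("a (b) c (d e) f"))  # ['b', 'd e']
```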

score 2 · Accepted Answer

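You can split the string, filter the pieces, and join them back together. If your brackets are well formed (properly paired and not nested), the following code should do it: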

import re
x = "".join(re.split(r"\(|\)|\[|\]", x)[::2])
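To make the "well formed" condition concrete, here is a small sketch wrapping the same expression in a hypothetical helper (split_filter_join is not from the answer). Keeping every second piece only works when the pieces alternate strictly between outside and inside, which nesting breaks:

```python
import re

def split_filter_join(text):
    # Split on every bracket character and keep the even-indexed pieces.
    return "".join(re.split(r"\(|\)|\[|\]", text)[::2])

# Flat, properly paired brackets: pieces alternate outside/inside.
print(repr(split_filter_join("This is a sentence. (once a day) [twice a day]")))
# 'This is a sentence.  '

# Nested brackets shift the alternation, so inner text leaks through.
print(repr(split_filter_join("a (b [c] d) e")))
# 'a c e'
```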
