python - Check whether elements of one list occur inside elements of another list

Question

I have a problem with lists. Basically, I have a list:

a=["Britney spears", "red dog", "\xa2xe3"]

And I have another list that looks like this:

b = ["cat","dog","red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

What I want to do is check whether any element of a is part of an element of b and, if so, remove it from that element of b. So I want b to end up as:

b = ["cat","dog","is stupid","good stuff","awesome"]

What is the most pythonic (2.7.x) way to achieve this?

I figure I could loop and check each element, but I'm not sure that would be very efficient. I have a list (b) of about 50k items.

score 4 · Accepted Answer

I would use a regex here:

import re

a=["Britney spears", "red dog", "\xa2xe3"]

regex = re.compile('|'.join(re.escape(x) for x in a))

b=["cat","dog","red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

b = [regex.sub("",x) for x in b ]
print (b)  #['cat', 'dog', ' is stupid', 'good stuff ', 'awesome ']

This way, the regex engine can optimize the test against the list of alternatives.
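One thing to watch with a joined alternation: Python's re tries the alternatives left to right and takes the first one that matches, not the longest. The sketch below assumes a hypothetical phrase list in which one phrase is a prefix of another (this is not the case for the list in the question) and sorts the phrases longest first. The helper name build_pattern is made up for illustration.

```python
import re

def build_pattern(phrases):
    # Hypothetical helper: put longer phrases first so that "red dog"
    # is tried before its prefix "red".
    ordered = sorted(set(phrases), key=len, reverse=True)
    return re.compile("|".join(re.escape(p) for p in ordered))

# Assumed example data where one phrase is a prefix of another.
phrases = ["red", "red dog"]

naive = re.compile("|".join(re.escape(p) for p in phrases))
ordered = build_pattern(phrases)

naive_result = naive.sub("", "red dog is stupid")      # only "red" is removed
ordered_result = ordered.sub("", "red dog is stupid")  # "red dog" is removed
print(naive_result)
print(ordered_result)
```

With the phrases from the question the order makes no difference, so this only matters if the real list contains overlapping entries.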

Here is a series of alternatives that shows how different regexes behave.

import re

a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat","dog",
     "red dog is stupid", 
     "good stuff \xa2xe3", 
     "awesome Britney spears",
     "transferred dogcatcher"]

#This version leaves whitespace and will match between words.
regex = re.compile('|'.join(re.escape(x) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', ' is stupid', 'good stuff ', 'awesome ', 'transfercatcher']

#This version strips whitespace from either end
# of the returned string
regex = re.compile('|'.join(r'\s*{}\s*'.format(re.escape(x)) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff', 'awesome', 'transfercatcher']

#This version will only match at word boundaries,
# but you lose the match with \xa2xe3 since it isn't a word
regex = re.compile('|'.join(r'\s*\b{}\b\s*'.format(re.escape(x)) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff \xa2xe3', 'awesome', 'transferred dogcatcher']


#This version finally seems to get it right.  It matches whitespace (or the start
# of the string) and then the "word" and then more whitespace (or the end of the 
# string).  It then replaces that match with nothing -- i.e. it removes the match 
# from the string.
regex = re.compile('|'.join(r'(?:\s+|^)'+re.escape(x)+r'(?:\s+|$)' for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff', 'awesome', 'transferred dogcatcher']
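A possible further variant, not from the answer above: the last pattern consumes the whitespace on both sides of a match, so a phrase in the middle of a string would leave its neighbours glued together. The sketch below swaps in lookarounds instead, (?<!\S) and (?!\S), which require that the phrase is not adjacent to a non-whitespace character, and then collapses leftover whitespace. The function name clean is an assumption for illustration, and the whitespace normalization also collapses any other runs of spaces in the input.

```python
import re

a = ["Britney spears", "red dog", "\xa2xe3"]

# Match a phrase only when it is not touching a non-whitespace character
# on either side; the lookarounds do not consume the surrounding whitespace.
alternatives = "|".join(re.escape(x) for x in a)
regex = re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")

def clean(text):
    # Remove the phrases, then collapse runs of whitespace and trim the ends.
    return " ".join(regex.sub("", text).split())

b = ["cat", "dog",
     "red dog is stupid",
     "good stuff \xa2xe3",
     "awesome Britney spears",
     "transferred dogcatcher",
     "my red dog is stupid"]

c = [clean(x) for x in b]
print(c)
```

The last input, "my red dog is stupid", is the case the whitespace-consuming pattern would turn into "myis stupid"; here the words on either side stay separated.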

score 2 · Accepted Answer

Well, I don't know whether this still counts as pythonic, since reduce was exiled to functools in python3, but someone has to put a one-liner on the table:

a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat","dog","red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

b = [reduce(lambda acc, n: acc.replace(n, ''), a, x).strip() for x in b]

This is even faster:

[reduce(lambda acc, n: acc.replace(n, '') if n in acc else acc, a, x).strip() for x in b]

But as readability goes down, it gets less pythonic, I think.
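For anyone running this on Python 3 rather than 2.7: reduce has to be imported from functools there. A minimal sketch of the same one-liner next to a plain loop that does the same thing; the helper name strip_all is made up for illustration.

```python
from functools import reduce  # needed on Python 3; reduce is a builtin on 2.7

a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat", "dog", "red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

def strip_all(text, needles):
    # Hypothetical helper: remove every needle, then trim the ends.
    for needle in needles:
        text = text.replace(needle, "")
    return text.strip()

with_reduce = [reduce(lambda acc, n: acc.replace(n, ""), a, x).strip() for x in b]
with_loop = [strip_all(x, a) for x in b]

print(with_reduce)
print(with_loop)
```

Both lists come out the same; the loop version is just the reduce spelled out.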

Here is one that handles the transferred dogcatcher case. I borrowed mgilson's regex, but I think that's OK since it's pretty trivial :-):

def reducer(acc, n):
    if n in acc:
        return re.sub(r'(?:\s+|^)' + re.escape(n) + r'(?:\s+|$)', '', acc)
    return acc

b = [reduce(reducer, a, x).strip() for x in b]

I extracted the lambda into a named function for readability.

score 1 · Accepted Answer

Well, the simplest is a list comprehension, and as long as a is small, it's even a fairly efficient approach.

b = [i for i in b if i not in a]
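Note that, as written, this comprehension tests whether a whole element of b is equal to some element of a, so with the lists from the question nothing is filtered out. A small sketch of the difference, assuming the goal were to drop every element of b that contains a phrase from a (which is still not the same as cutting the phrase out of the string, as the question asks):

```python
a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat", "dog", "red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

# Whole-element equality: no element of b equals an element of a,
# so the list comes back unchanged.
by_equality = [i for i in b if i not in a]

# Substring test: drop every element of b that contains any phrase from a.
by_substring = [i for i in b if not any(x in i for x in a)]

print(by_equality)
print(by_substring)
```

If a grows large and equality really is what's wanted, turning a into a set first makes each membership test cheaper.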
