python - How to identify the odd item in a list of items using Python

Question

My goal is to identify the odd element in the list below.

list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']

The odd item is taskb2; the other four items all fall under taska.

They all have the same length, so identifying the odd one with the len function won't work. Any ideas? Thanks.

score 0 · Accepted Answer

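If you know in advance what the underlying structure of the items is, this is easy: group the items by the part that identifies their category and pick the one that doesn't match the rest.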
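As a minimal sketch of the known-structure case, assume every item is a run of letters followed by digits, and that the letters alone (e.g. taska vs taskb) identify the group. The helper name find_odd_by_prefix is hypothetical, and stripping trailing digits is just one assumed way of extracting the group key.

```python
from collections import Counter


def find_odd_by_prefix(items):
    """Return the items whose group key is shared with no other item.

    Assumption: each item looks like '<letters><digits>', and the letters
    alone identify the group (e.g. 'taska1' -> 'taska').
    """
    # Hypothetical group key: strip the trailing digits from each item
    prefixes = [item.rstrip("0123456789") for item in items]
    counts = Counter(prefixes)
    # Keep only the items whose key occurs exactly once
    return [item for item, p in zip(items, prefixes) if counts[p] == 1]


list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
print(find_odd_by_prefix(list_1))  # ['taskb2']
```

If the key is chosen differently (for example, a fixed slice such as item[:5]), only the line that builds prefixes needs to change.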

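If you don't know the structure of the items a priori, one approach is to score each item by how similar it is to all the others. Using the information from this question with the standard library module difflib: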

import difflib
import itertools

list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']

# Initialize a dict, keyed on the items, with 0.0 score to start
score = dict.fromkeys(list_1, 0.0)

# Arrange the items in pairs with each other
for w1, w2 in itertools.combinations(list_1, 2):
    # Performs the matching function - see difflib docs
    seq=difflib.SequenceMatcher(a=w1, b=w2)
    # increment the "match" score for each
    score[w1]+=seq.ratio()
    score[w2]+=seq.ratio()

# Print the results

>>> score
{'taska1': 3.166666666666667,
 'taska2': 3.3333333333333335,
 'taska3': 3.166666666666667,
 'taska7': 3.1666666666666665,
 'taskb2': 2.833333333333333}

As expected, taskb2 has the lowest score.
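To pick the odd item programmatically instead of reading the printed scores, one option is to wrap the same difflib scoring in a function and take the key with the smallest total. The function names below are hypothetical, and the sketch assumes the items are distinct, since dict.fromkeys collapses duplicates into a single key.

```python
import difflib
import itertools


def similarity_scores(items):
    """Sum of pairwise SequenceMatcher ratios for each item (same scoring
    as the difflib loop above). Assumes the items are distinct."""
    score = dict.fromkeys(items, 0.0)
    for w1, w2 in itertools.combinations(items, 2):
        ratio = difflib.SequenceMatcher(a=w1, b=w2).ratio()
        score[w1] += ratio
        score[w2] += ratio
    return score


def odd_one_out(items):
    """Return the item least similar to the rest (lowest total score)."""
    score = similarity_scores(items)
    return min(score, key=score.get)


list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
print(odd_one_out(list_1))  # taskb2
```

Note that min always returns some item, even when no item is clearly different, and on a tie it returns the first one in insertion order; comparing the lowest score against the others (for example, by requiring a minimum gap) is a possible refinement.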
