python - Loop through a list of lists, check for duplicates, and remove them?

Question

Given the following list of lists (several individual lists inside one large list; three are shown here):

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

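I need to loop through each list and check whether element 0 and element 1 are both the same as in any of the other lists. If both match, the later list should be removed (so in my example, the middle list would be removed).

Each time an item is removed from the list, the list needs to be updated.

Does anyone have any ideas?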

score 4 · Accepted Answer

Use a dict with the first two items as the key.

>>> lis = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in lis:
...     key = tuple(item[:2])
...     if key not in dic:
...         dic[key] = item
...         
>>> list(dic.values())  # list() is needed on Python 3, where values() returns a view
[
 ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
]
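
On Python 3.7 and later a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. This is a minimal sketch rather than part of the original answer, and the function name dedupe_by_first_two is made up for illustration.

```python
def dedupe_by_first_two(rows):
    """Keep the first row seen for each (row[0], row[1]) pair, in original order."""
    first_seen = {}
    for row in rows:
        key = tuple(row[:2])  # lists are unhashable, so use a tuple as the key
        # setdefault stores the row only when the key is not present yet
        first_seen.setdefault(key, row)
    return list(first_seen.values())


lis = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
       ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
       ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

print(dedupe_by_first_two(lis))
```

Relying on dict ordering assumes Python 3.7+; on older versions OrderedDict as shown above is the safer choice.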

score 2 · Accepted Answer

Use a list comprehension and a set to keep track of what has already been seen.

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

seen=set()
print([li for li in myvariable
       if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))])

Prints:

[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

A list comprehension processes its items in order, so the original order is preserved and the later duplicates are the ones removed.

>>> lis=[[1,2,1],
...      [3,4,1],
...      [1,2,2],
...      [3,4,2]]
>>> seen=set()
>>> [li for li in lis if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]
[[1, 2, 1], [3, 4, 1]]
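
The condition in the comprehension relies on set.add() returning None, so `not seen.add(...)` is always true and only serves to record the key. For readers who find that side effect hard to follow, the same logic can be written as an explicit loop. This is a sketch, not part of the original answer; the name unique_by_prefix and the parameter n are made up for illustration.

```python
def unique_by_prefix(rows, n=2):
    """Return rows whose first n elements have not been seen earlier."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[:n])
        if key in seen:
            continue  # a row with the same first n elements came earlier
        seen.add(key)
        result.append(row)
    return result


lis = [[1, 2, 1],
       [3, 4, 1],
       [1, 2, 2],
       [3, 4, 2]]

print(unique_by_prefix(lis))
```

The loop builds the key tuple once per row instead of twice, which the one-line comprehension cannot do without an extra helper.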

Note also that this is a much faster way to do it:

from collections import OrderedDict  

lis=[[1,2,1],
     [3,4,1],
     [1,2,2],
     [3,4,2]]

def f1(lis):
    seen=set()
    return [li for li in lis 
             if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]       

def f2(lis):
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item

    return list(dic.values())  # list() keeps the result a list on Python 3

if __name__ == '__main__':
    import timeit
    print('f1, LC+set:', timeit.timeit("f1(lis)", setup="from __main__ import f1,lis"), 'secs')
    print('f2, OrderedDic:', timeit.timeit("f2(lis)", setup="from __main__ import f2,lis,OrderedDict"), 'secs')

Prints:

f1, LC+set: 2.81167197227 secs
f2, OrderedDic: 16.4299631119 secs

So in this measurement the list comprehension with a set is almost 6 times faster.
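
A timing on a four-row list mostly measures fixed overhead, and the ratio may differ on other Python versions and input sizes. To re-check on your own setup, something like the following sketch could be used. The helper make_rows, the row count, and the key range are assumptions chosen for illustration, and no particular result is claimed.

```python
import random
import timeit
from collections import OrderedDict


def f1(lis):
    # list comprehension + set, as in the answer above
    seen = set()
    return [li for li in lis
            if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]


def f2(lis):
    # OrderedDict keyed on the first two items
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item
    return list(dic.values())


def make_rows(n_rows, n_keys, seed=0):
    """Hypothetical helper: random rows whose first two items repeat often."""
    rng = random.Random(seed)
    return [[rng.randrange(n_keys), rng.randrange(n_keys), i]
            for i in range(n_rows)]


rows = make_rows(10_000, 50)
assert f1(rows) == f2(rows)  # both approaches must agree before timing them

for name, func in (("f1, LC+set", f1), ("f2, OrderedDict", f2)):
    secs = timeit.timeit(lambda: func(rows), number=20)
    print(f"{name}: {secs:.4f} secs")
```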

score 1 · Accepted Answer

This list comprehension preserves order and eliminates every duplicate after the first occurrence.

>>> check = [L[0:2] for L in myvariable]
>>> [el for i, el in enumerate(myvariable) if el[0:2] not in check[:i]]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

Here is a list comprehension combined with a standard dict, which works better for larger lists.

>>> d={}
>>> [d.setdefault(tuple(el[:2]), el) for el in myvariable if tuple(el[:2]) not in d]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
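
The question also asks for the list itself to be updated, whereas the comprehensions above build a new list. If the original list object really has to change (an assumption about what the asker needs), one possible sketch is to compact it in place. The name dedupe_in_place and the parameter n are made up for illustration.

```python
def dedupe_in_place(rows, n=2):
    """Remove later rows whose first n elements repeat an earlier row.

    Mutates rows and returns None, like list.sort().
    """
    seen = set()
    write = 0  # next position for a row that is kept
    for row in rows:
        key = tuple(row[:n])
        if key not in seen:
            seen.add(key)
            rows[write] = row  # write never runs ahead of the read position
            write += 1
    del rows[write:]  # drop the leftover tail


myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

alias = myvariable          # a second reference to the same list object
dedupe_in_place(myvariable)
print(alias)                # the alias sees the change too
```

If no other reference to the list exists, simply rebinding the name to the result of one of the comprehensions above is simpler.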
