python - Loopless program to remove duplicate elements from a sorted array

Question

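I want to write a loopless program in Python (probably using a comprehension) that removes duplicate elements from a sorted array, and does so as efficiently as possible.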

score 5 · Accepted Answer

Personally, I would just use this:

>>> testList = [1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 8, 9]
>>> sorted(set(testList))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The list does not even need to be sorted to begin with:

>>> from random import shuffle
>>> shuffle(testList)
>>> testList
[1, 4, 5, 6, 2, 1, 3, 3, 4, 9, 8, 1, 7, 8]
>>> sorted(set(testList))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

score 4 · Accepted Answer

Since the list is sorted, which means all duplicates are already grouped together, you can use itertools.groupby:

>>> testList = [1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 8, 9]
>>> from itertools import groupby
>>> [k for k, g in groupby(testList)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

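This is more efficient (in both memory and time) than converting to a set and sorting. It also has the advantage of only needing to compare elements for equality, so it works fine for unhashable items as well.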
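
To illustrate the point about unhashable items, here is a minimal sketch. The sample data `rows` is made up for this example; the only assumption is that the input is already sorted so that equal elements are adjacent.

```python
from itertools import groupby

# Hypothetical sample data: a sorted list of lists (lists are unhashable).
rows = [[1, 2], [1, 2], [3, 4], [5, 6], [5, 6]]

# set() requires hashable elements, so it raises TypeError here.
try:
    sorted(set(rows))
    set_worked = True
except TypeError:
    set_worked = False

# groupby only compares neighbouring elements with ==, so it still works.
unique_rows = [key for key, _group in groupby(rows)]

print(set_worked)   # False
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```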

score 1 · Accepted Answer

To take advantage of the existing ordering, you should use itertools.groupby. With no key argument, itertools.groupby groups runs of equal elements in its iterable argument.

import itertools

newlist = [key for key, group in itertools.groupby(oldlist)]

This runs in O(n), whereas sorted(set(oldlist)) runs in O(n log n).
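
Because groupby only collapses runs of consecutive equal elements, it removes all duplicates only when the input is sorted (or otherwise grouped). The sketch below shows the difference; the helper name `dedupe_runs` and both sample lists are illustrative, not part of the answer above.

```python
from itertools import groupby

def dedupe_runs(items):
    """Collapse each run of consecutive equal elements to a single element."""
    return [key for key, _group in groupby(items)]

sorted_input = [1, 1, 2, 3, 3]
unsorted_input = [1, 2, 1, 1, 3, 2]

print(dedupe_runs(sorted_input))    # [1, 2, 3]
# Only adjacent duplicates are merged, so repeated values survive here.
print(dedupe_runs(unsorted_input))  # [1, 2, 1, 3, 2]
```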

score 1 · Accepted Answer

According to this article, the fastest way to uniquify a list without preserving order is:

def f9(seq):
    # Not order preserving
    return {}.fromkeys(seq).keys()

You can view the benchmark script here: http://www.peterbe.com/plog/uniqifiers-benchmark/uniqifiers_benchmark.py
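
On Python 3, keys() returns a view object rather than a list, so a Python 3 variant would likely wrap the result in list(). The following is only a sketch of that idea; the name `unique_list` is hypothetical and does not come from the article. Note that since Python 3.7 dicts keep insertion order, so this variant happens to preserve the order of first occurrences. Like set(), it requires hashable elements.

```python
def unique_list(seq):
    # Hypothetical Python 3 adaptation of f9: dict.fromkeys keeps one key
    # per distinct element; list() turns the keys view into a real list.
    return list(dict.fromkeys(seq))

print(unique_list([3, 1, 3, 2, 1]))  # [3, 1, 2]
```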

score 0 · Accepted Answer

Use numpy:

testList = [1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 7, 8, 8, 9]

import numpy
print(numpy.unique(testList))
