python - Group repeated items into an array?

Question

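I am looking for a function that takes a sorted 1-D array and returns a 2-D array with two columns: the first column contains the distinct items, and the second contains the number of times each item is repeated. Right now my code is as follows: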

def priorsGrouper(priors):
    if priors.size==0:
        ret=priors;
    elif priors.size==1:
        ret=priors[0],1;
    else:
        ret=numpy.zeros((1,2));
        pointer1,pointer2=0,0;
        while(pointer1<priors.size):
            counter=0;
            while(pointer2<priors.size and priors[pointer2]==priors[pointer1]):
                counter+=1;
                pointer2+=1;
            ret=numpy.row_stack((ret,[priors[pointer1],pointer2-pointer1]))
            pointer1=pointer2;
    return ret;
print priorsGrouper(numpy.array([1,2,2,3]))

My output is:

[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  1.]]

First of all, I cannot get rid of the [0,0] row. Second, I would like to know whether there is a numpy or scipy function for this, or whether my own version is OK.

Thanks.

score 5 · Accepted Answer

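You can use np.unique to get the unique values of x, as well as an array of indices (called inverse). inverse can be thought of as "labels" for the elements of x. Unlike x itself, the labels are always integers, starting at 0.

Then you can take a bincount of the labels. Since the labels start at 0, the bincount won't be padded with a lot of zeros that you don't care about.

Finally, column_stack joins y and the bincount into a 2D array.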

In [84]: x = np.array([1,2,2,3])

In [85]: y, inverse = np.unique(x, return_inverse=True)

In [86]: y
Out[86]: array([1, 2, 3])

In [87]: inverse
Out[87]: array([0, 1, 1, 2])

In [88]: np.bincount(inverse)
Out[88]: array([1, 2, 1])

In [89]: np.column_stack((y,np.bincount(inverse)))
Out[89]: 
array([[1, 1],
       [2, 2],
       [3, 1]])
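
As a possible shortcut, assuming NumPy 1.9 or later, np.unique can also return the counts directly through return_counts=True, which makes the bincount step unnecessary. A minimal sketch (the function name group_repeats is just an illustrative choice, not something from the question):

```python
import numpy as np

def group_repeats(x):
    """Return a 2D array: column 0 holds the unique values, column 1 their counts."""
    # return_counts=True (assumes NumPy >= 1.9) gives the count of each unique value
    values, counts = np.unique(x, return_counts=True)
    return np.column_stack((values, counts))

result = group_repeats(np.array([1, 2, 2, 3]))
print(result)
```

np.unique sorts its result, so the rows come out in ascending order of value whether or not the input was sorted.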

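When the array is small, plain Python methods sometimes turn out to be faster than NumPy functions. I wanted to check whether that was the case here and, if so, how large x would have to be before the NumPy method becomes faster.

Here is a graph of the performance of the various methods as a function of the size of x (image not reproduced here):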

In [173]: x = np.random.random(1000)

In [174]: x.sort()

In [156]: %timeit using_unique(x)
10000 loops, best of 3: 99.7 us per loop

In [180]: %timeit using_groupby(x)
100 loops, best of 3: 3.64 ms per loop

In [157]: %timeit using_counter(x)
100 loops, best of 3: 4.31 ms per loop

In [158]: %timeit using_ordered_dict(x)
100 loops, best of 3: 4.7 ms per loop

For len(x) of 1000, using_unique is over 35x faster than any of the plain Python methods tested.

So it looks like using_unique is the fastest, even for very small len(x).

Here is the program used to generate the graph:

import numpy as np
import collections
import itertools as IT
import matplotlib.pyplot as plt
import timeit

def using_unique(x):
    y, inverse = np.unique(x, return_inverse=True)
    return np.column_stack((y, np.bincount(inverse)))

def using_counter(x):
    result = collections.Counter(x)
    return np.array(sorted(result.items()))

def using_ordered_dict(x):
    result = collections.OrderedDict()
    for item in x:
        result[item] = result.get(item,0)+1
    return np.array(list(result.items()))  # list() so this also works on Python 3

def using_groupby(x):
    return np.array([(k, sum(1 for i in g)) for k, g in IT.groupby(x)])

fig, ax = plt.subplots()
timing = collections.defaultdict(list)
Ns = [int(round(n)) for n in np.logspace(0, 3, 10)]
for n in Ns:
    x = np.random.random(n)
    x.sort()
    timing['unique'].append(
        timeit.timeit('m.using_unique(m.x)', 'import __main__ as m', number=1000))
    timing['counter'].append(
        timeit.timeit('m.using_counter(m.x)', 'import __main__ as m', number=1000))
    timing['ordered_dict'].append(
        timeit.timeit('m.using_ordered_dict(m.x)', 'import __main__ as m', number=1000))
    timing['groupby'].append(
        timeit.timeit('m.using_groupby(m.x)', 'import __main__ as m', number=1000))

ax.plot(Ns, timing['unique'], label='using_unique')
ax.plot(Ns, timing['counter'], label='using_counter')
ax.plot(Ns, timing['ordered_dict'], label='using_ordered_dict')
ax.plot(Ns, timing['groupby'], label='using_groupby')
plt.legend(loc='best')
plt.ylabel('milliseconds')
plt.xlabel('size of x')
plt.show()

score 3 · Accepted Answer

If order doesn't matter, use Counter.

from collections import Counter
% Counter([1,2,2,3])
= Counter({2: 2, 1: 1, 3: 1})
% Counter([1,2,2,3]).items()
= [(1, 1), (2, 2), (3, 1)]

To preserve order (by first appearance), you can implement your own version of Counter:

from collections import OrderedDict
def OrderedCounter(seq):
     res = OrderedDict()
     for x in seq:
        res.setdefault(x, 0) 
        res[x] += 1
     return res
% OrderedCounter([1,2,2,3])
= OrderedDict([(1, 1), (2, 2), (3, 1)])
% OrderedCounter([1,2,2,3]).items()
= [(1, 1), (2, 2), (3, 1)]
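
As a side note, assuming Python 3.7 or later, plain dicts keep insertion order and Counter is a dict subclass, so Counter itself already reports items in order of first appearance. A small sketch with an unsorted input to make the ordering visible:

```python
from collections import Counter

# Assumes Python 3.7+, where dicts (and therefore Counter) keep insertion order
counts = Counter([3, 1, 3, 2])
pairs = list(counts.items())
print(pairs)  # [(3, 2), (1, 1), (2, 1)]
```

On older Python versions the OrderedDict-based version above is still the safer choice.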

score 1 · Accepted Answer

If you want to count how many times each item is repeated, you can use a dictionary:

l = [1, 2, 2, 3]
d = {}
for i in l:
    if i not in d:
        d[i] = 1
    else:
        d[i] += 1
result = [[k, v] for k, v in d.items()]

For your example, this gives:

[[1, 1],
 [2, 2], 
 [3, 1]]
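
Since the question asks for a 2-D NumPy array rather than a list of lists, here is one way the dictionary counts could be converted. This is only a sketch: sorting the items by key is my own assumption, made so the row order does not depend on how the dict happens to be ordered.

```python
import numpy as np

l = [1, 2, 2, 3]
d = {}
for i in l:
    # Same counting as above, written with dict.get
    d[i] = d.get(i, 0) + 1

# Sort the (item, count) pairs by item, then build a 2-column array
result = np.array(sorted(d.items()))
print(result)
```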

Good luck.

score 0 · Accepted Answer

First, you don't need to end statements with semicolons (;). This isn't C. :-)

Second, line 5 (and others) set ret to value,value, which is a tuple, not a list:

>type foo.py
def foo():
        return [1],2
a,b = foo()
print "a = {0}".format(a)
print "b = {0}".format(b)

Gives:

>python foo.py
a = [1]
b = 2

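Third: there are easier ways to do this. Here is what comes to mind:

Use the set constructor to create a unique collection of the items
Create a list of the number of times each entry in the set occurs in the input
Use zip() to combine the two and return them as a list of tuples (though this isn't exactly what you asked for)

Here's one way: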

def priorsGrouper(priors):
    """Find out how many times each element occurs in a list.

    @param[in] priors List of elements
    @return List of (element, count) tuples, one per unique element
                (the order of the elements is not guaranteed).
    """

    # Generate a `list' containing only unique elements from the input
    mySet = set(priors)

    # Create the list that will store the number of occurrences
    occurrenceCounts = []

    # Count how many times each element occurs on the input:
    for element in mySet:
        occurrenceCounts.append(priors.count(element))

    # Combine the two into a list of (element, count) pairs and return it:
    combinedArray = list(zip(mySet, occurrenceCounts))
    return combinedArray
# End of priorsGrouper() ----------------------------------------------

# Check zero-element case
print priorsGrouper([])

# Check multi-element case
sampleInput = ['a','a', 'b', 'c', 'c', 'c']
print priorsGrouper(sampleInput)
