python - Compare only object references in numpy

Question

I have a numpy array of Python objects. I want to compare the array against a Python object, but I don't want the comparison done with the == operator; a reference comparison is enough for my requirements.

import numpy as np
a = np.array(["abc", "def"], dtype="object")
a == "abc"

I am sure that a reference comparison is enough for my array. Let's say all the strings in my array are interned.

This is primarily to improve performance when comparing a zillion values. Python object comparisons are really slow.

a is "abc" won't do what I want because:

In [1]: import numpy as np

In [2]: a = np.array(["abc", "def"], dtype="object")

In [3]: a == "abc"
Out[3]: array([ True, False], dtype=bool)

In [4]: a is "abc"
Out[4]: False

I want the result of a == "abc", but I don't want Python's __eq__ method to be used for it, just the is operator.

score 3 · Accepted Answer

"a reference comparison is enough for my requirements"

To compare object identity, use is instead of ==:

if a is b:
   ...

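From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

Edit: to apply is to every element of the array, you can use: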

In [6]: map(lambda x:x is "abc", a)
Out[6]: [True, False]

Or simply:

In [9]: [x is "abc" for x in a]
Out[9]: [True, False]
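
To make the difference between the two comparisons visible, here is a minimal sketch. The helper name identity_mask is made up for illustration, and the example assumes CPython, where two strings built at runtime with str.join are equal but are not the same object:

```python
import numpy as np

def identity_mask(arr, target):
    """Boolean mask: True where the element is the very same object as target."""
    return np.array([x is target for x in arr], dtype=bool)

# Two equal strings built at runtime; in CPython these are distinct objects.
target = "".join(["ab", "c"])
twin = "".join(["ab", "c"])

a = np.array([target, twin, "def"], dtype=object)

equal_mask = a == target               # value comparison via __eq__
same_mask = identity_mask(a, target)   # reference comparison via `is`
```

Here equal_mask marks both target and twin, while same_mask marks only the first element. Passing the target in a variable also avoids writing is against a string literal, which recent Python versions warn about.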

score 0 · Accepted Answer

How about np.vectorize:

vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])

Then you have

>>> a = np.array(["abc", "def"], dtype="object")

>>> vector_is(a, "abc")
array([ True, False], dtype=bool)

Unfortunately, I'm not sure whether operator.is_ can be used here, since passing it to np.vectorize gives:

ValueError: failed to determine the number of arguments for <built-in function is_>
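
One possible way around that error, which is a different technique from np.vectorize, is np.frompyfunc, since it takes the number of inputs and outputs explicitly. The sketch below is an assumption-laden illustration rather than a tested recommendation: the names is_ufunc and identity_mask_ufunc are hypothetical, and the target is wrapped in a 0-d object array so that the original reference is what gets compared.

```python
import operator
import numpy as np

# Build a ufunc from operator.is_ with 2 inputs and 1 output.
is_ufunc = np.frompyfunc(operator.is_, 2, 1)

def identity_mask_ufunc(arr, target):
    """Identity mask for an object array using the frompyfunc-based ufunc."""
    # Wrap target in a 0-d object array so the reference itself is stored,
    # instead of letting numpy convert it to another dtype first.
    boxed = np.empty((), dtype=object)
    boxed[()] = target
    # frompyfunc returns an object array; convert it to a bool array.
    return is_ufunc(arr, boxed).astype(bool)

sentinel = object()
a = np.array([sentinel, object(), sentinel], dtype=object)
mask = identity_mask_ufunc(a, sentinel)
```

This still calls a Python-level function once per element, so it should not be expected to beat ==.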

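np.vectorize seems to be a bit slower than the list comprehension (probably because of the lambda calls), but it has the advantage of being a bit more flexible in the arguments it accepts.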

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'vector_is(a, "abcd")'
10 loops, best of 3: 28.3 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' '[x is "abcd" for x in a]'
100 loops, best of 3: 20 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'np.fromiter((x is "abcd" for x in a), bool, len(a))'
10 loops, best of 3: 23.8 msec per loop

The last approach, np.fromiter((x is "abcd" for x in a), bool, len(a)), is one way to get a numpy array out of the list comprehension approach.

Unfortunately, all of them are much slower than simply using ==:
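
As a small sketch, the np.fromiter variant can be wrapped in a helper. The name identity_mask_fromiter is made up for illustration, and the helper assumes a 1-D object array:

```python
import numpy as np

def identity_mask_fromiter(arr, target):
    """Identity mask for a 1-D object array, built without a temporary list."""
    # count tells fromiter the output length up front so it can preallocate.
    return np.fromiter((x is target for x in arr), dtype=bool, count=len(arr))

sentinel = object()
a = np.array([sentinel, object(), sentinel], dtype=object)
mask = identity_mask_fromiter(a, sentinel)
```

The result is a bool array of the same length as the input, True only where the element is the same object as sentinel.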

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'a == "abcd"'                                        
1000 loops, best of 3: 1.42 msec per loop
