python - フロートのタプルのリストからac配列を構築するPythonの最速の方法は何ですか？

Question

コンテキスト：私のPythonコードは、2D頂点の配列をOpenGLに渡します。

私は2つのアプローチをテストしました。1つはctypesを使用し、もう1つはstructを使用し、後者は2倍以上高速です。

from random import random
points = [(random(), random()) for _ in xrange(1000)]

from ctypes import c_float
def array_ctypes(points):
    n = len(points)
    return n, (c_float*(2*n))(*[u for point in points for u in point])

from struct import pack
def array_struct(points):
    n = len(points)
    return n, pack("f"*2*n, *[u for point in points for u in point])

他の選択肢はありますか？そのようなコードを加速する方法についてのヒントはありますか（そして、はい、これは私のコードの1つのボトルネックです）？

score 3 · Accepted Answer

オーバーヘッドを発生させることなく、numpy配列をPyOpenGLに渡すことができます。（datanumpy配列の属性は、構築している配列と同じ情報を含む、基になるCデータ構造を指すバッファーです）

import numpy as np  
def array_numpy(points):
    n = len(points)
    return n, np.array(points, dtype=np.float32)

struct私のコンピューターでは、これはベースのアプローチよりも約40％高速です。

score 2 · Accepted Answer

Cythonを試すことができます。私にとって、これは次のようになります。

function       usec per loop:
               Python  Cython
array_ctypes   1370    1220
array_struct    384     249
array_numpy     336     339

したがって、Numpyは私のハードウェア（WindowsXPを実行している古いラップトップ）で15％のメリットしか得られませんが、Cythonは約35％（分散コードに余分な依存関係はありません）を提供します。

各ポイントがフロートのタプルであるという要件を緩和し、単に「ポイント」をフロートのフラット化されたリストにすることができる場合：

def array_struct_flat(points):
    n = len(points)
    return pack(
        "f"*n,
        *[
            coord
            for coord in points
        ]
    )

points = [random() for _ in xrange(1000 * 2)]

結果の出力は同じですが、タイミングはさらに遅くなります。

function            usec per loop:
                    Python  Cython
array_struct_flat           157

私より賢い人が静的型宣言をコードに追加したいのであれば、Cythonはこれよりも大幅に優れている可能性があります。（'cython -a test.pyx'を実行すると、このために非常に貴重です。純粋なC（白）に変換されたPythonに対して、コード内で最も遅い（黄色）プレーンPythonがどこにあるかを示すhtmlファイルが生成されます。上記のコードを非常に多くの行に広げています。これは、色付けが1行ごとに行われるため、そのように広げるのに役立ちます。）

Cythonの完全な手順はここにあります：http： //docs.cython.org/src/quickstart/build.html

Cythonは、コードベース全体で同様のパフォーマンス上の利点を生み出す可能性があり、理想的な条件では、適切な静的型付けを適用すると、速度を10倍または100倍向上させることができます。

score 1 · Accepted Answer

私が偶然見つけた別のアイデアがあります。今はプロファイリングする時間がありませんが、他の誰かがプロファイリングする場合に備えて：

 # untested, but I'm fairly confident it runs
 # using 'flattened points' list, i.e. a list of n*2 floats
 points = [random() for _ in xrange(1000 * 2)]
 c_array = c_float * len(points * 2)
 c_array[:] = points

つまり、最初にctypes配列を作成しますが、それを設定しません。次に、スライス表記を使用してデータを入力します。私より賢い人は、このようなスライスに割り当てるとパフォーマンスが向上する可能性があると言っています。これにより、 * iterable構文を使用せずに、割り当てのRHSでリストまたはiterableを直接渡すことができます。これにより、iterableの中間的なラングリングが実行されます。これが、ピグレットのバッチを作成する深さで起こっていることだと思います。

おそらく、c_arrayを1回作成してから、ポイントリストが変更されるたびにc_array（上記のコードの最後の行）に再割り当てすることができます。

ポイントの元の定義（（x、y）タプルのリスト）を受け入れる代替の定式化がおそらくあります。次のようなもの：

 # very untested, likely contains errors
 # using a list of n tuples of two floats
 points = [(random(), random()) for _ in xrange(1000)]
 c_array = c_float * len(points * 2)
 c_array[:] = chain(p for p in points)

score 1 · Accepted Answer

パフォーマンスが問題になる場合は、star操作でctypes配列を使用しないでください（例：）(ctypes.c_float * size)(*t)。

私のテストpackでは、最も速くarray、アドレスのキャストを含むモジュールを使用します（またはfrom_buffer関数を使用します）。

import timeit
repeat = 100
setup="from struct import pack; from random import random; import numpy;  from array import array; import ctypes; t = [random() for _ in range(2* 1000)];"
print(timeit.timeit(stmt="v = array('f',t); addr, count = v.buffer_info();x = ctypes.cast(addr,ctypes.POINTER(ctypes.c_float))",setup=setup,number=repeat))
print(timeit.timeit(stmt="v = array('f',t);a = (ctypes.c_float * len(v)).from_buffer(v)",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(*t)',setup=setup,number=repeat))
print(timeit.timeit(stmt="x = pack('f'*len(t), *t);",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(); x[:] = t',setup=setup,number=repeat))
print(timeit.timeit(stmt='x = numpy.array(t,numpy.float32).data',setup=setup,number=repeat))

array.arrayアプローチは、私のテストではJonathan Hartleyのアプローチよりもわずかに高速ですが、numpyアプローチの速度は約半分です。

python3 convert.py
0.004665990360081196
0.004661010578274727
0.026358536444604397
0.0028003649786114693
0.005843495950102806
0.009067213162779808

正味の勝者はパックです。

score 0 · Accepted Answer

配列を使用できます（リスト内包表記の代わりにジェネレータ式にも注意してください）。

array("f", (u for point in points for u in point)).tostring()

もう1つの最適化は、ポイントを最初からフラットに保つことです。

python - フロートのタプルのリストからac配列を構築するPythonの最速の方法は何ですか？

5 に答える 5

Related

Reference