python - このバイナリヒープの実装が Python の stdlib の実装よりも遅いのはなぜですか?

Question

ヒープデータ構造を理解するために、独自のヒープモジュールを実装しています。それらがどのように機能し、管理されているかは理解していますが、私の実装は、ヒープソートを実行している間、標準の python heapq モジュールよりも大幅に遅くなります (サイズ 100,000 のリストの場合、heapq は 0.6 秒かかりますが、私のコードは 2 秒かかります (元は 2.6 秒でした)。 percDown から len() ステートメントを取り出して長さを渡すことで 2 秒まで短縮できるため、メソッドがそれ自体を呼び出すたびに len を計算する必要がありません) これが私の実装です。

def percDown(lst, start, end, node):
    #Moves given node down the heap, starting at index start, until the heap property is
    #satisfied (all children must be larger than their parent)
    iChild = 2 * start + 1
    i = start
    # if the node has reached the end of the heap (i.e. no children left),
    # return its index (we are done)
    if iChild > end - 1:
        return start
    #if the second child exists and is smaller than the first child, use that child index
    #for comparing later
    if iChild + 1 < end and lst[iChild + 1] < lst[iChild]:
        iChild += 1
    #if the smallest child is less than the node, it is the new parent
    if lst[iChild] < node:
        #move the child to the parent position
        lst[start] = lst[iChild]
        #continue recursively going through the child nodes of the
        # new parent node to find where node is meant to go
        i = percDown(lst, iChild, end, node)
    return i

popMin: 最小値 (lst[0]) をポップし、ヒープを並べ替えます

def popMin(lst):
    length = len(lst)
    if (length > 1):
        min = lst[0]
        ele = lst.pop()
        i = percDown(lst, 0, length - 1, ele)
        lst[i] = ele
        return min
    else:
        return lst.pop()

heapify: リストをその場でヒープに変換します

def heapify(lst):
    iLastParent = math.floor((len(lst) - 1) / 2)
    length = len(lst)
    while iLastParent >= 0:
        ele = lst[iLastParent]
        i = percDown(lst, iLastParent, length, lst[iLastParent])
        lst[i] = ele
        iLastParent -= 1

sort: 上記のメソッドを使用して、指定されたリストをヒープソートします (インプレースではありません)。

def sort(lst):
    result = []
    heap.heapify(lst)
    length = len(lst)
    for z in range(0, length):
        result.append(heap.popMin(lst))
    return result

アルゴリズム/ヒープの作成に誤って複雑さを追加しましたか、それとも python heapq モジュールが大幅に最適化されているだけですか? 0.6 秒と 2 秒では大きな違いがあるので、前者だと思います。

score 6 · Accepted Answer

Pythonheapqモジュールは C 拡張を使用します。C コードに勝るものはありません。

heapqモジュールのソースコードから：

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass

_heapqmodule.cC ソースも参照してください。

score 2 · Accepted Answer

0.6 秒と 2.6 秒の差は 4 倍弱です。それは「大きすぎる」でしょうか？

それは答えるのに十分な情報ではありません。4 倍の違いは、アルゴリズムの誤りが原因である可能性があります。

たとえば、1000 で 1.2 倍、100000 で 4 倍、1000000 で 12 倍の違いしか得られない場合、アルゴリズムの複雑さはおそらく悪化しています。修正します。

一方、3 つのサイズすべてで約 4 倍の違いがある場合は、オーバーヘッドの定数乗数が大きくなります。Martijn Pieters' answer_heapqで説明されているように、(CPython) stdlib バージョンは C で同じループを実行するアクセラレータモジュールを使用しているのに対し、Python で実行されている内部ループがあるためです。だから、あなたは何も間違っていませんでした。おそらく少し最適化することはできますが、最終的には、コードのコアを C で書き直すか、JIT 最適化インタープリターで実行して、stdlib と同じくらい優れたものにする必要があります。実際、アルゴリズムを理解するためにこれを書いているだけなら、その必要はありません。

補足として、PyPy で比較を実行してみてください。その stdlib のほとんどは純粋な Python で書かれており、特別な最適化は行われていませんが、最適化された JIT コンパイラにより、CPython のネイティブ C コードとほぼ同じ速度になります。そして、その同じ JIT がコードに適用されます。つまり、最適化されていないコードは、多くの場合、CPython のネイティブ C コードとほぼ同じです。もちろん、その保証はありません。アルゴリズムの複雑さをテストしようとしている場合は、常にさまざまなサイズでテストする必要があるという事実は変わりません。

python - このバイナリ ヒープの実装が Python の stdlib の実装よりも遅いのはなぜですか?

2 に答える 2

Related

Reference

python - このバイナリヒープの実装が Python の stdlib の実装よりも遅いのはなぜですか?