python - Why is the python itertools "consume" recipe faster than calling next n times?

Question

The Python documentation for itertools provides the following "recipe" for advancing an iterator n steps:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

I'm wondering why this recipe is fundamentally different from something like this (aside from the handling of consuming the entire iterator):

def other_consume(iterable, n):
    for i in xrange(n):
        next(iterable, None)

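I used timeit to confirm that, as expected, the above approach is much slower. What is going on in the recipe that allows this superior performance? I can see that it uses islice, but looking at islice, it appears to be doing fundamentally the same thing as the code above: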

def islice(iterable, *args):
    s = slice(*args)
    it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
    nexti = next(it)
    ### it seems as if this loop yields from the iterable n times via enumerate
    ### how is this different from calling next n times?
    for i, element in enumerate(iterable): 
        if i == nexti:
            yield element
            nexti = next(it)

Note: even if, instead of importing islice from itertools, I define it using the Python equivalent from the docs shown above, the recipe is still faster.

EDIT: timeit code here:

timeit.timeit('a = iter([random() for i in xrange(1000000)]); consume(a, 1000000)', setup="from __main__ import consume,random", number=10)
timeit.timeit('a = iter([random() for i in xrange(1000000)]); other_consume(a, 1000000)', setup="from __main__ import other_consume,random", number=10)

other_consume is ~2.5x slower every time I run this.

score 6 · Accepted Answer

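The recipe is faster because its key pieces (islice, deque) are implemented in C rather than in pure Python. Part of it is that a C loop is faster than for i in xrange(n). Another part is that Python function calls (e.g. next()) are more expensive than their C equivalents.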

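The version of itertools.islice that you copied from the documentation is not correct, and its apparently great performance is because a consume function using it doesn't consume anything. (For that reason I'm not showing that version's test results below, though it was pretty fast! :)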
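To see why that pure-Python equivalent consumes nothing when called as islice(iterator, n, n): its index range is empty, so the very first next(it) raises StopIteration and the generator finishes before its for loop ever touches the iterable. Below is a minimal sketch of this, ported to Python 3 as an assumption (range instead of xrange, sys.maxsize instead of sys.maxint, and an explicit return where Python 2 let the StopIteration end the generator silently). The name islice_py is hypothetical.

```python
import sys
from itertools import islice


def islice_py(iterable, *args):
    """Python 3 port of the pure-Python islice quoted in the question."""
    s = slice(*args)
    it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
    try:
        nexti = next(it)
    except StopIteration:
        # Empty index range (e.g. start == stop): the generator ends here,
        # before a single element has been pulled from `iterable`.
        return
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            try:
                nexti = next(it)
            except StopIteration:
                return


# Pure-Python version: the underlying iterator is left untouched.
a = iter(range(10))
next(islice_py(a, 3, 3), None)
after_py = next(a)

# C version from itertools: three elements are skipped first.
b = iter(range(10))
next(islice(b, 3, 3), None)
after_c = next(b)

print(after_py, after_c)
```

So a consume built on the pure-Python version returns almost immediately because it does no work, not because it is an efficient way to skip elements.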

Here are a few different implementations, so we can test which is fastest:

import collections
from itertools import islice

# this is the official recipe
def consume_itertools(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

# your initial version, using a for loop on a range
def consume_qwwqwwq(iterator, n):
    for i in xrange(n):
        next(iterator, None)

# a slightly better version, that only has a single loop:
def consume_blckknght(iterator, n):
    if n <= 0:
        return
    for i, v in enumerate(iterator, start=1):
        if i == n:
            break
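Before comparing speed, it is worth checking that the three approaches leave the iterator in the same position. The sketch below is a Python 3 port (an assumption: range replaces xrange, and the n is None branch is left out). The shortened function names and the position_after helper are hypothetical and exist only for this check.

```python
from itertools import islice


def consume_itertools(iterator, n):
    # C-level skip via an empty slice starting at position n
    next(islice(iterator, n, n), None)


def consume_loop(iterator, n):
    # Python loop over a range, calling next() once per step
    for _ in range(n):
        next(iterator, None)


def consume_enumerate(iterator, n):
    # single Python loop over the iterator itself
    if n <= 0:
        return
    for i, _ in enumerate(iterator, start=1):
        if i == n:
            break


def position_after(consume, n, size=10):
    """Return the next item of iter(range(size)) after consuming n steps,
    or None if the iterator is exhausted."""
    it = iter(range(size))
    consume(it, n)
    return next(it, None)


# n = 0, n inside the range, n equal to its length, n past the end
steps = (0, 3, 10, 15)
results = {
    f.__name__: [position_after(f, n) for n in steps]
    for f in (consume_itertools, consume_loop, consume_enumerate)
}
print(results)
```

All three should report the same positions for each n, so the timing differences come from how the skipping is done rather than from how much is skipped.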

Timings on my system (Python 2.7.3 64-bit on Windows 7):

>>> test = 'consume(iter(xrange(100000)), 1000)'
>>> timeit.timeit(test, 'from consume import consume_itertools as consume')
7.623556181657534
>>> timeit.timeit(test, 'from consume import consume_qwwqwwq as consume')
106.8907442334584
>>> timeit.timeit(test, 'from consume import consume_blckknght as consume')
56.81081856366518

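My assessment is that a nearly empty Python loop takes seven or eight times longer to run than the equivalent loop in C. Looping over two sequences at once (as consume_qwwqwwq does by calling next on iterator in addition to the for loop on the xrange) makes the cost roughly double.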
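For anyone who wants to repeat the comparison on a current interpreter, here is a rough Python 3 harness. It is a sketch under stated assumptions: the function names and the time_consume helper are hypothetical, range stands in for xrange, and the measured time includes the small overhead of the lambda and of building the iterator. The ratios will differ between machines and Python versions, so no expected numbers are given.

```python
import timeit
from itertools import islice


def consume_c(iterator, n):
    # skipping happens inside islice's C loop
    next(islice(iterator, n, n), None)


def consume_one_loop(iterator, n):
    # one Python-level loop over the iterator itself
    if n <= 0:
        return
    for i, _ in enumerate(iterator, start=1):
        if i == n:
            break


def consume_two_loops(iterator, n):
    # Python-level loop over a range plus one next() call per step
    for _ in range(n):
        next(iterator, None)


def time_consume(func, size=100_000, n=1_000, number=1_000):
    """Total seconds for `number` runs of func on a fresh iterator."""
    return timeit.timeit(lambda: func(iter(range(size)), n), number=number)


if __name__ == "__main__":
    for func in (consume_c, consume_one_loop, consume_two_loops):
        print(f"{func.__name__}: {time_consume(func):.4f} s")
```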
