python - What is the most "pythonic" way to iterate over a list in chunks?

Question

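I have a Python script that takes a list of integers as input, and I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way: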

for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

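It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?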

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

It still doesn't quite "feel" right, though. :-/

Related question: How do you split a list into evenly sized chunks in Python?

score 521 · Accepted Answer

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
# (in python 2 use xrange() instead of range() to avoid allocating a list)

Works with any sequence:

text = "I am a very, very helpful text"

for group in chunker(text, 7):
    print(repr(group), end=' ')
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'

print('|'.join(chunker(text, 10)))
# I am a ver|y, very he|lpful text

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

for group in chunker(animals, 3):
    print(group)
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']

score 401 · Accepted Answer

Modified from the Recipes section of Python's itertools docs:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Example

grouper('ABCDEFG', 3, 'x')  # --> 'ABC' 'DEF' 'Gxx'

Note: on Python 2, use izip_longest instead of zip_longest.
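
The recipe relies on the list holding n references to one and the same iterator, so each output tuple pulls n consecutive items from it. A minimal sketch of that mechanism on Python 3 (the variable names are illustrative and not part of the recipe):

```python
from itertools import zip_longest

letters = iter('ABCDEFG')
args = [letters] * 3  # three references to the SAME iterator object
assert args[0] is args[1] is args[2]

# zip_longest takes one item from each argument in turn, so every tuple
# consumes three consecutive items from the shared iterator.
groups = list(zip_longest(*args, fillvalue='x'))
print(groups)
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Passing three independent iterators instead would pair each letter with itself, which is why the repeated reference matters.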

score 189 · Accepted Answer

chunk_size = 4
for i in range(0, len(ints), chunk_size):
    chunk = ints[i:i+chunk_size]
    # process chunk of size <= chunk_size

score 28 · Accepted Answer

import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

Another way:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4

score 16 · Accepted Answer

I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

List:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Set:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Generator:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

score 15 · Accepted Answer

The more-itertools package has a chunked function that does exactly that:

import more_itertools
for s in more_itertools.chunked(range(9), 4):
    print(s)

Prints

[0, 1, 2, 3]
[4, 5, 6, 7]
[8]

chunked returns the items in a list. If you'd prefer iterables, use ichunked.

score 14 · Accepted Answer

With Python 3.8 you can use the walrus operator and itertools.islice.

from itertools import islice

list_ = [i for i in range(10, 100)]

def chunker(it, size):
    iterator = iter(it)
    while chunk := list(islice(iterator, size)):
        print(chunk)

In [2]: chunker(list_, 10)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
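
The function above prints each chunk rather than handing it back to the caller. A small variation that yields the chunks instead might look like this (a sketch requiring Python 3.8+; iter_chunks is a hypothetical name):

```python
from itertools import islice

def iter_chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    # list(islice(...)) is empty once the iterator is exhausted,
    # and an empty list is falsy, which ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk

result = list(iter_chunks(range(10), 4))
print(result)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because it only calls iter() on its input, this works for generators and sets as well as lists.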

score 14 · Accepted Answer

The ideal solution for this problem works with iterators (not just sequences). It should also be fast.

This is the solution provided by the documentation for itertools:

import itertools

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    # zip_longest is named izip_longest on Python 2
    return itertools.zip_longest(*args, fillvalue=fillvalue)

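Using ipython's %timeit on my MacBook Air, I get 47.5 us per loop.

However, this really doesn't work for me, since the results are padded to be even-sized groups. A solution without the padding is slightly more complicated. The most naive solution might be: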

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(next(i))
        except StopIteration:
            if out:  # skip the empty group when the length divides evenly
                yield out
            break

        yield out

Simple, but pretty slow: 693 us per loop

The best solution I could come up with uses islice for the inner loop:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

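With the same dataset, I get 305 us per loop.

Unable to get a pure solution any faster than that, I offer the following solution with an important caveat: if your input data contains instances of fillvalue, you could get the wrong answer.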

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    # zip_longest is named izip_longest on Python 2
    for x in itertools.zip_longest(*args, fillvalue=fillvalue):
        if x[-1] is fillvalue:
            yield tuple(v for v in x if v is not fillvalue)
        else:
            yield x

I really don't like this answer, but it is significantly faster: 124 us per loop
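
One way to sidestep the caveat about fill values appearing in the input is to pad with a private sentinel object, since no input item can be identical to it. A sketch for Python 3 (grouper_nopad and _FILL are hypothetical names; this variant has not been timed, so the figures above do not apply to it):

```python
from itertools import zip_longest

_FILL = object()  # unique marker; cannot be identical to any input item

def grouper_nopad(n, iterable):
    """Yield tuples of n items; the last tuple may be shorter."""
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_FILL):
        if group[-1] is _FILL:
            # Only the final group can contain the marker: strip it.
            yield tuple(v for v in group if v is not _FILL)
        else:
            yield group

out = list(grouper_nopad(3, [None, 1, None, 2, None]))
print(out)  # [(None, 1, None), (2, None)]
```

Here None values in the data survive intact, which the fillvalue=None default would not guarantee.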

score 12 · Accepted Answer

from itertools import zip_longest  # izip_longest on Python 2

def chunker(iterable, chunksize, filler):
    return zip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)

score 10 · Accepted Answer

Similar to other proposals, but not exactly identical. I like doing it this way because it's simple and easy to read:

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print(chunk)

(1, 2, 3, 4)
(5, 6, 7, 8)

This way you won't get the last partial chunk. If you want to get (9, None, None, None) as the last chunk, use zip_longest from itertools (izip_longest on Python 2).
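
The difference between the two behaviours can be seen side by side in a short Python 3 sketch (the variable names are illustrative):

```python
from itertools import zip_longest

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# zip stops as soon as the shared iterator runs dry, dropping the leftover 9.
it = iter(data)
full_only = list(zip(it, it, it, it))

# zip_longest keeps the leftover item and pads the tuple with None.
it = iter(data)
padded = list(zip_longest(it, it, it, it))

print(full_only)  # [(1, 2, 3, 4), (5, 6, 7, 8)]
print(padded)     # [(1, 2, 3, 4), (5, 6, 7, 8), (9, None, None, None)]
```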

score 8 · Accepted Answer

Since nobody's mentioned it yet, here's a zip() solution:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

It works only if your sequence's length is always divisible by the chunk size, or if you don't care about a trailing chunk when it isn't.

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Or, on Python 2, use itertools.izip to return an iterator instead of a list (on Python 3 the built-in zip() already does):

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it   = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)

score 6 · Accepted Answer

Another approach would be to use the two-argument form of iter:

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

This can be adapted easily to use padding (this is similar to Markus Jarderot's answer):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

These can even be combined for optional padding:

_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad is _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
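
The two-argument form iter(callable, sentinel) keeps calling callable and stops as soon as it returns a value equal to sentinel. A short sketch of the unpadded variant in use on Python 3:

```python
from itertools import islice

def group(it, size):
    it = iter(it)
    # tuple(islice(it, size)) is () once `it` is exhausted, and () is
    # the sentinel that tells iter() to stop.
    return iter(lambda: tuple(islice(it, size)), ())

chunks = list(group(range(7), 3))
print(chunks)  # [(0, 1, 2), (3, 4, 5), (6,)]
```

One thing to keep in mind with the padded variants: they compare each chunk with (pad,) * size, so a chunk of real input that consists entirely of pad values would end the iteration early.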

score 5 · Accepted Answer

Using map() instead of zip() fixes the padding issue in J.F. Sebastian's answer (Python 2 only, since map(None, ...) is not supported on Python 3):

>>> def chunker(iterable, chunksize):
...   return map(None,*[iter(iterable)]*chunksize)

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

score 4 · Accepted Answer

Using little functions and things really doesn't appeal to me; I prefer to just use slices:

data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]
for chunk in chunks:
    ...

score 4 · Accepted Answer

If the list is large, the highest-performing way to do this is to use a generator:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print(x)

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)

score 3 · Accepted Answer

To avoid all conversions to a list, import itertools and:

>>> for k, g in itertools.groupby(range(35), lambda x: x // 10):
...     print(k, list(g))

Produces:

... 
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
>>>

I checked groupby and it doesn't convert to a list or use len, so I think this delays resolution of each value until it is actually used. Sadly, none of the answers available at this time seem to offer this variation.

Obviously, if you need to handle each item in turn, nest a for loop over g:

for k, g in itertools.groupby(range(35), lambda x: x // 10):
    for i in g:
        pass  # do what you need to do with individual items
    # now do what you need to do with the whole group

My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the gmail API:

    messages = a_generator_which_would_not_be_smart_as_a_list
    for idx, batch in groupby(messages, lambda x: x/1000):
        batch_request = BatchHttpRequest()
        for message in batch:
            batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
        http = httplib2.Http()
        self.credentials.authorize(http)
        batch_request.execute(http=http)
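
The key functions above divide the items themselves, which only works when the items are consecutive integers. For arbitrary items, one option is to pair each item with its index via enumerate and group on the index instead. A sketch for Python 3 (batched_groupby is a hypothetical name):

```python
from itertools import groupby

def batched_groupby(iterable, size):
    """Lazily yield iterators over runs of `size` consecutive items."""
    indexed = enumerate(iterable)
    # Integer division maps indices 0..size-1 to 0, size..2*size-1 to 1, ...
    for _, batch in groupby(indexed, key=lambda pair: pair[0] // size):
        # Strip the index again so callers only see the original items.
        yield (item for _, item in batch)

batches = [list(b) for b in batched_groupby('abcdefg', 3)]
print(batches)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

As with groupby itself, each batch has to be consumed before moving on to the next one, because all batches share the same underlying iterator.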

score 2 · Accepted Answer

With NumPy it's simple:

import numpy as np

ints = np.array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)

Output:

1 2
3 4
5 6
7 8

score 1 · Accepted Answer

In your second method, I would advance to the next group of 4 by doing this:

ints = ints[4:]

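However, I haven't done any performance measurement, so I don't know which one might be more efficient.

Having said that, I would usually choose the first method. It's not pretty, but that's often a consequence of interfacing with the outside world.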
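
For reference, the second method with that change might look like the sketch below (the sample data is made up, and the length is assumed to be a multiple of four):

```python
ints = [1, 2, 3, 4, 5, 6, 7, 8]
foo = 0
while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    # Rebinds the name to a new, shorter list; each step copies the remainder.
    ints = ints[4:]

print(foo)  # 1*2 + 3*4 + 5*6 + 7*8 = 100
```

Since every slice copies what is left of the list, the total work grows quadratically with the input length, which is one reason the index-based loop may be preferable for long lists.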

score 1 · Accepted Answer

Here is a chunker without imports that supports generators:

def chunks(seq, size):
    it = iter(seq)
    while True:
        ret = []
        for _ in range(size):
            try:
                ret.append(next(it))
            except StopIteration:
                return  # drop a trailing partial chunk
        yield tuple(ret)

Example of use:

>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> next(c)
(1, 2, 3)
>>> next(c)
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

score 1 · Accepted Answer

About the solution given by J.F. Sebastian here:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

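It's clever, but it has one disadvantage: it always returns tuples. How do you get strings instead?
Of course you can write ''.join(chunker(...)), but the temporary tuple is constructed anyway.

You can get rid of the temporary tuple by writing your own zip, like this: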

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

Example of use:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

score 1 · Accepted Answer

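Yet another answer, the advantages of which are:

1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list

That being said, I haven't timed it, so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.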

def chunkiter(iterable, size):
  def inneriter(first, iterator, size):
    yield first
    for _ in range(size - 1):
      try:
        yield next(iterator)
      except StopIteration:
        return  # the source ran out in the middle of a chunk
  it = iter(iterable)
  for first in it:
    yield inneriter(first, it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:
   ...:     for c in ii:
   ...:         print(c, end=' ')
   ...:     print()
   ...:
a b c
d e f
g h

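Update:
A couple of drawbacks arise because the inner and outer loops pull values from the same iterator:
1) continue doesn't work as expected in the outer loop: it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem, as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop: control winds up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.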
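
The break drawback and the tuple workaround can be illustrated with a short Python 3 sketch (chunkiter is re-declared here so the example runs on its own; the 'e' test is an arbitrary stand-in for a real condition):

```python
def chunkiter(iterable, size):
    def inneriter(first, iterator, size):
        yield first
        for _ in range(size - 1):
            try:
                yield next(iterator)
            except StopIteration:
                return
    it = iter(iterable)
    for first in it:
        yield inneriter(first, it, size)

# Breaking out of the lazy inner iterator leaves 'f' unread, so it
# becomes the first item of what looks like the next chunk.
leaky = []
for chunk in chunkiter('abcdefgh', 3):
    for c in chunk:
        if c == 'e':
            break
        leaky.append(c)

# Materialising each chunk first consumes it fully, so break skips cleanly.
clean = []
for chunk in chunkiter('abcdefgh', 3):
    for c in tuple(chunk):
        if c == 'e':
            break
        clean.append(c)

print(leaky)  # ['a', 'b', 'c', 'd', 'f', 'g', 'h']
print(clean)  # ['a', 'b', 'c', 'd', 'g', 'h']
```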

score 1 · Accepted Answer

def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist

score 0 · Accepted Answer

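By making an iterator out of the list, I hope I am not simply copying a slice of the list. Generators can be sliced and will automatically still be generators, while lists will be sliced into huge chunks of 1000 entries, which is less efficient.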

def iter_group(iterable, batch_size:int):
    length = len(iterable)
    start = batch_size*-1
    end = 0
    while(end < length):
        start += batch_size
        end += batch_size
        if isinstance(iterable, list):
            # min(length, end) keeps the final element; length-1 would drop it
            yield (iterable[i] for i in range(start, min(length, end)))
        else:
            yield iterable[start:end]

Usage:

items = list(range(1,1251))

for item_group in iter_group(items, 1000):
    for item in item_group:
        print(item)

score 0 · Accepted Answer

itertools.groupby works nicely for me to get an iterable of iterables, without creating any temporary lists:

groupby(iterable, (lambda x,y: (lambda z: next(x)//y))(count(),100))

Don't be put off by the nested lambdas: the outer lambda runs just once, to put the count() generator and the constant 100 into the scope of the inner lambda.

I use this to send chunks of rows to mysql.

for k,v in groupby(bigdata, (lambda x,y: (lambda z: next(x)//y))(count(),100)):
    cursor.executemany(sql, v)
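
The nested lambdas can also be spelled out as a small factory function, which some readers may find easier to follow. A sketch for Python 3 (chunk_key and the sample rows are hypothetical):

```python
from itertools import count, groupby

def chunk_key(size):
    """Return a key function that ignores its argument and counts calls."""
    counter = count()
    # groupby calls the key once per item, so call number // size
    # is the index of the chunk the item belongs to.
    return lambda _item: next(counter) // size

rows = ['r%d' % i for i in range(5)]
chunks = [list(v) for _, v in groupby(rows, chunk_key(2))]
print(chunks)  # [['r0', 'r1'], ['r2', 'r3'], ['r4']]
```

A fresh key function is needed for every groupby call, because the counter inside it keeps its state.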

score -1 · Accepted Answer

There doesn't seem to be a pretty way to do this. Here is a page that has a number of methods, including:

def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq

score -1 · Accepted Answer

If the lists are the same size, you can combine them into a list of 4-tuples with zip(). For example:

# Four lists of four elements each.

l1 = list(range(0, 4))
l2 = list(range(4, 8))
l3 = list(range(8, 12))
l4 = list(range(12, 16))

for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...

Here's what the zip() function produces:

>>> print(l1)
[0, 1, 2, 3]
>>> print(l2)
[4, 5, 6, 7]
>>> print(l3)
[8, 9, 10, 11]
>>> print(l4)
[12, 13, 14, 15]
>>> print(list(zip(l1, l2, l3, l4)))
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]

If the lists are large and you don't want to combine them into a bigger list, use itertools.izip() on Python 2, which produces an iterator rather than a list (on Python 3 the built-in zip() is already lazy).

from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...

score -2 · Accepted Answer

Why not use a list comprehension?

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
n = 4
filler = 0
fills = -len(l) % n  # number of filler items needed to complete the last chunk
chunks = [(l + [filler] * fills)[x * n:x * n + n] for x in range((len(l) + n - 1) // n)]
print(chunks)

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 0]]
