python - What is the fastest way to create and fill a huge numpy 2D array?

Question

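I have to create a huge array (e.g. 96 GB, 72000 rows * 72000 columns) and fill every cell with a float that comes from a mathematical formula. The array will be processed afterwards.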

import itertools, operator, time, copy, os, sys
import numpy 
from multiprocessing import Pool


def f2(x):  # more complex mathematical formulas that change according to values in *i* and *x*
    temp=[]
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):  #provide all combinations of r balls in n boxes
   size = n + r - 1
   for indices in itertools.combinations(range(size), n-1):
       starts = [0] + [index+1 for index in indices]
       stops = indices + (size,)
       yield tuple(map(operator.sub, stops, starts))

# module-level so that f2 can read it in every worker process
combine = list(combinations_with_replacement_counts(3, 60))  # 60 used here for testing; the real case needs 350
print(len(combine))
if __name__ == '__main__':
    t1=time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)

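What is the fastest way to create and fill such a huge numpy array? Should I fill lists, aggregate them, and then convert the result into a numpy array?
Given that the cells/columns/rows of the 2D array are independent of each other, can the computation be parallelized to speed up filling the array? Any hints or pointers on optimizing this kind of computation with multiprocessing?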

score 0 · Accepted Answer

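You can create an empty numpy.memmap array with the desired shape and then use multiprocessing.Pool to populate its values. Done correctly, this also keeps the memory footprint of each process in the pool relatively small.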
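
A minimal sketch of that idea, under some assumptions not stated in the question: the cell formula can be evaluated one row at a time, the dtype is float64, and the array is filled in blocks of rows. The names formula, fill_rows, build_array and the chunk_rows parameter are hypothetical and only serve the illustration; formula is a stand-in for the real, more complex one.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np


def formula(i, cols):
    # Hypothetical placeholder for the real formula (assumption).
    return 0.2 * i * cols / 64.23


def fill_rows(args):
    """Worker: reopen the file-backed array and fill rows [start, stop)."""
    path, shape, start, stop = args
    # mode="r+" opens the existing file for reading and writing.
    arr = np.memmap(path, dtype=np.float64, mode="r+", shape=shape)
    cols = np.arange(shape[1], dtype=np.float64)
    for i in range(start, stop):
        arr[i, :] = formula(float(i), cols)  # one row at a time
    arr.flush()  # push the written rows to disk
    del arr
    return stop - start


def build_array(path, shape, chunk_rows=1000, processes=None):
    # mode="w+" creates (or overwrites) the backing file at full size.
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    arr.flush()
    del arr
    # One task per block of rows; only small tuples are sent to workers.
    tasks = [(path, shape, s, min(s + chunk_rows, shape[0]))
             for s in range(0, shape[0], chunk_rows)]
    with Pool(processes) as pool:
        filled = sum(pool.imap_unordered(fill_rows, tasks))
    assert filled == shape[0]
    # Reopen read-only for the later computations.
    return np.memmap(path, dtype=np.float64, mode="r", shape=shape)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big.dat")
    result = build_array(path, (2000, 300), chunk_rows=250)
    print(result.shape, result[3, 5])
```

Each worker receives only a file path and a row range, so no large array is pickled between processes, and a worker touches only the rows it writes. The small shape in the usage part is for a quick check; the real shape and a suitable chunk_rows would need to be chosen for the machine at hand.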
