python - Replacing three lists with two generators

Question

I want to optimize my application by using generators: instead of creating three lists, I would like to use two generators. Here is a short outline of the current version of my app:

1) Load data from a binary file -> first list

self.stream_data = [ struct.unpack(">H", data_file.read(2))[0] for foo in
                       xrange(self.columns*self.rows) ]

2) Create the so-called non-zero-suppressed data (all data, including zeros) -> second list

self.NZS_data = list()
for row in xrange(self.rows):
    self.NZS_data.append( [ self.stream_data[column + row * self.rows ] 
                          for column in xrange(self.columns) ] )

3) Create the zero-suppressed data (non-zero values with their coordinates) -> third list

self.ZS_data = list()
for row in xrange(self.rows):
    for column in xrange(self.columns):
        if self.NZS_data[row][column]:
            self.ZS_data.append( [ column, row, self.NZS_data[row][column] ] )

(I know this could be squeezed into a single list comprehension using itertools.product.)

4) Save the ZS_data list to a file.
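As a rough illustration of the single-comprehension idea mentioned above, the sketch below builds the zero-suppressed triples straight from the flat list and skips the intermediate NZS list. It is written for Python 3 (range instead of xrange), and build_zs_data is a hypothetical helper name, not part of the original code. It assumes the data is stored row by row, so the offset is row * columns + column; the column + row * self.rows form used in step 2 agrees with that only when rows equals columns.

```python
import itertools

def build_zs_data(stream_data, rows, columns):
    """Return [column, row, value] for every non-zero entry of a flat, row-major list."""
    return [
        [column, row, stream_data[row * columns + column]]
        for row, column in itertools.product(range(rows), range(columns))
        if stream_data[row * columns + column]
    ]

# Tiny 2x3 example: rows are [0, 5, 0] and [7, 0, 9].
flat = [0, 5, 0, 7, 0, 9]
zs = build_zs_data(flat, rows=2, columns=3)
```

Each value is still looked up twice (once in the filter, once in the result), so this mainly saves the second list rather than the indexing work.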

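I profiled the app with Python's cProfile, and most of the time (apart from reading and unpacking) is spent creating these two lists (NZS_data and ZS_data). Since I only need them in order to save the data to a file, I was thinking about using two generators: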

1) Create a generator that reads the file -> first generator

self.stream_data = ( struct.unpack(">H", data_file.read(2))[0] for foo in
                       xrange(self.columns*self.rows) )

2) Create the ZS_data generator (I don't really need the NZS data)

self.ZS_data = ( [column, row, self.stream_data.next()]
                 for row, column in itertools.product(xrange(self.rows),
                 xrange(self.columns))
                 if self.stream_data.next() )

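Of course, this does not work properly: each call to next() pulls a new value from the generator, so the value tested in the condition is not the value that gets stored.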

3) Save the data to a file using the generator.

I wonder how this can be done. Perhaps you have other ideas for optimizing this application?
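One possible way around the double next() problem in step 2 is to pair each coordinate with exactly one value from the stream, so that every value is read once and then both tested and stored. The sketch below is written for Python 3, where zip is lazy (in Python 2 the lazy equivalent would be itertools.izip). The helper names read_values and zero_suppressed are hypothetical, and io.BytesIO stands in for the real binary file.

```python
import io
import itertools
import struct

def read_values(data_file, count):
    """Yield `count` big-endian unsigned 16-bit values, one read per value."""
    for _ in range(count):
        yield struct.unpack(">H", data_file.read(2))[0]

def zero_suppressed(stream, rows, columns):
    """Yield [column, row, value] for each non-zero value of a row-major stream."""
    coords = itertools.product(range(rows), range(columns))
    # zip consumes one value per coordinate, so nothing is skipped.
    for (row, column), value in zip(coords, stream):
        if value:
            yield [column, row, value]

# Stand-in for the binary file: six values forming a 2x3 frame.
raw = struct.pack(">6H", 0, 5, 0, 7, 0, 9)
zs = list(zero_suppressed(read_values(io.BytesIO(raw), 6), rows=2, columns=3))
```

Nothing is materialized until the consumer iterates, so the triples can be written to the output file one at a time.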

Added: a generator-based solution:

def create_ZS_data(self):
    self.ZS_data = ( [column, row, self.stream_data[column + row * self.rows ]]
                     for row, column in itertools.product(xrange(self.rows), xrange(self.columns))
                     if self.stream_data[column + row * self.rows ] )

Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257    1.117    0.000   71.598    0.022 decode_from_merlin.py:302(create_ZS_file)
   463419   67.705    0.000   67.705    0.000 decode_from_merlin.py:86(<genexpr>)

Jon's solution:

def create_ZS_data(self):
    self.ZS_data = list()
    for rowno, cols in enumerate(self.stream_data[i:i+self.columns] for i in xrange(0, len(self.stream_data), self.columns)):
        for colno, col in enumerate(cols):
            # col == value, (rowno, colno) = index
            if col:
                self.ZS_data.append([colno, rowno, col])

Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257   18.616    0.006   19.919    0.006 decode_from_merlin.py:83(create_ZS_data)

score 3 · Accepted Answer

You can probably make the unpacking more efficient...

self.data_stream = struct.unpack_from('>{}H'.format(self.rows*self.columns), data_file)
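Note that struct.unpack_from expects a bytes-like buffer rather than a file object, so the line above would likely need the bytes read from the file first. A minimal Python 3 sketch of that variant follows; read_all_values is a hypothetical helper name and io.BytesIO stands in for the real file.

```python
import io
import struct

def read_all_values(data_file, rows, columns):
    """Read rows*columns big-endian unsigned 16-bit values with a single unpack call."""
    fmt = ">{}H".format(rows * columns)
    # calcsize gives the exact number of bytes the format needs (2 per value here).
    payload = data_file.read(struct.calcsize(fmt))
    return struct.unpack(fmt, payload)

# Stand-in for the binary file: six values forming a 2x3 frame.
raw = struct.pack(">6H", 0, 5, 0, 7, 0, 9)
values = read_all_values(io.BytesIO(raw), rows=2, columns=3)
```

The result is a tuple, which supports the slicing used in the loop that follows. If the file holds fewer bytes than the format requires, struct.unpack raises struct.error.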

Then reduce the loop to:

for rowno, cols in enumerate(self.data_stream[i:i+self.columns] for i in xrange(0, len(self.data_stream), self.columns)):
    for colno, col in enumerate(cols):
        # col == value, (rowno, colno) = index
        if col == 0:
            pass # do something
        else:
            pass # do something else

Note: untested.
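If the zero-suppressed triples are only needed for writing, the loop above could be wrapped in a generator function and consumed directly by the writer, so no ZS list is built. The sketch below is Python 3, the names iter_zs and write_zs are hypothetical, and the one-line-per-triple text format is purely an assumption, since the question does not say how the output file is laid out.

```python
import io

def iter_zs(data_stream, columns):
    """Yield [colno, rowno, value] for each non-zero entry of a flat row-major sequence."""
    row_slices = (data_stream[i:i + columns]
                  for i in range(0, len(data_stream), columns))
    for rowno, cols in enumerate(row_slices):
        for colno, value in enumerate(cols):
            if value:
                yield [colno, rowno, value]

def write_zs(out_file, data_stream, columns):
    """Write 'column row value' lines (assumed format) and return how many were written."""
    written = 0
    for colno, rowno, value in iter_zs(data_stream, columns):
        out_file.write("{} {} {}\n".format(colno, rowno, value))
        written += 1
    return written

# io.StringIO stands in for the real output file.
out = io.StringIO()
count = write_zs(out, (0, 5, 0, 7, 0, 9), columns=3)
```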
