python - How to load all cPickle dumps from a log file?

Question

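I'm running code that writes a large number (~1000) of relatively small dictionaries (50 key:value pairs of strings each) to a log file. A program automates this for me. I'm thinking of running something like the following: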

import random
import string
import cPickle as pickle
import zlib

fieldNames = ['AICc','Npix','Nparameters','DoF','chi-square','chi-square_nu']

tempDict = {}
overview = {}
iterList = []

# Create example dictionary to add to the log.
for item in fieldNames:
  tempDict[item] = random.choice([random.uniform(2,5), '', ''.join([random.choice(string.lowercase) for x in range(5)])])

# Compress and pickle and add the example dictionary to the log.
# tried  with 'ab' and 'wb' 
# is .p.gz the right extension for this kind of file??
# with open('google.p.gz', 'wb') as fp: 
with open('google.p.gz', 'ab') as fp:
  fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9))

# Attempt to read in entire log
i = 0
with open('google.p.gz', 'rb') as fp:
  # Call pickle.loads until all dictionaries loaded. 
  while 1:
    try:     
      i += 1
      iterList.append(i)
      overview[i] = {}
      overview[i] = pickle.loads(zlib.decompress(fp.read()))
    except:
      break

print tempDict
print overview

I want to be able to read in the last dictionary written to the log file (google.p.gz), but currently only the first pickled dictionary is read in.

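Also, is there a better way to do everything I'm doing? I've searched around and I get the feeling that I'm the only one doing something like this, which in the past has turned out to be a bad sign.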

score 1 · Accepted Answer

Your input and output don't match. When you output records, you take each record individually, pickle it, compress it, and write the result to the file on its own:

fp.write(zlib.compress(pickle.dumps(tempDict, pickle.HIGHEST_PROTOCOL),9))

But when you input records, you read the whole file, decompress it, and unpickle a single object from it:

pickle.loads(zlib.decompress(fp.read()))

So the next time you call fp.read() there is nothing left: the first call already read the whole file.
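A minimal sketch of this behaviour, assuming Python 3 (where cPickle is simply pickle) and an in-memory buffer standing in for google.p.gz. The three records and the helper iter_records are hypothetical names introduced for illustration. It also shows one way a file that was already written record by record could still be read back, using zlib.decompressobj and its unused_data attribute. This is a different technique from the one recommended further down in this answer.

```python
import io
import pickle
import zlib

# Hypothetical records, each pickled and compressed on its own and
# appended to the same "file", as the question's code does on disk.
records = [{'AICc': 'first'}, {'AICc': 'second'}, {'AICc': 'third'}]

buf = io.BytesIO()
for r in records:
    buf.write(zlib.compress(pickle.dumps(r, pickle.HIGHEST_PROTOCOL), 9))
buf.seek(0)

data = buf.read()      # the first read consumes the whole file
leftover = buf.read()  # the second read has nothing left to return


def iter_records(data):
    """Yield every record from concatenated zlib streams (illustrative helper)."""
    while data:
        d = zlib.decompressobj()
        # decompress() stops at the end of the current zlib stream;
        # the bytes that follow it are kept in d.unused_data.
        yield pickle.loads(d.decompress(data))
        data = d.unused_data


recovered = list(iter_records(data))
```

Here recovered[-1] is the last record that was written, which is what the question asks for.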

So you need to make your input match your output. How you do that depends on your exact requirements. Let's suppose that your requirements are:

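There will be so many records that the file needs to be compressed on disk.
All the records will be written to the file in one go (you don't need to append individual records).
You don't need random access to the records in the file (you are always happy to read the whole file in order to get to the last record).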

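With these requirements, it's a bad idea to compress each record individually using zlib. The DEFLATE algorithm used by zlib works by finding repeated sequences, so it works best on large amounts of data. It won't do much good on a single record. So let's use the gzip module to compress and decompress the whole file instead.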
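The effect can be checked with a rough sketch like the one below, assuming Python 3. The record layout (50 string fields with made-up names such as field00) and the helper make_record are assumptions chosen to resemble the question's data, not the asker's real records. Both measurements use zlib so that they are directly comparable; gzip uses the same DEFLATE algorithm with a slightly larger header.

```python
import pickle
import random
import string
import zlib

random.seed(0)  # fixed seed so repeated runs produce the same data


def make_record(n_fields=50):
    """Build a hypothetical record: 50 string keys mapped to short random strings."""
    return {
        'field%02d' % i: ''.join(random.choice(string.ascii_lowercase)
                                 for _ in range(5))
        for i in range(n_fields)
    }


records = [make_record() for _ in range(1000)]
pickled = [pickle.dumps(r, pickle.HIGHEST_PROTOCOL) for r in records]

raw_size = sum(len(p) for p in pickled)
# Each record compressed on its own, as in the question.
per_record_size = sum(len(zlib.compress(p, 9)) for p in pickled)
# All pickles concatenated and compressed as a single stream.
whole_stream_size = len(zlib.compress(b''.join(pickled), 9))

print('raw:', raw_size)
print('compressed per record:', per_record_size)
print('compressed as one stream:', whole_stream_size)
```

In the single stream, the compressor can reuse the field names it saw in earlier records, whereas each individually compressed record has to start from scratch and pay for its own stream header and checksum.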

I've made a few other improvements to your code as I went through it.

import cPickle as pickle
import gzip
import random
import string

field_names = 'AICc Npix Nparameters DoF chi-square chi-square_nu'.split()

random_value_constructors = [
    lambda: random.uniform(2,5),
    lambda: ''.join(random.choice(string.lowercase)
                    for x in xrange(random.randint(0, 5)))]

def random_value():
    """
    Return a random value, either a small floating-point number or a
    short string.
    """
    return random.choice(random_value_constructors)()

def random_record():
    """
    Create and return a random example record.
    """
    return {name: random_value() for name in field_names}

def write_records(filename, records):
    """
    Pickle each record in `records` and compress them to `filename`.
    """
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)

def read_records(filename):
    """
    Decompress `filename`, unpickle records from it, and yield them.
    """
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
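As a usage sketch, the two functions might be combined as follows to fetch the last record. This is a Python 3 port, which is an assumption on my part (pickle in place of cPickle); the example records, the temporary file path, and the helper last_record are hypothetical and not part of the answer above.

```python
import gzip
import os
import pickle
import tempfile


def write_records(filename, records):
    """Pickle each record in `records` into one gzip-compressed file."""
    with gzip.open(filename, 'wb') as f:
        for r in records:
            pickle.dump(r, f, pickle.HIGHEST_PROTOCOL)


def read_records(filename):
    """Decompress `filename` and yield the records pickled in it, in order."""
    with gzip.open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # pickle.load raises EOFError once the stream is exhausted.
                return


def last_record(filename):
    """Return the last record in `filename`, or None if it holds none."""
    last = None
    for last in read_records(filename):
        pass
    return last


# Hypothetical example data written to a temporary file.
records = [{'AICc': str(i), 'Npix': i * 2.5} for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), 'records.p.gz')
write_records(path, records)

loaded = list(read_records(path))
final = last_record(path)
```

Because read_records is a generator, last_record walks the file one record at a time and never holds more than one record in memory, which fits the requirement that reading the whole file to reach the last record is acceptable.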
