python - Pickle versus shelve for storing large dictionaries in Python

Question

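If I store a large dictionary as a pickle file, does loading it via cPickle mean that it will all be read into memory at once?

If so, is there a cross-platform way to get something like pickle, but access each entry one key at a time (i.e. avoid loading the whole dictionary into memory and only load each entry by name)? I know shelve is supposed to do this: is it as portable as pickle, though?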

score 24 · Accepted Answer

"I know shelve is supposed to do this: is it as portable as pickle, though?"

Yes. shelve is part of the Python standard library and is written in Python.
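One thing worth checking on your own machines: shelve stores its data through whichever dbm backend Python finds, so the files on disk can differ between platforms even though the Python API is identical. A minimal sketch, assuming Python 3 and a throwaway path (shelf_path is just an illustrative name), that reports which backend a shelf ended up using:

```python
import dbm
import os
import shelve
import tempfile

# Hypothetical throwaway location for the shelf.
shelf_path = os.path.join(tempfile.mkdtemp(), "portability_check")

# Create a small shelf so there is something on disk to inspect.
with shelve.open(shelf_path) as shelf:
    shelf["a"] = 1

# whichdb() guesses which dbm module created the file(s) behind the shelf.
backend = dbm.whichdb(shelf_path)
print("dbm backend used for this shelf:", backend)
```

If the reported backend differs between two machines, it is probably safer to rebuild the shelf on the target machine than to copy the files across.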

Edit

So if you have a large dictionary:

bigd = {'a': 1, 'b':2, # . . .
}

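and you want to save it without having to read the whole thing back in later, don't save it as a pickle. It is better to save it as a shelf, which is a sort of on-disk dictionary.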

import shelve

myShelve = shelve.open('my.shelve')
myShelve.update(bigd)
myShelve.close()

Then later you can:

import shelve

myShelve = shelve.open('my.shelve')
value = myShelve['a']
value += 1
myShelve['a'] = value
myShelve.close()  # make sure the change is written to disk

You basically treat the shelve object like a dict, but the items are stored on disk (as individual pickles) and read in as needed.
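Here is a minimal sketch of that pattern for Python 3, assuming a throwaway path (shelf_path, final_a and final_nums are illustrative names). It also shows one pitfall: with the default writeback=False, mutating a stored list in place is not saved, so you have to reassign the key.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path works.
shelf_path = os.path.join(tempfile.mkdtemp(), "my.shelve")

# The context manager closes (and therefore syncs) the shelf on exit.
with shelve.open(shelf_path) as shelf:
    shelf["a"] = 1
    shelf["nums"] = [1, 2, 3]

with shelve.open(shelf_path) as shelf:
    # Only the entry for "a" is unpickled here, not the whole shelf.
    shelf["a"] = shelf["a"] + 1

    # Pitfall: this mutates a temporary copy, so nothing is written back.
    shelf["nums"].append(4)

    # Reassigning the key is what actually stores the change.
    nums = shelf["nums"]
    nums.append(5)
    shelf["nums"] = nums

# Reopen the shelf to confirm what was persisted.
with shelve.open(shelf_path) as shelf:
    final_a = shelf["a"]
    final_nums = shelf["nums"]

print(final_a, final_nums)
```

Opening the shelf with writeback=True avoids the reassignment step, at the cost of caching every accessed entry in memory until the shelf is synced or closed.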

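If your objects can be stored as a list of properties, then sqlite may be a good alternative. Shelves and pickles are convenient, but they can only be accessed from Python, whereas a sqlite database can be read from most languages.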
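As a rough sketch of the sqlite route, assuming the values are plain integers and using a made-up table name (items) and helper (get_value), the same dictionary could be stored one row per key and read back one key at a time:

```python
import os
import sqlite3
import tempfile

bigd = {"a": 1, "b": 2}

# Hypothetical database file in a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "bigd.sqlite")

conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value INTEGER)"
)
# One row per dictionary entry.
conn.executemany(
    "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)", bigd.items()
)
conn.commit()


def get_value(connection, key):
    """Fetch a single value by key without reading the rest of the table."""
    row = connection.execute(
        "SELECT value FROM items WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return row[0]


value_a = get_value(conn, "a")

# Update one entry in place, mirroring the shelve example.
conn.execute("UPDATE items SET value = ? WHERE key = ?", (value_a + 1, "a"))
conn.commit()
updated_a = get_value(conn, "a")
conn.close()
```

Richer objects would need one column per property (or a serialized column), which is where this approach gets less convenient than shelve.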

score 8 · Accepted Answer

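If you want a module that's more robust than shelve, you might look at klepto. klepto is built to provide a dictionary interface to platform-agnostic storage on disk or in a database, and it is designed to work with large data.

Here, we first create some pickled objects stored on disk. They use the dir_archive, which stores one object per file.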

>>> d = dict(zip('abcde',range(5)))
>>> d['f'] = max
>>> d['g'] = lambda x:x**2
>>> 
>>> import klepto
>>> help(klepto.archives.dir_archive)       

>>> print klepto.archives.dir_archive.__new__.__doc__
initialize a dictionary with a file-folder archive backend

    Inputs:
        name: name of the root archive directory [default: memo]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle file contents; otherwise save python objects
        compression: compression level (0 to 9) [default: 0 (no compression)]
        memmode: access mode for files, one of {None, 'r+', 'r', 'w+', 'c'}
        memsize: approximate size (in MB) of cache for in-memory compression

>>> a = klepto.archives.dir_archive(dict=d)
>>> a
dir_archive('memo', {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': <function <lambda> at 0x102f562a8>, 'f': <built-in function max>}, cached=True)
>>> a.dump()
>>> del a

Now the data is all on disk, so let's pick which entries we want to load into memory. b is the dictionary in memory, while b.archive maps the collection of files into a dictionary view.

>>> b = klepto.archives.dir_archive('memo')
>>> b
dir_archive('memo', {}, cached=True)
>>> b.keys()   
[]
>>> b.archive.keys()
['a', 'c', 'b', 'e', 'd', 'g', 'f']
>>> b.load('a')
>>> b
dir_archive('memo', {'a': 0}, cached=True)
>>> b.load('b')
>>> b.load('f')
>>> b.load('g')
>>> b['g'](b['f'](b['a'],b['b']))
1

klepto also provides the same interface to a sql archive.

>>> print klepto.archives.sql_archive.__new__.__doc__
initialize a dictionary with a sql database archive backend

    Connect to an existing database, or initialize a new database, at the
    selected database url. For example, to use a sqlite database 'foo.db'
    in the current directory, database='sqlite:///foo.db'. To use a mysql
    database 'foo' on localhost, database='mysql://user:pass@localhost/foo'.
    For postgresql, use database='postgresql://user:pass@localhost/foo'. 
    When connecting to sqlite, the default database is ':memory:'; otherwise,
    the default database is 'defaultdb'. If sqlalchemy is not installed,
    storable values are limited to strings, integers, floats, and other
    basic objects. If sqlalchemy is installed, additional keyword options
    can provide database configuration, such as connection pooling.
    To use a mysql or postgresql database, sqlalchemy must be installed.

    Inputs:
        name: url for the sql database [default: (see note above)]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle table contents; otherwise cast as strings

>>> c = klepto.archives.sql_archive('database')
>>> c.update(b)
>>> c
sql_archive('sqlite:///database', {'a': 0, 'b': 1, 'g': <function <lambda> at 0x10446b1b8>, 'f': <built-in function max>}, cached=True)
>>> c.dump()

Now those same objects on disk are also in a sql archive. We can add new objects to either archive.

>>> b['x'] = 69
>>> c['y'] = 96
>>> b.dump('x')
>>> c.dump('y')

Get klepto here: https://github.com/uqfoundation
