python - Alternative to Python Multiprocessing Manager dict for large read only store

Question

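I am using multiprocessing with a large (~5 GB) read-only dict that the worker processes use. I started by passing the whole dict to each process, but ran into memory constraints, so I switched to a multiprocessing Manager dict (after reading "How to share a dictionary between multiple processes in python without locking").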

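Since the change, performance has dropped sharply. What alternatives are there for a faster shared data store? The dict has 40-character string keys, and each value is a tuple of two small strings.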

score 0 · Accepted Answer

Use a memory-mapped file. This might sound insane (performance-wise), but it might not be if you use some clever tricks:

Sort the keys so that you can use binary search in the file to locate a record.
Try to make each line of the file the same length ("fixed-width records").
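
As a rough illustration of the two tricks above, here is a minimal sketch in Python. It is not the answerer's code: the names `build_store`, `open_store` and `lookup`, the 24-byte value width, the tab separator and the ASCII-only assumption are all made up for the example.

```python
import mmap

KEY_LEN = 40                 # key width taken from the question
VAL_LEN = 24                 # assumed width for "a<TAB>b", space-padded
REC_LEN = KEY_LEN + VAL_LEN  # every record occupies exactly this many bytes


def build_store(path, data):
    """Write the dict once as sorted fixed-width records (key + padded value)."""
    with open(path, "wb") as f:
        for key in sorted(data):
            a, b = data[key]
            value = f"{a}\t{b}".encode("ascii")
            if len(key) != KEY_LEN or len(value) > VAL_LEN:
                raise ValueError(f"record for {key!r} does not fit the layout")
            f.write(key.encode("ascii") + value.ljust(VAL_LEN))


def open_store(path):
    """Map the file read-only; each worker process can call this itself."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(mm, key):
    """Binary search over record indices; returns the value tuple or None."""
    target = key.encode("ascii")
    lo, hi = 0, len(mm) // REC_LEN
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * REC_LEN
        found = mm[start:start + KEY_LEN]
        if found == target:
            raw = mm[start + KEY_LEN:start + REC_LEN].rstrip(b" ")
            a, b = raw.decode("ascii").split("\t", 1)
            return (a, b)
        if found < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

With this layout, each worker would call `open_store(path)` once and then `lookup(mm, key)` per request. Because the mapping is file-backed and read-only, the operating system can share the same physical pages between all processes, so the data should not be duplicated per worker. The sketch assumes values contain no tabs and do not end in spaces.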

If you can't use fixed-width records, use this pseudocode:

Read 1 KB in the middle of the search range (or enough to be sure the longest line fits *twice*)
Find the first newline character
Find the next newline character
Get a line as the substring between the two positions
Check the key (first 40 bytes)
If the key matches, you are done
If the key is too big, repeat with a 1 KB block in the lower half of the search range, else in the upper half of the search range
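
The pseudocode above can be sketched in Python roughly as follows. This is a variation rather than a literal transcription: instead of reading a 1 KB block and looking for two newlines, it uses `mmap.rfind` and `mmap.find` to locate the boundaries of the line that contains the midpoint. The record layout (`key<TAB>a<TAB>b<NEWLINE>`, ASCII only) and the names `build_line_store` and `lookup_line` are assumptions for the example.

```python
import mmap

KEY_LEN = 40  # key width taken from the question


def build_line_store(path, data):
    """Write sorted, newline-terminated records: key<TAB>a<TAB>b."""
    with open(path, "wb") as f:
        for key in sorted(data):
            a, b = data[key]
            f.write(f"{key}\t{a}\t{b}\n".encode("ascii"))


def lookup_line(mm, key):
    """Binary search over byte offsets in a sorted, line-oriented mmap."""
    target = key.encode("ascii")
    # Invariant: lo and hi are line starts (or the end of the file),
    # and a matching line, if any, starts in [lo, hi).
    lo, hi = 0, len(mm)
    while lo < hi:
        mid = (lo + hi) // 2
        # Start of the line containing mid: just after the previous newline.
        nl = mm.rfind(b"\n", lo, mid)
        start = lo if nl == -1 else nl + 1
        # End of that line: the next newline, or the end of the file.
        end = mm.find(b"\n", start)
        if end == -1:
            end = len(mm)
        line = mm[start:end]
        found = line[:KEY_LEN]
        if found == target:
            a, b = line[KEY_LEN + 1:].decode("ascii").split("\t", 1)
            return (a, b)
        if found < target:
            lo = end + 1   # continue in the upper half
        else:
            hi = start     # continue in the lower half
    return None
```

The file would be mapped the same way as for fixed-width records, for example with `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)`. Each step costs a short scan for newlines on top of the comparison, which is the price of dropping the fixed-width requirement.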

If the performance isn't good enough, consider writing an extension in C.
