python - os.walk() キャッシング/高速化

Question

os.walk()クライアント[0]が作成する各クエリに対して[1]を実行するプロトタイプサーバー[0]があります。

私は現在、次の方法を検討しています。

このデータをメモリにキャッシュし、
クエリの高速化、および
うまくいけば、後でメタデータの保存とデータの永続化への拡張が可能になります。

ツリー構造のSQL は複雑だと思うので、実際に SQLite にコミットする前にアドバイスをいただければと思いました。

この種のデータを処理できる、クロスプラットフォームで組み込み可能またはバンドル可能な非 SQL データベースはありますか?

小さな (10k-100k ファイル) リストがあります。
接続数が非常に少ない (おそらく 10 ～ 20)。
メタデータの処理にもスケーリングできるようにしたいです。

[0] サーバーとクライアントは実際には同じソフトウェアです。これは P2P アプリケーションであり、メインサーバーを使用せずにローカルの信頼できるネットワークを介してファイルを共有するように設計されており、zeroconf検出に使用し、他のほとんどすべてにねじれています。

[1] クエリ時間は現在os.walk()、10,000 ファイルで 1.2 秒です

ウォーキングを行う Python コードの関連関数を次に示します。

def populate(self, string):
    for name, sharedir in self.sharedirs.items():
        for root, dirs, files, in os.walk(sharedir):
            for dir in dirs:
                if fnmatch.fnmatch(dir, string):
                    yield os.path.join(name, *os.path.join(root, dir)[len(sharedir):].split("/"))
            for file in files:
                if fnmatch.fnmatch(file, string): 
                    yield os.path.join(name, *os.path.join(root, ile)[len(sharedir):].split("/"))

score 3 · Accepted Answer

最初は質問を誤解していましたが、今では解決策があると思います（そして、新しい回答を正当化するのに十分なほど他の回答とは異なります）。基本的に、ディレクトリで初めて walk を実行するときに通常のクエリを実行しますが、生成された値を保存します。2 回目は、保存された値を生成するだけです。短いので os.walk() 呼び出しをラップしましたが、ジェネレーター全体を簡単にラップすることもできます。

cache = {}
def os_walk_cache( dir ):
   if dir in cache:
      for x in cache[ dir ]:
         yield x
   else:
      cache[ dir ]    = []
      for x in os.walk( dir ):
         cache[ dir ].append( x )
         yield x
   raise StopIteration()

メモリ要件についてはわかりませんが、定期的にを消去することを検討してcacheください。

score 3 · Accepted Answer

You don't need to persist a tree structure -- in fact, your code is busily dismantling the natural tree structure of the directory tree into a linear sequence, so why would you want to restart from a tree next time?

Looks like what you need is just an ordered sequence:

i   X    result of os.path.join for X

where X, a string, names either a file or directory (you treat them just the same), i is a progressively incrementing integer number (to preserve the order), and the result column, also a string, is the result of os.path.join(name, *os.path.join(root, &c.

This is perfectly easy to put in a SQL table, of course!

To create the table the first time, just remove the guards if fnmatch.fnmatch (and the string argument) from your populate function, yield the dir or file before the os.path.join result, and use a cursor.executemany to save the enumerate of the call (or, use a self-incrementing column, your pick). To use the table, populate becomes essentially a:

select result from thetable where X LIKE '%foo%' order by i

where string is foo.

score 0 · Accepted Answer

MongoDBを見たことがありますか? どうmod_pythonですか？スクリプトは接続間で永続的であるため、Pythonデータ構造にデータを保存するだけで済みますmod_python。os.walk()

python - os.walk() キャッシング/高速化

3 に答える 3

Related

Reference