java - What is the best way to quickly search a large, growing flat file?

Question

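I haven't been given the details yet, but I'm preparing to implement a command-line search tool in Java that searches a file containing two fields (docid, orgid). I've learned that this file starts out small and keeps growing. I need to be able to pass in a docid and get back the orgid.

Can someone tell me: what is the best technique for searching a flat file like the one described above?

Right now I'm only dealing with 50,000 lines of data in the file (a little over 2 months' worth), but once the system is deployed the data will grow much faster.

It looks like I should store this in some kind of searchable binary format, but I don't know what to look into first.

I could dump this into a database, but that seems like overkill. Besides, doing so would require installing a database on the server, which would be difficult.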

score 2 · Accepted Answer

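If possible, insert the data into a database from the start (probably something lightweight like hsqldb or h2).

Your data behaves like a Map, so something like mapdb might be a good fit (but you should make sure the schema is unlikely to change).

If you still have to work with this flat file, grep is your best option (it's the fastest tool for searching flat files).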
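As an alternative to shelling out to grep, a Java command-line tool can perform the same kind of linear scan itself. The sketch below assumes one record per line in the form `docid,orgid` (the question does not specify a delimiter); the class name `FlatFileScan` and the method `findOrgId` are hypothetical. It stops at the first match and, in the worst case, reads the whole file, so each lookup costs time proportional to the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class FlatFileScan {
    // Hypothetical helper. Assumed record format: one "docid,orgid" pair per line.
    // Scans line by line (like grep) and returns the orgid of the first matching docid.
    static Optional<String> findOrgId(Path file, String docId) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip malformed lines without a delimiter
                }
                // Compare the whole docid field, so "d1" does not match "d10".
                if (line.substring(0, comma).equals(docId)) {
                    return Optional.of(line.substring(comma + 1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FlatFileScan <file> <docid>
        System.out.println(findOrgId(Path.of(args[0]), args[1]).orElse("not found"));
    }
}
```

`Path.of` requires Java 11 or later. Because nothing is kept between runs, this fits a tool that is invoked once per lookup; if many lookups happen per run, loading the file once into memory avoids rescanning it.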

score 0 · Accepted Answer

Well, depending on the size of docid and orgid and the amount of RAM you have available, you could simply use a hash table: read everything into the hash table, then query against it. Of course, I don't know how many lookups you have to make against this file, how often this has to run, or whether it needs to stay resident in memory.
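The hash-table approach can be sketched with a plain `java.util.HashMap`. This is a minimal sketch, assuming the same hypothetical `docid,orgid` line format; the class `DocOrgIndex` and its methods are illustrative names, not an existing API. Loading reads the file once; after that each lookup is an average constant-time map access, at the cost of memory that grows linearly with the number of records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class DocOrgIndex {
    private final Map<String, String> orgByDoc = new HashMap<>();

    // Hypothetical loader. Assumed record format: one "docid,orgid" pair per line.
    // If a docid appears more than once, the later line overwrites the earlier one.
    static DocOrgIndex load(Path file) throws IOException {
        DocOrgIndex index = new DocOrgIndex();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip malformed lines without a delimiter
                }
                index.orgByDoc.put(line.substring(0, comma), line.substring(comma + 1));
            }
        }
        return index;
    }

    // Returns the orgid for docId, or null if the docid is unknown.
    String orgIdFor(String docId) {
        return orgByDoc.get(docId);
    }
}
```

This pays off when one process answers many lookups (for example, a long-running service); for a tool started once per query, the load time is paid on every run.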

Other options (as previously suggested) are to use a persisted DB. The most efficient way would be to read the file into the DB and truncate the file so that subsequent reads don't have to reread existing records. Plus, your file remains manageable. Of course, a lot of questions arise if you try to do that, for example: Can you truncate the file? Does another process expect the file to exist? How do you manage race conditions when you attempt to truncate?
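The read-then-truncate idea might look like the following sketch. It assumes the same hypothetical `docid,orgid` line format and uses a `BiConsumer` as a stand-in for the actual DB insert; `DrainAndTruncate` and `drain` are illustrative names. The `FileChannel` lock is advisory: it only prevents the race described above if the process that appends to the file also takes the lock before writing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.BiConsumer;

public class DrainAndTruncate {
    // Hypothetical helper: hands every "docid,orgid" line to sink (stand-in for a DB insert),
    // then truncates the file to zero length. Returns the number of records handed over.
    static int drain(Path file, BiConsumer<String, String> sink) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // advisory lock: only effective if the writer locks too
            // Read the whole file into memory (the int cast limits this sketch to files under 2 GB).
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            String content = new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
            int count = 0;
            for (String line : content.split("\n")) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip blank or malformed lines
                }
                // trim() drops a trailing '\r' from Windows-style line endings
                sink.accept(line.substring(0, comma), line.substring(comma + 1).trim());
                count++;
            }
            ch.truncate(0); // only reached if every sink call succeeded
            return count;
        }
    }
}
```

If the sink throws partway through, the file is left untouched, so the records already handed over will be handed over again on the next run; the DB insert should therefore tolerate duplicates (for example, by using the docid as a primary key).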

Using something like hsqldb or h2 would be great, as they can be embedded in your app and you don't have to worry about installing them separately. Of course, you need to give them a place to persist their data, or they won't help much.
