java - Java collections and memory optimization

Question

I built a custom index on a custom table, and it uses 500 MB of heap for 500k strings. Only 10% of the strings are unique; the rest are repeats. Every string is 4 characters long.
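Putting the question's figures into a rough per-string budget (taking 1 MB as about 10^6 bytes) shows how far the heap usage is from the size of the data itself:

```latex
\frac{500\ \text{MB}}{500{,}000\ \text{strings}} \approx \frac{500 \times 10^{6}\ \text{bytes}}{5 \times 10^{5}} = 1000\ \text{bytes per string}

\text{unique strings} = 0.10 \times 500{,}000 = 50{,}000, \qquad \text{repeated strings} = 500{,}000 - 50{,}000 = 450{,}000
```

About 1 KB per four-character string is far more than the characters themselves occupy, so it is worth checking where the rest of the heap goes before optimizing the strings alone.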

How can I optimize this code? Should I use a different collection? I tried implementing a custom string pool to save memory:

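import java.util.WeakHashMap;

public class StringPool {

    // Canonical instance for each distinct string (not thread-safe).
    // Note: WeakHashMap holds its values strongly, and here each value is its
    // own key, so these entries are never actually cleared by the GC.
    private static final WeakHashMap<String, String> map = new WeakHashMap<>();

    // Returns the pooled instance equal to str, adding str if it is not pooled yet.
    public static String getString(String str) {
        String pooled = map.get(str);
        if (pooled == null) {
            map.put(str, str);
            pooled = str;
        }
        return pooled;
    }
}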

private void buildIndex() {
    if (monitorModel.getMessageIndex() == null) {
        // One index per filterable column.
        ArrayList<HashMap<String, TreeSet<Integer>>> messageIndex = new ArrayList<>(filterableColumn.length);
        for (int i = 0; i < filterableColumn.length; i++) {
            // key -> cell value, value -> sorted set of the rows which contain the key
            HashMap<String, TreeSet<Integer>> hash = new HashMap<>();
            messageIndex.add(hash);
        }
        // Populate the index for every column.
        int rowCount = monitorModel.getParser().getMyMessages().getMessages().size();
        for (int i = rowCount - 1; i >= 0; --i) {
            for (int j = 0; j < filterableColumn.length; j++) {
                String value = StringPool.getString(getValueAt(i, j).toString());
                HashMap<String, TreeSet<Integer>> columnIndex = messageIndex.get(j);
                TreeSet<Integer> rows = columnIndex.get(value);
                if (rows == null) {
                    // First occurrence of this value in column j.
                    rows = new TreeSet<>();
                    columnIndex.put(value, rows);
                }
                rows.add(i);
            }
        }
        monitorModel.setMessageIndex(messageIndex);
    }
}

score 5 · Accepted Answer

There's no need to come up with a custom pool. Just use String.intern().
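A minimal sketch of what that means in practice, assuming the table hands back freshly allocated String objects for each cell (simulated here with new String(...)); the class name InternDemo is only for illustration. After intern(), equal strings resolve to one shared instance, so the duplicate objects become garbage once nothing else references them.

```java
class InternDemo {
    public static void main(String[] args) {
        // Simulate two cells read separately: equal content, distinct objects.
        String a = new String("ABCD");
        String b = new String("ABCD");
        System.out.println(a == b);        // false: two separate objects

        // intern() returns the canonical instance from the JVM's string pool.
        String ia = a.intern();
        String ib = b.intern();
        System.out.println(ia == ib);      // true: both refer to the same object
        System.out.println(ia == "ABCD");  // true: string literals are interned too
    }
}
```

In the question's buildIndex, this would amount to calling getValueAt(i, j).toString().intern() instead of StringPool.getString(...).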

score 4 · Accepted Answer

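You may want to examine the heap with a profiler. My guess is that the memory goes mainly not to string storage but to the many TreeSet<Integer> instances. If so, you could optimize significantly by using primitive arrays (int[], short[], or byte[], depending on the actual size of the integer values you're storing). Alternatively, you could look into primitive collection types such as those provided by FastUtil or Trove.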
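A sketch of the int[] idea, under stated assumptions: the table is modeled as a hypothetical String[][] (rows by columns), and CompactIndex and buildColumnIndex are illustrative names, not part of the question's code. Each distinct value maps to an exactly sized, sorted int[] of row numbers, which stores each row in 4 bytes instead of a tree node plus (usually) a boxed Integer per element. The boxed counters are only used while building and are discarded afterwards.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-column index: value -> sorted int[] of row numbers.
class CompactIndex {

    static Map<String, int[]> buildColumnIndex(String[][] table, int column) {
        // Pass 1: count how many rows contain each value.
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : table) {
            Integer c = counts.get(row[column]);
            counts.put(row[column], c == null ? 1 : c + 1);
        }

        // Allocate one exactly sized array per distinct value.
        Map<String, int[]> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            index.put(e.getKey(), new int[e.getValue()]);
        }

        // Pass 2: fill the arrays; visiting rows in ascending order keeps each array sorted.
        Map<String, Integer> nextSlot = new HashMap<>();
        for (int i = 0; i < table.length; i++) {
            String value = table[i][column];
            Integer slot = nextSlot.get(value);
            int p = slot == null ? 0 : slot;
            index.get(value)[p] = i;
            nextSlot.put(value, p + 1);
        }
        return index;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"ABCD", "X001"},
            {"EFGH", "X001"},
            {"ABCD", "X002"},
        };
        Map<String, int[]> col0 = buildColumnIndex(table, 0);
        System.out.println(Arrays.toString(col0.get("ABCD")));              // [0, 2]
        // A sorted int[] still supports fast membership tests.
        System.out.println(Arrays.binarySearch(col0.get("ABCD"), 2) >= 0);  // true
    }
}
```

This suits an index that is built once and then only read; if rows are appended later, the arrays would have to be grown or rebuilt, which is where a primitive collection library may be more convenient.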

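If you do find that string storage is the problem, I'll assume that you need to scale the application beyond 500k strings, or that you have particularly tight memory constraints, so that even short strings need to be deduplicated.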

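As Dev said, String.intern() will deduplicate the strings. One caveat, however: on the Oracle and OpenJDK virtual machines, String.intern() stores those strings in the VM's permanent generation, so they will never be garbage-collected. That is appropriate (and helpful) if:

The strings you're storing don't change over the lifetime of the VM (for example, you read a static list at startup and use it for the whole life of the application).
The strings you need to store fit comfortably in the VM's permanent generation (with ample room left for class loading and other consumers of PermGen). Update: see below.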

If either of those conditions is false, you are probably correct to build a custom pool. But my recommendation is that you consider a simple HashMap in place of the WeakHashMap you're currently using. You probably don't want these values to be garbage-collected while they're in your cache, and WeakHashMap adds another level of indirection (and the associated object pointers), increasing memory consumption further.
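A minimal sketch of that suggestion, with hypothetical names (StrongStringPool, canonical): a plain HashMap in which each key maps to itself, so lookups return one canonical instance. Unlike the static pool in the question, this one is an ordinary object, so the whole pool becomes collectable once whatever owns it (for example, the index) is dropped. It is not thread-safe; if several threads build indexes at once, a ConcurrentHashMap with putIfAbsent would be one option.

```java
import java.util.HashMap;
import java.util.Map;

// Strongly referenced canonicalizing pool: entries live as long as the pool does.
class StrongStringPool {

    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance equal to str, adding str if it is new.
    String canonical(String str) {
        String existing = pool.get(str);
        if (existing != null) {
            return existing;
        }
        pool.put(str, str);
        return str;
    }

    int size() {
        return pool.size();
    }

    // Releases all pooled strings, e.g. when the index is rebuilt.
    void clear() {
        pool.clear();
    }

    public static void main(String[] args) {
        StrongStringPool p = new StrongStringPool();
        String first = p.canonical(new String("ABCD"));
        String second = p.canonical(new String("ABCD"));
        System.out.println(first == second); // true: one shared instance
        System.out.println(p.size());        // 1
    }
}
```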

Update: I'm told that JDK 7 stores interned Strings (String.intern()) in the main heap, not in perm-gen, as earlier JDKs did. That makes String.intern() less risky if you're using JDK 7.
