c# - String caching. Memory optimization and re-use

Question

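I am currently working on a very large legacy application that handles a large amount of string data gathered from various sources (i.e. names, identifiers, common codes relating to the business, etc.). This data alone can take up to 200 MB of RAM in the application process.

A colleague of mine mentioned one possible strategy for reducing the memory footprint, since many of the individual strings are duplicated across the data sets. For example: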

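using System.Collections.Generic;

public class StringCacher
{
    // Maps each distinct string value to the single instance that gets reused.
    private readonly Dictionary<string, string> _stringCache;

    public StringCacher()
    {
        _stringCache = new Dictionary<string, string>();
    }

    public string AddOrReuse(string stringToCache)
    {
        // Reuse the cached instance if an equal string has been seen before.
        if (_stringCache.TryGetValue(stringToCache, out string cached))
            return cached;

        // First occurrence: remember this instance for later reuse.
        _stringCache[stringToCache] = stringToCache;
        return stringToCache;
    }
}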

Then, to use this caching...

public IEnumerable<string> IncomingData()
{
    var stringCache = new StringCacher();

    var dataList = new List<string>();

    // Add the data, a fair amount of the strings will be the same.
    dataList.Add(stringCache.AddOrReuse("AAAA"));
    dataList.Add(stringCache.AddOrReuse("BBBB"));
    dataList.Add(stringCache.AddOrReuse("AAAA"));
    dataList.Add(stringCache.AddOrReuse("CCCC"));
    dataList.Add(stringCache.AddOrReuse("AAAA"));

    return dataList;
}

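Since strings are immutable and the framework does a lot of internal work to make them behave similarly to value types, I half suspect that this will just create a copy of each string in the dictionary and double the amount of memory used, rather than just passing around a reference to the string stored in the dictionary (which is what my colleague is assuming).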

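So, taking into account that this will be run on a massive set of string data...

Is this going to save any memory, assuming that 30% of the string values will be used twice or more?
Is the assumption that this will even work correct?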

score 12 · Accepted Answer

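This is essentially what string interning is, except that you don't have to worry about how it works. In your example you are still creating a string, then comparing it, then leaving the copy to be garbage-collected. .NET can do this for you at run time: string literals are interned automatically, and String.Intern does the same for strings created at run time.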

See also String.Intern and Optimizing C# String Performance (C Calvert)

From the Calvert article (the line numbers refer to that article's listing): If a new string is created with code like (String goober1 = "foo"; String goober2 = "foo";) shown in lines 18 and 19, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.
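A minimal sketch of the difference between literals and strings built at run time. `InternDemo` and `BuildAtRuntime` are names made up for illustration; the only framework calls are `String.Intern` and `Object.ReferenceEquals`.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: builds a string at run time so the compiler
    // cannot fold it into a literal.
    public static string BuildAtRuntime(string prefix, int number)
    {
        return prefix + number.ToString();
    }

    public static void Run()
    {
        // Two identical literals in the same assembly normally share one instance.
        string goober1 = "foo";
        string goober2 = "foo";
        Console.WriteLine(object.ReferenceEquals(goober1, goober2));

        // Strings built at run time are separate objects, even when equal.
        string a = BuildAtRuntime("code-", 42);
        string b = BuildAtRuntime("code-", 42);
        Console.WriteLine(a == b);                       // equal by value
        Console.WriteLine(object.ReferenceEquals(a, b)); // distinct instances

        // String.Intern returns the single pooled instance for that value.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // same instance
    }
}
```

Data read from files or databases falls into the second case, so it only benefits from interning if you call `String.Intern` yourself.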

So you don't have to roll your own; it won't really provide any advantage. EDIT: unless your strings don't usually live as long as your AppDomain. Interned strings live for the lifetime of the AppDomain, which is not necessarily great for GC. If you want short-lived strings, then you want a pool. From String.Intern:

If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. ...
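If the strings should not outlive a batch of work, one option is a pool whose lifetime you control. A rough sketch, assuming a hypothetical `ScopedStringPool` class (not a framework type): once the pool and the data that references it become unreachable, the garbage collector can reclaim the strings, unlike entries in the intern table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, illustrative type; not part of .NET.
public sealed class ScopedStringPool
{
    // Ordinal comparison: pooled strings must match character for character.
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Number of distinct string values currently held by the pool.
    public int Count
    {
        get { return _pool.Count; }
    }

    // Returns the pooled instance equal to value, adding value if it is new.
    public string GetOrAdd(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");

        string pooled;
        if (_pool.TryGetValue(value, out pooled))
            return pooled;

        _pool.Add(value, value);
        return value;
    }
}
```

Note that each incoming duplicate is still allocated once before `GetOrAdd` swaps it for the pooled instance; the saving comes from those duplicates becoming garbage immediately instead of staying referenced by the data set.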

EDIT 2: Also see Jon Skeet's SO answer here

score 3 · Accepted Answer

This is already built into .NET. It's called String.Intern, so there is no need to reinvent it.
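Applied to the loader from the question, that could look roughly like this. `InternedLoader` and its `rawValues` parameter are illustrative names; the only framework call is `String.Intern`. The sketch assumes the input contains no null entries, since `String.Intern` throws on null.

```csharp
using System;
using System.Collections.Generic;

public static class InternedLoader
{
    // rawValues stands in for whatever source the strings arrive from.
    public static List<string> IncomingData(IEnumerable<string> rawValues)
    {
        var dataList = new List<string>();

        foreach (string value in rawValues)
        {
            // String.Intern returns the pooled instance for this value,
            // adding it to the intern pool the first time it is seen.
            dataList.Add(string.Intern(value));
        }

        return dataList;
    }
}
```

Equal values in the returned list then refer to a single instance, at the cost of those instances typically staying alive until the runtime shuts down.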
