c# - Simple histogram generation of integer data in C#

Question

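As part of a test bench I'm building, I'm looking for a simple class to calculate a histogram of integer values (the number of iterations an algorithm takes to solve a problem). It should be called something like this: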

Histogram myHist = new Histogram();

for( uint i = 0; i < NUMBER_OF_RESULTS; i++ )
{

    myHist.AddValue( some_result );
}

for( uint j = 0; j < myHist.NumOfBins; j++ )
{
     Console.WriteLine( "{0} occurred {1} times", myHist.BinValues[j], myHist.BinCounts[j] );
}

I was surprised that a bit of Googling didn't turn up a neat solution, but maybe I didn't search for the right things. Is there a generic solution out there, or is it worth rolling my own?

score 19 · Accepted Answer

You could use SortedDictionary:

uint[] items = new uint[] {5, 6, 1, 2, 3, 1, 5, 2}; // sample data
SortedDictionary<uint, int> histogram = new SortedDictionary<uint, int>();
foreach (uint item in items) {
    if (histogram.ContainsKey(item)) {
        histogram[item]++;
    } else {
        histogram[item] = 1;
    }
}
foreach (KeyValuePair<uint, int> pair in histogram) {
    Console.WriteLine("{0} occurred {1} times", pair.Key, pair.Value);
}

This will leave out empty bins, though: a value that never occurs gets no entry at all.
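
If the empty bins matter, one possible workaround is to walk every integer between the smallest and largest observed value and report a count of zero where nothing was recorded. The following is only a sketch of that idea, not part of the original answer; the names `DenseHistogram` and `Build` are made up for illustration, and it assumes the value range is small enough that one bin per integer is acceptable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the answer above): a histogram that also
// reports zero-count bins between the minimum and maximum observed values.
public static class DenseHistogram
{
    public static List<KeyValuePair<uint, int>> Build(IEnumerable<uint> items)
    {
        var counts = new Dictionary<uint, int>();
        uint min = uint.MaxValue;
        uint max = uint.MinValue;

        foreach (uint item in items)
        {
            counts.TryGetValue(item, out int current); // current stays 0 if the key is absent
            counts[item] = current + 1;
            if (item < min) min = item;
            if (item > max) max = item;
        }

        var bins = new List<KeyValuePair<uint, int>>();
        if (counts.Count == 0)
        {
            return bins; // no data, so no bins
        }

        for (uint value = min; ; value++)
        {
            counts.TryGetValue(value, out int count); // 0 for values that never occurred
            bins.Add(new KeyValuePair<uint, int>(value, count));
            if (value == max) break; // break here so value++ cannot overflow past uint.MaxValue
        }
        return bins;
    }
}

public static class Program
{
    public static void Main()
    {
        uint[] items = { 5, 6, 1, 2, 3, 1, 5, 2 }; // same sample data as above
        foreach (KeyValuePair<uint, int> bin in DenseHistogram.Build(items))
        {
            Console.WriteLine("{0} occurred {1} times", bin.Key, bin.Value);
        }
    }
}
```

With the sample data this prints a line for every value from 1 to 6, including "4 occurred 0 times". For sparse data with a very wide range, this dense list can get large, so the sparse dictionary may still be the better fit there.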

score 6 · Accepted Answer

Based on BastardSaint's suggestion, I came up with a neat and fairly generic wrapper:

public class Histogram<TVal> : SortedDictionary<TVal, uint>
{
    public void IncrementCount(TVal binToIncrement)
    {
        if (ContainsKey(binToIncrement))
        {
            this[binToIncrement]++;
        }
        else
        {
            Add(binToIncrement, 1);
        }
    }
}

So now I can do:

const uint numOfInputDataPoints = 5;
Histogram<uint> hist = new Histogram<uint>();

// Fill the histogram with data
for (uint i = 0; i < numOfInputDataPoints; i++)
{
    // Grab a result from my algorithm
    uint numOfIterationsForSolution = MyAlgorithm.Run();

    // Add the number to the histogram
    hist.IncrementCount( numOfIterationsForSolution );
}

// Report the results
foreach (KeyValuePair<uint, uint> histEntry in hist.AsEnumerable())
{
    Console.WriteLine("{0} occurred {1} times", histEntry.Key, histEntry.Value);
}

It took me a while to work out how to make it generic (at first I had only overridden the SortedDictionary constructor, which meant it could only be used with uint keys).

score 5 · Accepted Answer

You can use Linq:

var items = new[] {5, 6, 1, 2, 3, 1, 5, 2};
items
    .GroupBy(i => i)
    .Select(g => new {
        Item = g.Key,
        Count = g.Count()
    })
    .OrderBy(g => g.Item)
    .ToList()
    .ForEach(g => {
        Console.WriteLine("{0} occurred {1} times", g.Item, g.Count);
    });

score 0 · Accepted Answer

A simple extension method implementation for creating a histogram:

public static IReadOnlyDictionary<T, int> ToHistogram<T>(this IEnumerable<T> enumerable)
   => enumerable.GroupBy(item => item).ToDictionary(grouping => grouping.Key, grouping => grouping.Count());
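
To actually compile, an extension method like this has to live in a static, non-generic, top-level class. Here is a minimal sketch of how that could look together with a call site; the class name `HistogramExtensions` is my own choice, not something from the answer. Note that the result is backed by a plain Dictionary, whose enumeration order is not specified, so the sketch sorts by key before printing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; extension methods must be declared in a static class.
public static class HistogramExtensions
{
    // Groups equal items and counts the size of each group.
    public static IReadOnlyDictionary<T, int> ToHistogram<T>(this IEnumerable<T> enumerable)
        => enumerable.GroupBy(item => item).ToDictionary(grouping => grouping.Key, grouping => grouping.Count());
}

public static class Program
{
    public static void Main()
    {
        var items = new[] { 5, 6, 1, 2, 3, 1, 5, 2 };
        IReadOnlyDictionary<int, int> histogram = items.ToHistogram();

        // Sort explicitly for display; the dictionary itself gives no ordering guarantee.
        foreach (KeyValuePair<int, int> pair in histogram.OrderBy(p => p.Key))
        {
            Console.WriteLine("{0} occurred {1} times", pair.Key, pair.Value);
        }
    }
}
```

As with the other dictionary-based approaches, values that never occur simply have no entry.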

score 0 · Accepted Answer

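This builds on the accepted answer. The problem is that SortedDictionary costs O(log(N)) for both insertion and retrieval, so building the histogram item by item is slow.

You can avoid that cost if you don't need to display the histogram while it is still being accumulated.

My modification uses a normal Dictionary and only sorts it into a SortedList once, at the end.

With a sample size of 10 million items, this version is about 11 times faster (on my machine), at the cost of slightly higher memory usage until the GC kicks in (about 10% extra memory).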

//generate a random sample
Random r = new Random();
var items = Enumerable
    .Range(1, 10_000_000)
    .Select( _ => (uint)r.Next(100_000))
    .ToList();

//build the histogram using a normal dictionary with O(1) lookups and insertions.
var tempHistogram = new Dictionary<uint, int>();
foreach (uint item in items)
{
    if (tempHistogram.ContainsKey(item))
    {
        tempHistogram[item]++;
    }
    else
    {
        tempHistogram[item] = 1;
    }
}

//Sort it once. SortedList conveniently has a ctor that takes a dictionary.
var sortedHistogram = new SortedList<uint, int>(tempHistogram);

foreach (KeyValuePair<uint, int> pair in sortedHistogram.Take(100))
{
    Console.WriteLine("{0} occurred {1} times", pair.Key, pair.Value);
}
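
If you want to check the speed difference on your own machine, something along these lines could serve as a rough starting point. It is only a sketch: the names `HistogramTiming`, `BuildIncrementallySorted`, `BuildThenSort` and `MakeSample` are invented here, the sample is smaller than the 10 million items mentioned above, and a single Stopwatch run is not a rigorous benchmark, so treat the printed numbers as indicative at best.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class for comparing the two construction strategies.
public static class HistogramTiming
{
    // Keeps the histogram sorted at all times: O(log N) per lookup/insert.
    public static SortedDictionary<uint, int> BuildIncrementallySorted(IReadOnlyList<uint> items)
    {
        var histogram = new SortedDictionary<uint, int>();
        foreach (uint item in items)
        {
            histogram.TryGetValue(item, out int current); // 0 if absent
            histogram[item] = current + 1;
        }
        return histogram;
    }

    // Accumulates in a hash-based Dictionary, then sorts once at the end.
    public static SortedList<uint, int> BuildThenSort(IReadOnlyList<uint> items)
    {
        var temp = new Dictionary<uint, int>();
        foreach (uint item in items)
        {
            temp.TryGetValue(item, out int current); // 0 if absent
            temp[item] = current + 1;
        }
        return new SortedList<uint, int>(temp);
    }

    // Fixed seed so repeated runs see the same data.
    public static List<uint> MakeSample(int count, int maxValueExclusive, int seed)
    {
        var random = new Random(seed);
        var items = new List<uint>(count);
        for (int i = 0; i < count; i++)
        {
            items.Add((uint)random.Next(maxValueExclusive));
        }
        return items;
    }
}

public static class Program
{
    public static void Main()
    {
        List<uint> items = HistogramTiming.MakeSample(1_000_000, 100_000, seed: 42);

        Stopwatch watch = Stopwatch.StartNew();
        SortedDictionary<uint, int> sorted = HistogramTiming.BuildIncrementallySorted(items);
        watch.Stop();
        Console.WriteLine("SortedDictionary:         {0} ms", watch.ElapsedMilliseconds);

        watch.Restart();
        SortedList<uint, int> sortedAtEnd = HistogramTiming.BuildThenSort(items);
        watch.Stop();
        Console.WriteLine("Dictionary + SortedList:  {0} ms", watch.ElapsedMilliseconds);

        // Both strategies should yield the same key/count pairs in the same order.
        Console.WriteLine("Same result: {0}", sorted.SequenceEqual(sortedAtEnd));
    }
}
```

The final check matters more than the timings: both variants should produce identical sorted key/count pairs, so switching between them should not change what gets reported.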

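For very large samples (larger than available memory), there are some amazing probabilistic algorithms that solve this problem. They are also very well suited to streaming data. Look for "quantile sketches". Here is an implementation from the Apache Foundation: https://datasketches.apache.org/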
