c++ - Time-efficient way to count the number of distinct numbers

Question

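get_number() returns an integer. I am going to call it 30 times and count the number of distinct integers returned. My plan is to put these numbers into a std::array<int,30>, sort it, and then use std::unique.

Is that a good solution? Is there a better one? This piece of code will be the bottleneck of my program.

I think there should be a hash-based solution, but wouldn't its overhead be too large when there are only 30 elements?

Edit: I changed "unique" to "distinct". Example: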

{1,1,1,1} => 1
{1,2,3,4} => 4
{1,3,3,1} => 2

score 7 · Accepted Answer

I would use std::set<int>, as it is simpler:

std::set<int> s;
for (int i = 0; i < 30; ++i)  // loop 30 times
{
   s.insert(get_number());
}
std::cout << s.size() << std::endl; // You get count of unique numbers

If you want to count how many times each unique number was returned, I'd suggest a map:

std::map<int, int> s;
for(int i=0; i<30; i++)
{
  s[get_number()]++;
}

std::cout << s.size() << std::endl;  // total count of distinct numbers returned

for (const auto& entry : s)
{
  // each number and how many times it was returned
  std::cout << entry.first << " " << entry.second << std::endl;
}

score 3 · Accepted Answer

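Using std::map, std::set or the std::sort algorithm will give you O(n*log(n)) complexity. For a small to large number of elements that is perfectly fine. But a known integer range opens the door to a lot of optimization.

As you say (in a comment), your integer range is known and short: [0..99]. I would recommend implementing a modified counting sort. See: http://en.wikipedia.org/wiki/Counting_sort

You can count the number of distinct items while doing the sort itself, which removes the need for the std::unique call. The overall complexity is O(n). Another advantage is that the memory needed does not depend on the number of input items. If you had 30,000,000,000 integers to sort, counting the distinct items would not need a single additional byte.

Even if the range of allowed integer values is large, say [0..10,000,000], the memory consumed is quite low. Indeed, an optimized version could consume as little as 1 bit per allowed integer value. That is less than 2 MB of memory, or less than 1/1000th of a laptop's RAM.

Here is a short example program: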

#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <vector>

// A function returning an integer between [0..99]
int get_number()
{
    return rand() % 100;
}


int main(int argc, char* argv[])
{
    // reserves one bucket for each possible integer
    // and initialize to 0
    std::vector<int> cnt_buckets(100, 0);
    int nb_distincts = 0;

    // Get 30 numbers and count distincts
    for(int i=0; i<30; ++i)
    {
        int number = get_number();
        std::cout << number << std::endl;
        if(0 == cnt_buckets[number])
            ++ nb_distincts;

        // We could optimize by doing this only the first time
        ++ cnt_buckets[number];
    }

    std::cerr << "Total distincts numbers: " << nb_distincts << std::endl;
}

You can see it working:

$ ./main | sort | uniq | wc -l
Total distincts numbers: 26
26
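The one-bit-per-value variant mentioned above could look roughly like the following. This is only a sketch: count_distinct_bitset and its parameters are hypothetical names, not part of the original answer, and it assumes every value returned by the generator lies in [0, Range).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the distinct values among `n` calls to `next()`,
// assuming every returned value lies in [0, Range).
template <std::size_t Range, typename Generator>
std::size_t count_distinct_bitset(Generator next, std::size_t n)
{
    std::bitset<Range> seen;  // one bit per allowed value, all initially zero
    for (std::size_t i = 0; i < n; ++i)
    {
        const int value = next();
        // Precondition check (assumption): the value must be inside the known range.
        assert(value >= 0 && static_cast<std::size_t>(value) < Range);
        seen.set(static_cast<std::size_t>(value));  // setting a bit twice changes nothing
    }
    return seen.count();  // number of set bits == number of distinct values
}
```

With the question's setup this would be called as count_distinct_bitset<100>(get_number, 30). For a very large range a std::bitset on the stack may be too big, so heap-backed storage such as std::vector<bool> is probably the safer choice there.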

score 0 · Accepted Answer

The simplest way is to just use std::set.

std::set<int> s;
int uniqueCount = 0;

for( int i = 0; i < 30; ++i )
{
    int n = get_number();

    // insert() returns a pair; .second is true only if n was not already present
    if( s.insert( n ).second ) {
        ++uniqueCount;
    }
}

// now s contains unique numbers
// and uniqueCount contains the number of unique integers returned (same as s.size())

score 0 · Accepted Answer

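Try a set, try an unordered set, try sort and unique, try anything else that seems fun.

Then MEASURE each one. If you want the fastest implementation, there is no substitute for trying out real code and seeing what it actually does.

Your particular platform, compiler and other specifics will surely matter, so test in an environment as close as possible to where the code will run in production.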
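A minimal measuring harness along these lines might look like the sketch below. All names here (Sample, count_with_set, count_with_unordered_set, count_with_sort_unique, average_ns) are made up for illustration, and a loop timed with std::chrono is only a rough first approximation of a real benchmark.

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>
#include <set>
#include <unordered_set>

using Sample = std::array<int, 30>;  // 30 values, as in the question

std::size_t count_with_set(const Sample& in)
{
    return std::set<int>(in.begin(), in.end()).size();
}

std::size_t count_with_unordered_set(const Sample& in)
{
    return std::unordered_set<int>(in.begin(), in.end()).size();
}

std::size_t count_with_sort_unique(Sample in)  // by value: sorting mutates the copy
{
    std::sort(in.begin(), in.end());
    return static_cast<std::size_t>(std::unique(in.begin(), in.end()) - in.begin());
}

// Average nanoseconds per call of f(in) over `reps` repetitions.
template <typename F>
double average_ns(F f, const Sample& in, int reps, std::size_t& sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        sink += f(in);  // accumulate results so the calls are less likely to be optimized away
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

One would fill a Sample from get_number(), call average_ns once per candidate with a large reps value, and compare the results, ideally with the same compiler flags and hardware as production.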

score 0 · Accepted Answer

Using an array and sort seems good, but unique may be a bit overkill if you only need to count distinct values. The following function should return the number of distinct values in a sorted range.

template<typename ForwardIterator>
size_t distinct(ForwardIterator begin, ForwardIterator end) {
  if (begin == end) return 0;

  size_t count = 1;
  ForwardIterator prior = begin;
  while (++begin != end)
  {
    if (*prior != *begin)
      ++count;

    prior = begin;
  }
  return count;
}

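In contrast to the set- or map-based approaches, this needs no heap allocation, and the elements are stored contiguously in memory, so it should be much faster. The asymptotic time complexity is O(N log N), the same as when using an associative container. I'd bet that even your original solution of using std::sort followed by std::unique would be much faster than using std::set.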
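For the fixed-size array from the question, the sort-then-compare-neighbours idea could be sketched as follows. count_distinct_sorted is a hypothetical name, and the array is taken by value so the caller's data is left untouched (an assumption about what the caller wants).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical helper: sorts a copy of the array, then counts the positions
// where a value differs from its predecessor. No heap allocation is involved.
template <std::size_t N>
std::size_t count_distinct_sorted(std::array<int, N> values)
{
    if (N == 0) return 0;

    std::sort(values.begin(), values.end());

    std::size_t count = 1;  // the first element always starts a new run
    for (std::size_t i = 1; i < N; ++i)
    {
        if (values[i] != values[i - 1])
            ++count;        // a new run of equal values begins here
    }
    return count;
}
```

Applied to the question's examples, {1,1,1,1} gives 1, {1,2,3,4} gives 4 and {1,3,3,1} gives 2.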
