c++ - ペアの数が非常に多い場合、並列比較を置き換える (または高速化する) 方法は?

Question

次のコード、並列比較の実行時間は、マップ内のキーの数が膨大 (たとえば 100000) で、その 2 番目の要素のそれぞれにも巨大な要素 (たとえば 100000) がある場合、無限に時間がかかります。

比較を高速化する方法はありますか？私のCPUはXeon E5450 3.00G 4コアです。ラムは十分に公正です。

// There is a map with long as its key and vector<long> as second element, 
//     the vector's elements are increasing sorted.
map<long, vector<long> > = aMap() ;
map<long, vector<long> >::iterator it1 = aMap.begin() ;
map<long, vector<long> >::iterator it2; 

// the code need compare each key's second elements 
for( ; it1 != aMap.end(); it1++ ) {
  it2 = it1; 
  it2++;

  // Parallel comparsion: THE MOST TIME CONSUMING PART
  for( ; it2 != aMap.end(); it2++ ) {
    unsigned long i = 0, j = 0, _union = 0, _inter = 0 ;

    while( i < it1->second.size() && j < it2->second.size() ) {
      if( it1->second[i] < it2->second[j] ) {
        i++; 
      } else if( it1->second[i] > it2->second[j] ) {
        j++; 
      } else {
        i++; j++; _inter++;
      }
    }
    _union = it1->second.size() + it2->second.size() - _inter;

    if ( (double) _inter / _union > THRESH )
      cout << it1->first << " might appears frequently with " << it2->first << endl;
  }
}

score 2 · Accepted Answer

（これは完全な答えではありません。問題の一部、つまりビット操作に関する部分のみを解決します。）

2 つのセット間の交差の数 (交差のカーディナリティ) をかなり迅速に計算するために使用できるクラスを次に示します。

ビットベクトルを使用してセットを格納します。つまり、可能なセットメンバーのユニバースは小さくなければなりません。

#include <cassert>
#include <vector>

class BitVector
{
    // IMPORTANT: U must be unsigned
    // IMPORTANT: use unsigned long long in 64-bit builds.
    typedef unsigned long U;
    static const unsigned UBits = 8 * sizeof(U);

public:
    BitVector (unsigned size)
        : m_bits ((size + UBits - 1) / UBits, 0)
        , m_size (size)
    {
    }

    void set (unsigned bit)
    {
        assert (bit < m_size);
        m_bits[bit / UBits] |= (U)1 << (bit % UBits);
    }

    void clear (unsigned bit)
    {
        assert (bit < m_size);
        m_bits[bit / UBits] &= ~((U)1 << (bit % UBits));
    }

    unsigned countIntersect (BitVector const & that) const
    {
        assert (m_size == that.m_size);

        unsigned ret = 0;
        for (std::vector<U>::const_iterator i = m_bits.cbegin(),
             j = that.m_bits.cbegin(), e = m_bits.cend(), f = that.m_bits.cend();
             i != e && j != f; ++i, ++j)
        {
            U x = *i & *j;

            // Count the number of 1 bits in x and add it to ret
            // There are much better ways than this,
            // e.g. using the POPCNT instruction or intrinsic
            while (x != 0)
            {
                ret += x & 1;
                x >>= 1;
            }
        }

        return ret;
    }

    unsigned countUnion (BitVector const & that) const
    {
        assert (m_size == that.m_size);

        unsigned ret = 0;
        for (std::vector<U>::const_iterator i = m_bits.cbegin(),
             j = that.m_bits.cbegin(), e = m_bits.cend(), f = that.m_bits.cend();
             i != e && j != f; ++i, ++j)
        {
            U x = *i | *j;

            while (x != 0)
            {
                ret += x & 1;
                x >>= 1;
            }
        }

        return ret;
    }

private:
    std::vector<U> m_bits;
    unsigned m_size;
};

上記のクラスをどのように使用できるかを確認するための非常に小さなテストプログラムを次に示します。2 つのセット (それぞれ最大 100K の要素を持つ) を作成し、いくつかの項目を (set()メンバー関数を使用して) それらに追加し、それらの交差を 10 億回計算します。私のマシンでは 2 秒以内に実行されます。

#include <iostream>

using namespace std;

int main ()
{
    unsigned const SetSize = 100000;
    BitVector a (SetSize), b (SetSize);

    for (unsigned i = 0; i < SetSize; i += 2) a.set (i);
    for (unsigned i = 0; i < SetSize; i += 3) b.set (i);
    
    unsigned x = a.countIntersect (b);
    cout << x << endl;

    return 0;
}

これを最適化を有効にしてコンパイルすることを忘れないでください! そうしないと、パフォーマンスが非常に悪くなります。

POPCNT

最新のプロセッサには、POPCNT と呼ばれる、ワードに設定されたビット数をカウントする命令があります。これは、上記の単純なことを実行するよりもはるかに高速です (補足として、ソフトウェアで実行するより高速な方法もありますが、コードを汚染したくありませんでした。)

とにかく、C/C++ コードで POPCNT を使用する方法は、コンパイラの組み込み関数または組み込み関数を使用することです。__popcount()MSVC では、 32 ビット整数で機能する組み込み関数を使用できます。__builtin_popcountl()GCC では、 32 ビット整数と__builtin_popcountll()64 ビットに使用できます。これらの関数は、コンパイラのバージョン、ターゲットアーキテクチャ、および/またはコンパイルスイッチが原因で使用できない場合があることに注意してください。

score 0 · Accepted Answer

PPLを試してみたいと思うかもしれません。またはその類似体のいくつか。出力がないように見えるため、コードが何をするのかよくわかりません。ただし、副作用のないコードは、Parallel Patterns Library などのツールを使用して効果的に並列化できます。

c++ - ペアの数が非常に多い場合、並列比較を置き換える (または高速化する) 方法は?

2 に答える 2

POPCNT

Related

Reference