c++ - ベクトル内の最も頻度の高い char 要素の決定?

Question

文字を要素とするベクトルで最も頻繁に使用される文字を特定しようとしています。

私はこれを行うことを考えています:

ベクターをループしてマップを作成します。キーはベクター内で見つかった一意の文字になります。対応する値は、その文字の頻度の整数カウントになります。
ベクトル内のすべての要素を調べた後、マップにはすべての文字頻度が含まれます。したがって、どのキーが最も高い値を持っているかを見つけて、ベクトル内で最も頻繁に使用される文字を決定する必要があります。

これは非常に複雑に思えますが、この方法がパフォーマンス/優れたコーディングの観点から「許容できる」と見なされるかどうかを誰かが提案できるかどうか疑問に思っていました

これをより良い方法で行うことはできますか？

score 2 · Accepted Answer

文字のベクトルをソートし、それを反復して最大実行長を探すと、マップアプローチを使用するよりも 5 倍高速になるようです (16M 文字で動作する以下のかなり非科学的なテストコードを使用)。表面的には、両方の関数は O(N log N) で実行されるため、互いに近いパフォーマンスを発揮するはずです。ただし、ソート方法は、インプレースベクトルソートの分岐予測と移動セマンティクスの恩恵を受ける可能性があります。

結果の出力は次のとおりです。

Most freq char is '\334', appears 66288 times.
usingSort() took 938 milliseconds
Most freq char is '\334', appears 66288 times.
usingMap() took 5124 milliseconds

コードは次のとおりです。

#include <iostream>
#include <map>
#include <vector>
#include <chrono>

void usingMap(std::vector<char> v)
{
    std::map<char, int> m;
    for ( auto c : v )
    {
        auto it= m.find(c);
        if( it != m.end() )
            m[c]++;
        else
            m[c] = 1;
    }

    char mostFreq;
    int count = 0;
    for ( auto mi : m )
        if ( mi.second > count )
        {
            mostFreq = mi.first;
            count = mi.second;
        }

    std::cout << "Most freq char is '" << mostFreq << "', appears " << count << " times.\n";
}

void usingSort(std::vector<char> v)
{
    std::sort( v.begin(), v.end() );
    char currentChar = v[0];
    char mostChar = v[0];
    int currentCount = 0;
    int mostCount = 0;
    for ( auto c : v )
    {
        if ( c == currentChar )
            currentCount++;
        else
        {
            if ( currentCount > mostCount)
            {
                mostChar = currentChar;
                mostCount = currentCount;
            }
            currentChar = c;
            currentCount = 1;
        }
    }

    std::cout << "Most freq char is '" << mostChar << "', appears " << mostCount << " times.\n";
}

int main(int argc, const char * argv[])
{
    size_t size = 1024*1024*16;
    std::vector<char> v(size);
    for ( int i = 0; i < size; i++)
    {
        v[i] = random() % 256;
    }

    auto t1 = std::chrono::high_resolution_clock::now();
    usingSort(v);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingSort() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
        << " milliseconds\n";
    auto t3 = std::chrono::high_resolution_clock::now();
    usingMap(v);
    auto t4 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingMap() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t4-t3).count()
        << " milliseconds\n";
    return 0;
}

c++ - ベクトル内の最も頻度の高い char 要素の決定?

2 に答える 2

Related

Reference