c++ - 辞書式インデックスを計算する最も効率的な方法

Question

次のタスクを達成するためのより効率的なアルゴリズムを見つけられる人はいますか?:

0 から 7 までの整数の任意の順列について、順列を辞書式に記述するインデックスを返します (1 ではなく 0 からインデックス付けされます)。

例えば、

配列 0 1 2 3 4 5 6 7 はインデックス 0 を返す必要があります。
配列 0 1 2 3 4 5 7 6 はインデックス 1 を返す必要があります。
配列 0 1 2 3 4 6 5 7 は 2 のインデックスを返す必要があります。
配列 1 0 2 3 4 5 6 7 は 5039 のインデックスを返す必要があります (つまり 7!-1 またはfactorial(7)-1)。
配列 7 6 5 4 3 2 1 0 は 40319 のインデックスを返す必要があります (つまり 8!-1)。これは可能な最大の戻り値です。

私の現在のコードは次のようになります。

int lexic_ix(int* A){
    int value = 0;
    for(int i=0 ; i<7 ; i++){
        int x = A[i];
        for(int j=0 ; j<i ; j++)
            if(A[j]<A[i]) x--;
        value += x*factorial(7-i);  // actual unrolled version doesn't have a function call
    }
    return value;
}

その内部ループを削除することで操作の数を減らす方法があるかどうか、または何らかの方法で条件付き分岐を減らすことができるかどうか (展開以外 - 私の現在のコードは実際には上記の展開されたバージョンです)、または巧妙なビット単位のハックや汚い C のトリックが役立つ場合。

もう交換してみた

if(A[j]<A[i]) x--;

と

x -= (A[j]<A[i]);

そして私も試しました

x = A[j]<A[i] ? x-1 : x;

どちらの交換も、実際にはパフォーマンスの低下につながりました。

そして、誰かが言う前に - はい、これは大きなパフォーマンスのボトルネックです: 現在、プログラムの実行時間の約 61% がこの関数で費やされており、いいえ、事前に計算された値のテーブルは必要ありません。

それらを除いて、どんな提案も大歓迎です。

score 2 · Accepted Answer

これが役立つかどうかはわかりませんが、別の解決策は次のとおりです。

int lexic_ix(int* A, int n){ //n = last index = number of digits - 1
    int value = 0;
    int x = 0;
    for(int i=0 ; i<n ; i++){
        int diff = (A[i] - x); //pb1
        if(diff > 0)
        {
            for(int j=0 ; j<i ; j++)//pb2
            {
                if(A[j]<A[i] && A[j] > x)
                {
                    if(A[j]==x+1)
                    {
                      x++;
                    }
                    diff--;
                }
            }
            value += diff;
        }
        else
        {
          x++;
        }
        value *= n - i;
    }
    return value;
}

内部ループを取り除くことができなかったため、複雑さは最悪の場合は o(n log(n)) ですが、最良の場合は o(n) であり、全体で o(n log(n)) であるソリューションに対してケース。

または、内側のループを次のように置き換えて、内側のループで別の検証を犠牲にしていくつかの最悪のケースを取り除くことができます。

int j=0;
while(diff>1 && j<i)
{
  if(A[j]<A[i])
  {
    if(A[j]==x+1)
    {
      x++;
    }
    diff--;
  }
  j++;
}

説明:

（または、「そのコードでどのように終了したか」というより、あなたのものとそれほど変わらないと思いますが、おそらくアイデアを持たせることができます）（混乱を避けるために、代わりに文字を使用し、数字と4文字のみを使用しました）

abcd 0  = ((0 * 3 + 0) * 2 + 0) * 1 + 0
abdc 1  = ((0 * 3 + 0) * 2 + 1) * 1 + 0
acbd 2  = ((0 * 3 + 1) * 2 + 0) * 1 + 0
acdb 3  = ((0 * 3 + 1) * 2 + 1) * 1 + 0
adbc 4  = ((0 * 3 + 2) * 2 + 0) * 1 + 0
adcb 5  = ((0 * 3 + 2) * 2 + 1) * 1 + 0 //pb1
bacd 6  = ((1 * 3 + 0) * 2 + 0) * 1 + 0
badc 7  = ((1 * 3 + 0) * 2 + 1) * 1 + 0
bcad 8  = ((1 * 3 + 1) * 2 + 0) * 1 + 0 //First reflexion
bcda 9  = ((1 * 3 + 1) * 2 + 1) * 1 + 0
bdac 10 = ((1 * 3 + 2) * 2 + 0) * 1 + 0
bdca 11 = ((1 * 3 + 2) * 2 + 1) * 1 + 0
cabd 12 = ((2 * 3 + 0) * 2 + 0) * 1 + 0
cadb 13 = ((2 * 3 + 0) * 2 + 1) * 1 + 0
cbad 14 = ((2 * 3 + 1) * 2 + 0) * 1 + 0
cbda 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 //pb2
cdab 16 = ((2 * 3 + 2) * 2 + 0) * 1 + 0
cdba 17 = ((2 * 3 + 2) * 2 + 1) * 1 + 0
[...]
dcba 23 = ((3 * 3 + 2) * 2 + 1) * 1 + 0

最初の「反射」 :

エントロピーの観点。abcd は「エントロピー」が最も少ないです。キャラクターが「あるべきではない」場所にいる場合、エントロピーが作成され、エントロピーは早いほど最大になります。

たとえば、bcad の場合、辞書式インデックスは 8 = (( 1 * 3 + 1 ) * 2 + 0 ) * 1 + 0であり、次のように計算できます。

value = 0;
value += max(b - a, 0); // = 1; (a "should be" in the first place [to create the less possible entropy] but instead it is b)
value *= 3 - 0; //last index - current index
value += max(c - b, 0); // = 1; (b "should be" in the second place but instead it is c)
value *= 3 - 1;
value += max(a - c, 0); // = 0; (a "should have been" put earlier, so it does not create entropy to put it there)
value *= 3 - 2;
value += max(d - d, 0); // = 0;

最後の操作は常に何もしないことに注意してください。

最初の問題(pb1) :

たとえば、adcb の場合、最初のロジックは機能しません (((0* 3+ 2) * 2+ 0) * 1 = 4 の辞書式インデックスにつながります)。 bの前に。そのため、x を追加しました。これは、まだ配置されていない最初の数字/文字を表します。x では、diff を負にすることはできません。adcb の場合、辞書式インデックスは 5 = (( 0 * 3 + 2 ) * 2 + 1 ) * 1 + 0 であり、次のように計算できます。

value = 0; x=0;
diff = a - a; // = 0; (a is in the right place)
diff == 0 => x++; //x=b now and we don't modify value
value *= 3 - 0; //last index - current index
diff = d - b; // = 2; (b "should be" there (it's x) but instead it is d)
diff > 0 => value += diff; //we add diff to value and we don't modify x
diff = c - b; // = 1; (b "should be" there but instead it is c) This is where it differs from the first reflexion
diff > 0 => value += diff;
value *= 3 - 2;

2 番目の問題(pb2):

たとえば、cbda の場合、辞書式インデックスは 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 ですが、最初の反射では ((2 * 3 + 0) * 2 + 1) * 1 となります。 + 0 = 13 となり、pb1 の解は ((2 * 3 + 1) * 2 + 3) * 1 + 0 = 17 になります。配置する最後の 2 文字が d と a であるため、pb1 の解は機能しません。そのため、d - 3 ではなく 1 を「意味」します。その前に配置された文字をカウントする必要がありましたが、x の後ではなく、内側のループを追加する必要がありました。

すべてをまとめると:

次に、pb1 は pb2 の特定のケースにすぎないことに気付きました。x を削除し、単純に diff = A[i] を使用すると、ソリューションのネストされていないバージョンになることに気付きました (階乗が少しずつ計算され、あなたのxに対応する私の差分）。

したがって、基本的に、私の「貢献」(私が思うに) は、変数 x を追加することです。これにより、diff が 0 または 1 に等しい場合に内部ループを実行することを回避できます。 .

また、内側のループ (if(A[j]==x+1)) で x をインクリメントする必要があるかどうかも確認しました。もう一度内側のループに入り、c に遭遇します。内側のループで x にチェックを入れると、d に遭遇したときは内側のループを行うしかありませんが、x は c に更新され、c に遭遇すると内側のループには入らなくなります。プログラムを壊さずにこのチェックを削除できます

代替バージョンと内部ループのチェックにより、4 つの異なるバージョンが作成されます。チェック付きの代替案は内側のループが少ない方が入るので、「理論的な複雑さ」という点ではベストなのですが、性能・演算数という点ではわかりません。

これがすべて役立つことを願っています（質問はかなり古く、すべての回答を詳細に読んだわけではないため）。そうでない場合でも、私はそれをやっていて楽しかったです。長い投稿で申し訳ありません。また、私は (メンバーとして) Stack Overflow の初心者であり、ネイティブスピーカーではありません。

score 0 · Accepted Answer

すでにキャッシュにあるメモリの線形トラバーサルは、実際にはまったく時間がかかりません。ご心配なく。factorial() がオーバーフローする前に十分な距離を移動することはありません。

8out をパラメーターとして移動します。

int factorial ( int input )
{
    return input ? input * factorial (input - 1) : 1;
}

int lexic_ix ( int* arr, int N )
{
    int output = 0;
    int fact = factorial (N);
    for ( int i = 0; i < N - 1; i++ )
    {
        int order = arr [ i ];
        for ( int j = 0; j < i; j++ )
            order -= arr [ j ] < arr [ i ];
        output += order * (fact /= N - i);
    }
    return output;
}

int main()
{
    int arr [ ] = { 11, 10, 9, 8, 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 };

    const int length = 12;
    for ( int i = 0; i < length; ++i )
        std::cout << lexic_ix ( arr + i, length - i  ) << std::endl;
}

score 0 · Accepted Answer

これはプロファイリングコード全体です。テストは Linux でのみ実行します。コードは G++8.4 を使用してコンパイルされ、'-std=c++11 -O3' コンパイラオプションが指定されています。公平を期すために、コードを少し書き直して、Nを事前に計算しました! それを関数に渡しますが、これはあまり役に立たないようです。

N = 9 (362,880 順列) のパフォーマンスプロファイリングは次のとおりです。

継続時間: 34、30、25 ミリ秒
継続時間: 34、30、25 ミリ秒
継続時間: 33、30、25 ミリ秒

N=10 (3,628,800 順列) のパフォーマンスプロファイリングは次のとおりです。

継続時間: 345、335、275 ミリ秒
継続時間: 348、334、275 ミリ秒
継続時間: 345、335、275 ミリ秒

最初の数字は元の関数で、2 番目の数字は N を取得するように書き直した関数です。渡された、最後の数字は私の結果です。順列生成関数は非常に原始的で実行に時間がかかりますが、すべての順列をテストデータセットとして生成する限り問題ありません。ちなみに、これらのテストは、Ubuntu 14.04 を実行するクアッドコア 3.1Ghz、4G バイトのデスクトップで実行されます。

編集:最初の関数が lexi_numbers ベクトルを展開する必要があるかもしれないという要因を忘れていたので、タイミングの前に空の呼び出しを置きました。この後、時刻は 333、334、275 です。

編集: パフォーマンスに影響を与える可能性のあるもう 1 つの要因です。コードで long integer を使用しています。これらの 2 つの「long」を 2 つの「int」に変更すると、実行時間は 334、333、264 になります。

#include <iostream>
#include <vector>
#include <chrono>
using namespace std::chrono;

int factorial(int input)
{
    return input ? input * factorial(input - 1) : 1;
}

int lexic_ix(int* arr, int N)
{
    int output = 0;
    int fact = factorial(N);
    for (int i = 0; i < N - 1; i++)
    {
        int order = arr[i];
        for (int j = 0; j < i; j++)
            order -= arr[j] < arr[i];
        output += order * (fact /= N - i);
    }
    return output;
}

int lexic_ix1(int* arr, int N, int N_fac)
{
    int output = 0;
    int fact = N_fac;
    for (int i = 0; i < N - 1; i++)
    {
        int order = arr[i];
        for (int j = 0; j < i; j++)
            order -= arr[j] < arr[i];
        output += order * (fact /= N - i);
    }
    return output;
}

int lexic_ix2( int arr[], int N , int coeff_arr[])
{
    for ( int i = 0; i < N - 1; i++ )
    {
        int order = arr [ i ];
        for ( int j = 0; j < i; j++ )
            order -= arr [ j ] < arr [ i ];
        coeff_arr[i] = order;
    }
    long fac = 1;
    long sn = 0;
    for (int i = 1; i < N; ++i)
    {
        fac *= i;
        if (coeff_arr[N - 1 - i])
            sn += coeff_arr[N - 1 - i] * fac;
    }
    return sn;
}

std::vector<std::vector<int>> gen_permutation(const std::vector<int>& permu_base)
{
    if (permu_base.size() == 1)
        return std::vector<std::vector<int>>(1, std::vector<int>(1, permu_base[0]));

    std::vector<std::vector<int>> results;
    for (int i = 0; i < permu_base.size(); ++i)
    {
        int cur_int = permu_base[i];
        std::vector<int> cur_subseq = permu_base;
        cur_subseq.erase(cur_subseq.begin() + i);
        std::vector<std::vector<int>> temp = gen_permutation(cur_subseq);
        for (auto x : temp)
        {
            x.insert(x.begin(), cur_int);
            results.push_back(x);
        }
    }
    return results;
}

int main()
{
    #define N 10
    std::vector<int> arr;
    int buff_arr[N];
    const int length = N;
    int N_fac = factorial(N);
    for(int i=0; i<N; ++i)
        arr.push_back(N-i-1); // for N=10, arr is {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
    std::vector<std::vector<int>> all_permus = gen_permutation(arr);

    std::vector<int> lexi_numbers;
    // This call is not timed, only to expand the lexi_numbers vector 
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));

    lexi_numbers.clear();
    auto t0 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix(&x[0], length));
    auto t1 = high_resolution_clock::now();
    lexi_numbers.clear();
    auto t2 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix1(&x[0], length, N_fac));
    auto t3 = high_resolution_clock::now();
    lexi_numbers.clear();
    auto t4 = high_resolution_clock::now();
    for (auto x : all_permus)
        lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
    auto t5 = high_resolution_clock::now();

std::cout << std::endl << "Time durations are: " << duration_cast<milliseconds> \
    (t1 -t0).count() << ", " << duration_cast<milliseconds>(t3 - t2).count() << ", " \
        << duration_cast<milliseconds>(t5 - t4).count() <<" milliseconds" << std::endl;
    return 0;
}

c++ - 辞書式インデックスを計算する最も効率的な方法

4 に答える 4

Related

Reference