c - gcd計算の最適化

Question

私は符号付き整数の最大公約数の使用に大きく基づいたアルゴリズムを持っているので、それをより速く取得しようとしています。すでにパフォーマンスが50％向上しているので、このような最適化は不要だと言わないでください。最適化の可能性は残っていませんが、おそらく間違っています。

次のコードでは、b！=0であると断言されています。

Integer gcd(Integer a, Integer b)
{
   if ( a == 0 )
   {
      return b;
   }
   if ( a == b )
   {
      return a;
   }
   const Integer mask_a = (a >> (sizeof(Integer) * 8 - 1));
   a = (a + mask_a) ^ mask_a; // a = |a|
   const Integer mask_b = (b >> (sizeof(Integer) * 8 - 1));
   b = (b + mask_b) ^ mask_b; // b = |b|
   if ( ~a & 1 )
   {
      if ( b & 1 )
      {
         // 2 divides a but not b
         return gcd(a >> 1, b);
      }
      else
      {
         // both a and b are divisible by 2, thus gcd(a, b) == 2 * gcd( a / 2, b / 2)
         return gcd(a >> 1, b >> 1) << 1;
      }
   }
   if ( ~b & 1 )
   {
      // 2 divides b, but not a
      return gcd(a, b >> 1);
   }
   if ( a > b )
   {
      // since both a and b are odd, a - b is divisible by 2
      // gcd(a, b) == gcd( a - b, b)
      return gcd((a - b) >> 1, b);
   }
   else
   {
      // since both a and b are odd, a - b is divisible by 2
      // gcd(a, b) == gcd( a - b, b)
      return gcd((b - a) >> 1, a);
   }
}

補足として：整数は、、またはのいずれintかlongですlong long。それは上のどこかのtypedefです。

ご覧のとおり、aとbをそれぞれの絶対値にすることで最適化が行われ、私のマシンでは正常に機能します（必ずしもすべてのafaikで機能するとは限りません）。私はちょっと枝分かれした混乱が嫌いです。それを改善する方法はありますか？

score 3 · Accepted Answer

別の変形、シフトのルックアップテーブルを備えたバイナリGCD、

typedef unsigned /* long long int */ UInteger;

Integer lookup(Integer a, Integer b) {
    static const int lut[] =
        { 8, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 6, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 7, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 6, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 5, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        , 4, 0, 1, 0, 2, 0, 1, 0
        , 3, 0, 1, 0, 2, 0, 1, 0
        };
    if (a == 0) return b;
    const Integer mask_a = a >> (sizeof(Integer) * 8 - 1);
    UInteger a1 = (a + mask_a) ^ mask_a;
    const Integer mask_b = b >> (sizeof(Integer) * 8 - 1);
    UInteger b1 = (b + mask_b) ^ mask_b;

    int sa = 0, sb = 0, shift;
    do {
        shift = lut[a1 & 0xFF];
        a1 >>= shift;
        sa += shift;
    }while(shift == 8);
    do {
        shift = lut[b1 & 0xFF];
        b1 >>= shift;
        sb += shift;
    }while(shift == 8);
    // sa holds the amount to shift the result by, the smaller of the trailing zeros counts
    if (sa > sb) sa = sb;
    // now a1 and b1 are both odd
    if (a1 < b1) {
        UInteger tmp = a1;
        a1 = b1;
        b1 = tmp;
    }
    while(b1 > 1) {
        do {
            a1 -= b1;
            if (a1 == 0){
                return (b1 << sa);
            }
            do {
                a1 >>= (shift = lut[a1 & 0xFF]);
            }while(shift == 8);
        }while(b1 < a1);
        do {
            b1 -= a1;
            if (b1 == 0){
                return (a1 << sa);
            }
            do {
                b1 >>= (shift = lut[b1 & 0xFF]);
            }while(shift == 8);
        }while(a1 < b1);
    }
    return (1 << sa);
}

私のボックスでは、これはユークリッドアルゴリズムよりも少し高速です。ユークリッドアルゴリズムは、再帰的なバイナリGCDやシフトのルックアップテーブルがないアルゴリズムよりもはるかに高速です。16要素のルックアップテーブルを備えたバージョンは、Euclidよりもわずかに低速でした。

aただし、コメントで述べたように、入力が非常に小さい場合は、GCD自体のルックアップテーブルを計算する方が速い場合があります。少なくとも小さい場合はb（たとえば0 <= a,b <= 50）、入力が次の場合にのみ計算にフォールバックします。ルックアップテーブルで許可されているよりも大きい。

score 2 · Accepted Answer

私は次のようないくつかのテストと単純なアルゴリズムを実行しました：

int gcd(int a,int b)
{
    while(1)
    {
        int c = a%b;
        if(c==0)
          return abs(b);
        a = b;
        b = c;
    }
}

コードよりも約5倍高速です。

これを試してください：

Integer gcd( Integer a, Integer b )
{
    if( a == b )
        return a;

    if( a == 0 )
        return b;

    if( b == 0 )
        return a;

    while( b != 0 ) { 
        Integer t = b;
        b = a % b;
        a = t;
    }

    return a;
}

冒頭の@unkulunkuluのコードとIFです。入力に些細なケースがたくさんある場合は、より高速になります。残念ながら、その場合、コードよりもはるかに高速になることはなく、それについて多くのことを行うことはできません。

score 1 · Accepted Answer

再帰をループに置き換えることで、これを最適化できます。コンパイラが末尾再帰呼び出しをジャンプに最適化した場合でも、再帰呼び出しで不要なチェックa == 0と絶対値の計算を行うことになります。

非末尾再帰を処理するにgcd(a >> 1, b >> 1) << 1は、ゼロから始まり、の前の結果を左シフトするために使用されるアキュムレータ変数を導入する必要がありますreturn。

例（未テスト）：

Integer gcd(Integer a, Integer b)
{
    const Integer mask_a = a >> (sizeof(Integer) * 8 - 1);
    a = (a + mask_a) ^ mask_a;
    const Integer mask_b = b >> (sizeof(Integer) * 8 - 1);
    b = (b + mask_b) ^ mask_b;

    int shift = 0;
    while (a != 0 && a != b) {
        if (~a & 1) {
            a >>= 1;
            if (!(b & 1)) {
                b >>= 1;
                shift++;
            }
        } else if (~b & 1) {
            b >>= 1;
        } else if (a > b) {
            a = (a - b) >> 1;
        } else {
            b = (b - a) >> 1; // the error was here and i have to write 6 chars about it, formerly it was a = (b - a) >> 1;
        }
    }
    return b << shift;
}

score 1 · Accepted Answer

コンパイラがそれをサポートしている場合は、コンパイラがさらに最適化できるようにする、likely()またはunlikely() それぞれの組み込み関数を使用できます。if()

これらの関数はLinuxカーネルで定義されています。gccでは、に置き換えられ__builtin_expect()ます。

c - gcd計算の最適化

4 に答える 4

Related

Reference