21

When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed?

I thought a flag in the MXCSR control register might be set after an overflow, but that does not happen. For example, _mm_getcsr() prints the same value (8064) in both cases below.

#include <iostream>
#include <emmintrin.h>

using namespace std;

int main()
{
    __m128i a = _mm_set_epi32(1, 0, 0, 0);
    __m128i b = _mm_add_epi32(a, a);                // no lane overflows
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << b.m128i_i32[3] << endl;   // m128i_i32 is an MSVC-specific member

    __m128i c = _mm_set_epi32(0x7FFFFFFF, 3, 2, 1); // INT_MAX in the top lane
    __m128i d = _mm_add_epi32(c, c);                // top lane wraps around
    cout << "MXCSR:  " << _mm_getcsr() << endl;
    cout << "Result: " << d.m128i_i32[3] << endl;
}

Is there another way to check for overflow with SSE2?

4 Answers

13

Here is a somewhat more efficient version of @hirschhornsalz's sum_and_overflow function:

void sum_and_overflow(__v4si a, __v4si b, __v4si& sum, __v4si& overflow)
{
    __v4si sa, sb;

    sum = _mm_add_epi32(a, b);                  // calculate sum
    sa = _mm_xor_si128(sum, a);                 // compare sign of sum with sign of a
    sb = _mm_xor_si128(sum, b);                 // compare sign of sum with sign of b
    overflow = _mm_and_si128(sa, sb);           // get overflow in sign bit
    overflow = _mm_srai_epi32(overflow, 31);    // convert to SIMD boolean (-1 == TRUE, 0 == FALSE)
}

It uses an expression for overflow detection from Hacker's Delight page 27:

sum = a + b;
overflow = (sum ^ a) & (sum ^ b);               // overflow flag in sign bit

Note that the overflow vector will contain the more conventional SIMD boolean values of -1 for TRUE (overflow) and 0 for FALSE (no overflow). If you only need the overflow in the sign bit and the other bits are "don't care" then you can omit the last line of the function, reducing the number of SIMD instructions from 5 to 4.

NB: this solution, like the previous solution on which it is based, is for signed integer values. A solution for unsigned values requires a slightly different approach (see @Stephen Canon's answer).
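As a rough sanity check of the expression above, here is a minimal sketch that compares the SIMD overflow mask against a widened 64-bit scalar addition, lane by lane. It uses __m128i instead of __v4si on the assumption that this is more portable across compilers; the names add_epi32_overflow, scalar_add_overflows and lanes_agree are made up for illustration and are not part of the answer above.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Signed 32-bit add; the returned mask is -1 in lanes that overflowed, 0 otherwise.
static inline __m128i add_epi32_overflow(__m128i a, __m128i b, __m128i* sum)
{
    *sum = _mm_add_epi32(a, b);
    __m128i sa = _mm_xor_si128(*sum, a);   // sign bit set if sum and a differ in sign
    __m128i sb = _mm_xor_si128(*sum, b);   // sign bit set if sum and b differ in sign
    return _mm_srai_epi32(_mm_and_si128(sa, sb), 31);
}

// Hypothetical scalar reference: add in 64 bits and range-check the result.
static inline bool scalar_add_overflows(int32_t x, int32_t y)
{
    int64_t wide = static_cast<int64_t>(x) + static_cast<int64_t>(y);
    return wide > INT32_MAX || wide < INT32_MIN;
}

// Hypothetical test helper: true if the SIMD mask matches the scalar
// reference in all four lanes.
static inline bool lanes_agree(const int32_t x[4], const int32_t y[4])
{
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    __m128i sum;
    __m128i ovf = add_epi32_overflow(a, b, &sum);

    int32_t mask[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(mask), ovf);
    for (int i = 0; i < 4; ++i) {
        if (mask[i] != 0 && mask[i] != -1) return false;               // must be a SIMD boolean
        if ((mask[i] == -1) != scalar_add_overflows(x[i], y[i])) return false;
    }
    return true;
}
```

Calling lanes_agree on inputs such as {INT32_MAX, INT32_MIN, 1, -5} and {1, -1, 2, 5} covers positive overflow, negative overflow, and two non-overflowing lanes.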

Answered 2012-05-09T11:42:00.460
9

Since you have 4 possible overflows, the control register would very quickly run out of bits, especially if you also wanted carries, signs, etc., and even more so for a vector addition of 16 individual bytes :-)

Signed overflow has occurred if both input sign bits are equal and the sign bit of the result differs from them.

This function calculates sum = a+b and the overflow manually. For every lane that overflows, 0x80000000 is returned in overflow.

void sum_and_overflow(__v4si a, __v4si b, __v4si& sum, __v4si& overflow) {
    __v4si signmask = _mm_set1_epi32(0x80000000);
    sum = a+b;
    a &= signmask;
    b &= signmask;
    overflow = sum & signmask;
    overflow = ~(a^b) & (overflow^a); // sign bit is set if sign(a)==sign(b) and the result's sign has changed
}

Note: If you don't have gcc, you have to replace the ^, & and + operators with the appropriate SSE intrinsics, such as _mm_and_si128(), _mm_add_epi32(), etc.

Edit: I just noticed that the AND with the mask can of course be done at the very end of the function, saving two AND operations. But the compiler will very likely be smart enough to do that by itself.
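For compilers without gcc's vector operators, the two remarks above could be combined roughly as follows. This is only a sketch: it uses __m128i and intrinsics throughout, applies the sign mask once at the end, and the names sum_and_overflow_intrin and lane_u32 are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Intrinsics-only variant: lanes that overflowed hold 0x80000000, all others 0.
static inline void sum_and_overflow_intrin(__m128i a, __m128i b,
                                           __m128i* sum, __m128i* overflow)
{
    const __m128i signmask = _mm_set1_epi32(INT32_MIN);  // 0x80000000 in every lane
    *sum = _mm_add_epi32(a, b);
    __m128i differ  = _mm_xor_si128(a, b);     // sign bit set if the inputs differ in sign
    __m128i changed = _mm_xor_si128(*sum, a);  // sign bit set if the result's sign differs from a's
    // ~(a^b) & (sum^a); _mm_andnot_si128(x, y) computes (~x) & y
    __m128i ovf = _mm_andnot_si128(differ, changed);
    *overflow = _mm_and_si128(ovf, signmask);  // one AND with the mask, at the very end
}

// Hypothetical helper for inspecting one 32-bit lane (0 = lowest lane).
static inline uint32_t lane_u32(__m128i v, int i)
{
    uint32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
    return out[i];
}
```

With _mm_set_epi32(INT32_MAX, 3, 2, 1) added to itself, only the highest lane (index 3) should report 0x80000000.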

Answered 2012-05-09T09:17:23.393
6

I notice you asked for a solution for unsigned as well; fortunately, that's pretty easy too:

__v4si mask = _mm_set1_epi32(0x80000000);
sum = _mm_add_epi32(a, b);
overflow = _mm_cmpgt_epi32(_mm_xor_si128(mask, a), _mm_xor_si128(mask, sum));

Normally to detect unsigned overflow, you simply check either sum < a or sum < b. However, SSE does not have unsigned comparisons; xor-ing the arguments with 0x80000000 lets you use a signed comparison to get the same result.
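To make the unsigned trick easier to try out, here is a small self-contained sketch wrapped in a function, using __m128i in place of __v4si. The names add_epu32_overflow and wrapped_lanes_u32 are illustrative only, and the bit-mask return value is an assumption about how a caller might want the result.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Unsigned 32-bit add; the returned mask is -1 in lanes where the add wrapped around.
static inline __m128i add_epu32_overflow(__m128i a, __m128i b, __m128i* sum)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);  // 0x80000000 flips each sign bit
    *sum = _mm_add_epi32(a, b);
    // Unsigned (a > sum) is equivalent to signed ((a ^ bias) > (sum ^ bias)).
    return _mm_cmpgt_epi32(_mm_xor_si128(bias, a), _mm_xor_si128(bias, *sum));
}

// Hypothetical helper: adds two arrays of four values, stores the sums in out,
// and returns a 4-bit mask with bit i set if lane i wrapped.
static inline int wrapped_lanes_u32(const uint32_t x[4], const uint32_t y[4],
                                    uint32_t out[4])
{
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    __m128i sum;
    __m128i ovf = add_epu32_overflow(a, b, &sum);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), sum);
    return _mm_movemask_ps(_mm_castsi128_ps(ovf));  // top bit of each lane -> bits 0..3
}
```

Comparing against a rather than b is arbitrary here: when an unsigned add wraps, the sum is smaller than both operands.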

Answered 2012-05-13T13:07:50.407
2

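The underlying PADDD instruction does not affect any flags.

So to test for this, you have to write additional code, depending on what you want to do.

Note: you are somewhat hindered by the lack of epi32 intrinsics.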
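As one possible shape for such additional code, the sketch below reduces the per-lane signed overflow test to a single yes/no answer, which is what a status flag would have provided. It assumes signed 32-bit lanes; the function name add_epi32_any_overflow is made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header for _mm_movemask_ps)
#include <cstdint>

// Adds four signed 32-bit lanes and returns true if any lane overflowed.
static inline bool add_epi32_any_overflow(__m128i a, __m128i b, __m128i* sum)
{
    *sum = _mm_add_epi32(a, b);
    // The sign bit of each lane is set when the result's sign differs from both inputs.
    __m128i ovf = _mm_and_si128(_mm_xor_si128(*sum, a), _mm_xor_si128(*sum, b));
    // _mm_movemask_ps gathers the top bit of each 32-bit lane into bits 0..3.
    return _mm_movemask_ps(_mm_castsi128_ps(ovf)) != 0;
}
```

Applied to the two cases from the question, this would report no overflow for _mm_set_epi32(1, 0, 0, 0) added to itself and an overflow for _mm_set_epi32(0x7FFFFFFF, 3, 2, 1) added to itself.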

Answered 2012-05-09T08:18:37.937