でコンパイルするとgcc -O3
、次のループが (自動的に) ベクトル化されないのはなぜですか?
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foo () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] = b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
次のものはいつですか?
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foov () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] += b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
唯一の違いは、内側のループの式の結果がa[i] に割り当てられるか、a[i] に追加されるかです。
参考-ftree-vectorizer-verbose=6
までに、最初の (ベクトル化されていない) ループの次の出力を示します。
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: not vectorized: live stmt not supported: D.2700_5 = c[j_20];
v.c:5: note: vectorized 0 loops in function.
ベクトル化するループの同じ出力は次のとおりです。
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: vect_model_load_cost: aligned.
v.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
v.c:9: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
v.c:9: note: vect_model_reduction_cost: inside_cost = 1, outside_cost = 6 .
v.c:9: note: cost model: prologue peel iters set to vf/2.
v.c:9: note: cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown .
v.c:9: note: Cost model analysis:
Vector inside of loop cost: 3
Vector outside of loop cost: 27
Scalar iteration cost: 3
Scalar outside cost: 7
prologue iterations: 2
epilogue iterations: 2
Calculated minimum iters for profitability: 8
v.c:9: note: Profitability threshold = 7
v.c:9: note: Profitability threshold is 7 loop iterations.
v.c:9: note: LOOP VECTORIZED.
v.c:5: note: vectorized 1 loops in function.