I'm trying to vectorize a simplified version of example 4 from the gcc auto-vectorize documentation. For the life of me, I can't figure out how to do it.

typedef int aint __attribute__ ((__aligned__(16)));
void foo1 (int n, aint * restrict px, aint *restrict qx) {

  /* feature: support for (aligned) pointer accesses.  */
  int *__restrict p = __builtin_assume_aligned (px, 16);
  int *__restrict q = __builtin_assume_aligned (qx, 16);

  while (n--){
    //*p++ += *q++; <- this is vectorized
    p[n] += q[n]; // This isn't!
  }
}

I'm running gcc 4.7.2 with:

gcc -o apps/craft_dbsplit.o -c -Wall -g -ggdb -O3 -msse2 -funsafe-math-optimizations -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5 -funsafe-loop-optimizations -std=c99

And it responds with the following:

Analyzing loop at apps/craft_dbsplit.c:388

388: dependence distance  = 0.
388: dependence distance == 0 between *D.9363_14 and *D.9363_14
388: dependence distance  = 0.
388: accesses have the same alignment.
388: dependence distance modulo vf == 0 between *D.9363_14 and *D.9363_14
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: vect_model_store_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: Alignment of access forced using peeling.
388: Vectorizing an unaligned access.
388: vect_model_load_cost: aligned.
388: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
388: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
388: not vectorized: relevant stmt not supported: *D.9363_14 = D.9367_20;

apps/craft_dbsplit.c:382: note: vectorized 0 loops in function.
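
For comparison, the same element-wise update can also be written with an ascending index instead of counting `n` down. The following is only a minimal sketch: the name `foo1_forward` is hypothetical, it assumes the order in which elements are visited does not matter, and the log above does not establish whether GCC 4.7.2 vectorizes this form, so it would need to be checked with the same flags.

```c
/* Hypothetical ascending-index variant of foo1 (illustrative name only).
   Computes p[i] += q[i] for i = 0 .. n-1, the same result as the
   descending loop, but walks memory in increasing address order. */
void foo1_forward (int n, int *restrict px, int *restrict qx) {

  /* Same alignment hint as in the original function; the caller must
     really pass 16-byte-aligned pointers for this to be valid. */
  int *restrict p = __builtin_assume_aligned (px, 16);
  int *restrict q = __builtin_assume_aligned (qx, 16);

  for (int i = 0; i < n; i++) {
    p[i] += q[i];
  }
}
```

Because the two arrays are declared `restrict` and each element is updated independently, the ascending and descending loops produce identical results; only the traversal direction differs.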