python - CUDA: Unaligned Memory Access Not Supported: What am I missing?

Question

There are a few questions similar to this one, but this case is a bit weird: NVCC 3.1 doesn't like the following code, but 3.2 and 4.0RC do:

float xtmp[MAT1];

for (i=0; i<MAT1; i++){
    xtmp[i]=x[p[i]]; //value that should be here
}

Here p is passed to the function as a pointer (int *p) and comes from:

int p_pivot[MAT1],q_pivot[MAT1];

To add a bit of context, before the p's get to the 'top' function, they are populated as follows (I'm cutting out as much irrelevant code as I can for clarity):

...
for (i=0;i<MAT1;i++){
    ...
    p_pivot[i]=q_pivot[i]=i;
    ...
}
...

Beyond that, the only operations on the pivot arrays are three-step swaps using integer temporary values.

After all that, p_pivot is passed to the 'top' function as (&p_pivot[0]).
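
For readers who want to reproduce the data flow on the host, here is a minimal plain-C sketch of the steps described above: identity initialisation, a three-step swap, and the gather through the pivot vector. The function names (init_pivots, swap_entries, apply_pivot) and the value of MAT1 are assumptions made for illustration; they are not taken from the original code.

```c
#define MAT1 4  /* assumed size; the question does not state the real MAT1 */

/* Fill both pivot vectors with the identity permutation. */
void init_pivots(int *p, int *q, int n) {
    for (int i = 0; i < n; i++) {
        p[i] = q[i] = i;
    }
}

/* Three-step swap of two entries through an integer temporary. */
void swap_entries(int *p, int a, int b) {
    int tmp = p[a];
    p[a] = p[b];
    p[b] = tmp;
}

/* Hypothetical stand-in for the 'top' function: gathers x through p. */
void apply_pivot(const float *x, const int *p, float *xtmp, int n) {
    for (int i = 0; i < n; i++) {
        xtmp[i] = x[p[i]];  /* same indexing pattern as in the question */
    }
}
```

Calling init_pivots(p_pivot, q_pivot, MAT1), then swap_entries(p_pivot, 0, 2), then apply_pivot(x, &p_pivot[0], xtmp, MAT1) mirrors the sequence in the question. With MAT1 set to 4, the pivot vector after that single swap is {2, 1, 0, 3}.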

For anyone looking for more detail, the code is here, and the only change that should be needed to flip between 3.2/4.0 and earlier versions is to change cudaDeviceSynchronize(); to cudaThreadSynchronize();. This is my dirty dirty experimental code so please don't judge me! :D

As noted, all of the above works fine in higher versions of NVCC, and I'm working to get those put onto the machine in question, but I'd be interested to see what I'm missing.

It must be the array-lookup indexing that's causing the issue, but I don't understand why.

score 2 · Accepted Answer

That looks like a compiler bug to me. This works with nvcc 3.1 on a 64-bit platform:

float xtmp[MAT1];
//Swap rows (x=Px)
for (i=0; i<MAT1; i++){
    int idx = p[i];
    xtmp[i]=x[idx]; //value that should be here
}

My guess is that something in the implicit int to size_t conversion is broken. It doesn't fail in the newer versions of CUDA that I tried.
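
To separate a possible compiler problem from bad data, it can help to confirm on the host that every pivot entry is a valid index before it reaches the device. The sketch below is plain C and the helper names (indices_in_range, gather_rows) are hypothetical. It also writes the widening of the index as an explicit cast; whether that cast changes what nvcc 3.1 generates is an untested assumption, not something confirmed here.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every entry of p is a valid index
   into an n-element array, 0 otherwise. */
int indices_in_range(const int *p, int n) {
    for (int i = 0; i < n; i++) {
        if (p[i] < 0 || p[i] >= n) {
            return 0;
        }
    }
    return 1;
}

/* Gather with the index first loaded into a local int, then widened
   explicitly. Call only after indices_in_range() has returned 1. */
void gather_rows(const float *x, const int *p, float *xtmp, int n) {
    for (int i = 0; i < n; i++) {
        int idx = p[i];
        xtmp[i] = x[(size_t)idx];
    }
}
```

If indices_in_range reports a bad entry, the fault lies in how the pivots are built rather than in the lookup itself.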
