cuda - Should we place macros in CUDA outside or inside of the __global__ function?

Question

My CUDA kernel looks like this.

#define MY_AWESOME_MACRO(foo, bar) ((foo) * (bar) * 123 + 456)
__global__ void my_CUDA_kernel(int* cool, float* beans) {
    // Some computation.
}

Should I place my macro inside or outside of the function? I Googled around and found examples of both. Is there harm in doing it one way or the other?

score 2 · Accepted Answer

Conceptually, there is no difference. You can define a macro anywhere in the file.

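When I used the Compute Visual Profiler, the code with the macro defined outside the function ran faster than the other version. I recommend running the profiler on both and checking which one suits your requirements.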
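To make the "no conceptual difference" point concrete, here is a minimal sketch rather than a definitive benchmark. The names SCALE_OUTSIDE, SCALE_INSIDE, kernel_outside and kernel_inside are hypothetical and only for illustration. The macro body is the one from the question, with each argument wrapped in parentheses. Both kernels see the same expanded expression, because the preprocessor performs the substitution before the compiler looks at function scope.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical macro defined at file scope, outside any function.
#define SCALE_OUTSIDE(foo, bar) ((foo) * (bar) * 123 + 456)

__global__ void kernel_outside(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = SCALE_OUTSIDE(in[i], 2);
}

__global__ void kernel_inside(const int* in, int* out, int n) {
    // Hypothetical macro defined inside the kernel body. The preprocessor
    // does not honor C++ scope, so without the #undef below the macro
    // would stay visible for the rest of the file.
#define SCALE_INSIDE(foo, bar) ((foo) * (bar) * 123 + 456)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = SCALE_INSIDE(in[i], 2);
#undef SCALE_INSIDE
}

int main() {
    const int n = 8;
    int h_in[n], h_a[n] = {0}, h_b[n] = {0};
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_b, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    kernel_outside<<<1, n>>>(d_in, d_a, n);
    kernel_inside<<<1, n>>>(d_in, d_b, n);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        printf("kernel execution failed\n");
        return 1;
    }

    cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Compare both kernels against the value computed on the host.
    bool same = true;
    for (int i = 0; i < n; ++i) {
        int expected = h_in[i] * 2 * 123 + 456;
        if (h_a[i] != expected || h_b[i] != expected) same = false;
    }
    printf("%s\n", same ? "results match" : "results differ");

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return same ? 0 : 1;
}
```

The practical difference is visibility. A macro defined inside a function body is not scoped to that function, so pairing it with #undef keeps it from leaking into later code. Since both kernels receive the same expanded expression, profiling both variants is the way to confirm whether any measured timing difference is real.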
