cuda - Calling a device function from a global function in PyCUDA

Question

I am new to PyCUDA. I want to call a function declared with __device__ from a function declared with __global__. How can I do this in PyCUDA?

import pycuda.driver as cuda  
from pycuda.compiler import SourceModule  
import numpy as n  
import pycuda.autoinit  
import pycuda.gpuarray as gp

d=gp.zeros(shape=(128,128),dtype=n.int32)  
h=n.zeros(shape=(128,128),dtype=n.int32)  
mod=SourceModule("""  
      __global__ void  matAdd(int *a)  
    {  
            int px=blockIdx.x*blockDim.x+threadIdx.x;  
            int py=blockIdx.y*blockDim.y+threadIdx.y;         
            a[px*128+py]+=1;   
            matMul(px);

    }  
      __device__ void matMul( int px)
    {
      px=5;
    }  

""")

m=mod.get_function("matAdd")  
m(d,block=(32,32,1),grid=(4,4))  
d.get(h)

The code above gives me the following error:

7-linux-i686.egg/pycuda/../include/pycuda kernel.cu]  
[stderr:  
kernel.cu(8): error: identifier "matMul" is undefined  

kernel.cu(12): warning: parameter "px" was set but never used  

1 error detected in the compilation of "/tmp/tmpxft_00002286_00000000-6_kernel.cpp1.ii".  
]

score 1 · Accepted Answer

You need to declare the matMul function before you reference it. You can do that like this:

__device__ void matMul(int px); // declaration

__global__ void matAdd(int *a)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    a[px * 128 + py] += 1;
    matMul(px);
}

__device__ void matMul(int px) // implementation
{
    px = 5; // by the way, this assignment does not propagate outside this function
}

Alternatively, move the entire matMul function definition so that it comes before matAdd.
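
The option of defining the device function first can be sketched as a complete script. This is a minimal sketch under a few assumptions, not a definitive version of the asker's program: the kernel string places matMul ahead of matAdd, and matMul is changed here to return an int, purely for illustration, so that its result reaches the caller (the original void version only assigns to a by-value parameter). The names KERNEL_SOURCE and run_mat_add are hypothetical.

```python
# CUDA C source: the __device__ function is defined BEFORE the __global__
# kernel that calls it, so no separate forward declaration is needed.
KERNEL_SOURCE = """
__device__ int matMul(int px)
{
    // Assumption for illustration: return a value instead of assigning
    // to the by-value parameter, which the caller would never see.
    return px + 5;
}

__global__ void matAdd(int *a)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    a[px * 128 + py] += matMul(px);
}
"""


def run_mat_add():
    """Compile KERNEL_SOURCE, run matAdd on a 128x128 array, return the result."""
    # Imported lazily so this file can be loaded on a machine without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context on import)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    d = gpuarray.zeros((128, 128), dtype=np.int32)
    mod = SourceModule(KERNEL_SOURCE)
    mat_add = mod.get_function("matAdd")
    # 4x4 blocks of 32x32 threads cover the 128x128 array exactly once.
    mat_add(d, block=(32, 32, 1), grid=(4, 4))
    return d.get()  # copy the device array back to a NumPy array
```

Calling run_mat_add() on a machine with a CUDA-capable GPU and PyCUDA installed should compile without the "identifier is undefined" error. The ordering matters because CUDA C, like C++, requires a function to be declared before the point where it is called.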
