cuda - How to copy memory allocated in a device function back to main memory

Question

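I have a CUDA program containing a host function and a device function Execute(). In the host function, I allocate a global memory buffer, output, which is passed to the device function and used to store the address of global memory allocated inside the device function. I want to access the memory allocated in the kernel from the host function. The code is as follows: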

#include <stdio.h>
typedef struct                      
{
  int             * p;            
  int              num;            
} Structure_A;

__global__ void Execute(Structure_A *output);

int main(){

    Structure_A *output;
    cudaMalloc((void**)&output,sizeof(Structure_A)*1);
    dim3 dimBlockExecute(1,1);
    dim3 dimGridExecute(1,1);
    Execute<<<dimGridExecute,dimBlockExecute>>>(output);
    Structure_A * output_cpu;
    int * p_cpu;
    cudaError_t err;

    output_cpu= (Structure_A*)malloc(sizeof(Structure_A));
    err=cudaMemcpy(output_cpu,output,sizeof(Structure_A),cudaMemcpyDeviceToHost);    
    if( err != cudaSuccess)
    {
        printf("CUDA error a: %s\n", cudaGetErrorString(err));
        exit(-1);
    }
    p_cpu=(int *)malloc(sizeof(int));
    err=cudaMemcpy(p_cpu,output_cpu[0].p,sizeof(int),cudaMemcpyDeviceToHost);    
    if( err != cudaSuccess)
    {
        printf("CUDA error b: %s\n", cudaGetErrorString(err));
        exit(-1);
    }   
    printf("output=(%d,%d)\n",output_cpu[0].num,p_cpu[0]);
    return 0;
}

__global__ void Execute(Structure_A *output){

    int thid=threadIdx.x;

    output[thid].p= (int*)malloc(thid+1);

    output[thid].num=(thid+1);

    output[thid].p[0]=5;
}

The program compiles. But when I run it, I get an error reporting an invalid argument in the following memory copy call:

err=cudaMemcpy(p_cpu,output_cpu[0].p,sizeof(int),cudaMemcpyDeviceToHost);

The CUDA version is 4.2. CUDA card: Tesla C2075. OS: x86_64 GNU/Linux

Edit: I corrected the code to allocate appropriately sized memory for output_cpu and p_cpu.

score 4 · Accepted Answer

There are a number of problems with this code. For example, both of these lines (as originally posted) allocate only 1 byte, which is not enough to hold one instance of Structure_A:

output_cpu= (Structure_A*)malloc(1);
p_cpu=(int *)malloc(1);

But the immediate cause of the error is that you are doing a memcpy from a pointer allocated on the device runtime heap (that is, allocated with malloc or new inside device code) to a host pointer:

err=cudaMemcpy(p_cpu,output_cpu[0].p,sizeof(int),cudaMemcpyDeviceToHost);

Unfortunately, the host runtime API functions cudaMalloc, cudaFree, and cudaMemcpy are currently not compatible with memory allocated on the device runtime heap.
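One common way around this restriction is to stage the data: allocate a buffer from the host with cudaMalloc, have the kernel copy the contents of its heap allocation into that buffer, and then call cudaMemcpy on the staging buffer only. The following is a minimal sketch of that idea, not the original poster's code. The names produce, staging, and fetch_device_heap_value are hypothetical, and it assumes a device of compute capability 2.0 or higher, which in-kernel malloc requires.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical kernel: allocates on the device runtime heap, writes a value,
// then copies it into a buffer that the host allocated with cudaMalloc.
__global__ void produce(int *staging)
{
    int *p = (int *)malloc(sizeof(int));   // device runtime heap allocation
    if (p == NULL) {                       // the device heap can be exhausted
        staging[0] = -1;
        return;
    }
    p[0] = 5;
    staging[0] = p[0];                     // copy into cudaMalloc'ed memory
    free(p);                               // heap memory must be freed in device code
}

// Hypothetical helper: returns the value produced by the kernel, or -1 on error.
int fetch_device_heap_value(void)
{
    int *staging = NULL;
    int result = -1;

    if (cudaMalloc((void **)&staging, sizeof(int)) != cudaSuccess)
        return -1;

    produce<<<1, 1>>>(staging);

    // The source pointer comes from cudaMalloc, so the host API accepts it.
    cudaError_t err = cudaMemcpy(&result, staging, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(staging);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return result;
}

int main(void)
{
    printf("value=%d\n", fetch_device_heap_value());
    return 0;
}
```

The pointer returned by in-kernel malloc never leaves the device here. Only the staging buffer, which the host runtime API knows about, is passed to cudaMemcpy and cudaFree.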
