c++ - -arch=sm_20 でも Cuda Hello World printf が機能しない

Question

私は自分が Cuda の完全な初心者だとは思っていませんでしたが、どうやらそうです。

最近、cuda デバイスを 1.3 から 2.1 (Geforce GT 630) にアップグレードしました。Cuda toolkit 5.0 へのフルアップグレードも考えました。

一般的な cuda カーネルはコンパイルできますが、-arch=sm_20 を設定しても printf が動作しません。

コード：

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){

    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{

    test<<<1,1>>>();
        return 0;
}

コンパイラ：

Error   2   error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_20,compute_10\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include"  -G   --keep-dir "Debug" -maxrregcount=0  --machine 32 --compile -arch=sm_20  -g   -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o "Debug\main.cu.obj" "d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu"" exited with code 2.  C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets  592 10  testCuda
Error   1   error : calling a __host__ function("printf") from a __global__ function("test") is not allowed d:\userstore\documents\visual studio 2010\Projects\testCuda\testCuda\main.cu    9   1   testCuda

この問題のために、私は人生を終えようとしています...完了完了。屋上から答えてください。

score 28 · Accepted Answer

printfカーネルで使用している場合は、次を使用する必要がありますcudaDeviceSynchronize()。

#include <stdio.h>
#include <assert.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test(){
    printf("Hi Cuda World");
}

int main( int argc, char** argv )
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

score 11 · Accepted Answer

In kernel printf is only supported in compute capability 2 or higher hardware. Because your project is set to build for both compute capability 1.0 and compute 2.1, nvcc compiles the code multiple times and builds a multi-architecture fatbinary object. It is during the compute capability 1.0 compilation cycle that the error is being generated, because the printf call is unsupported for that architecture.

If you remove the compute capability 1.0 build target from your project, the error will disappear.

You could alternatively, write the kernel like this:

__global__ void test()
{
#if __CUDA_ARCH__ >= 200
    printf("Hi Cuda World");
#endif
}

The __CUDA_ARCH__ symbol will only be >= 200 when building for compute capability 2.0 or high targets and this would allow you to compile this code for compute capability 1.x devices without encountering a syntax error.

When compiling for the correct architecture and getting no output, you also need to ensure that the kernel finishes and the driver flushes the output buffer. To do this add a synchronizing call after the kernel launch in the host code

for example:

int main( int argc, char** argv )
{

    test<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}

[disclaimer: all code written in browser, never compiled, use at own risk]

If you do both things, you should be able to compile, run and see output.

c++ - -arch=sm_20 でも Cuda Hello World printf が機能しない

3 に答える 3

Related

Reference