build-automation - What is the easiest way to test for the presence of a CUDA-capable GPU from CMake?

Question

I have some nightly build machines that have the cuda libraries installed but no cuda-capable GPU. These machines can build cuda-enabled programs, but they cannot run them.

In our automated nightly build process, our cmake scripts use the cmake command

find_package(CUDA)

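to determine whether the cuda software is installed. This sets the cmake variable CUDA_FOUND on platforms that have the cuda software installed. This is great and it works perfectly. When CUDA_FOUND is set, it is OK to build cuda-enabled programs, even when the machine has no cuda-capable GPU.

But test programs that use cuda naturally fail on the cuda machines without a GPU, which makes our nightly dashboards look "dirty". So I want cmake to avoid running those tests on such machines, while still building the cuda software there.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get cmake to test for the presence of a cuda-capable GPU?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That is why we use cmake in the first place.)

EDIT: The answers contain a couple of good-looking suggestions for how to write a program that tests for the presence of a GPU. What is still missing is a way to get CMake to compile and run this program at configuration time. I suspect that CMake's TRY_RUN command will be critical here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be the much harder question. Perhaps I should have asked this as two separate questions...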

score 21 · Accepted Answer

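The answer to this question consists of two parts:

A program to detect the presence of a cuda-capable GPU.
CMake code to compile, run, and interpret the result of that program at configuration time.

For part 1, the GPU-sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c.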

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount, device;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;
    cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
    if (cudaResultCode != cudaSuccess) 
        deviceCount = 0;
    /* machines with no GPUs can still report one emulation device */
    for (device = 0; device < deviceCount; ++device) {
        cudaGetDeviceProperties(&properties, device);
        if (properties.major != 9999) /* 9999 means emulation only */
            ++gpuDeviceCount;
    }
    printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);

    /* don't just return the number of gpus, because other runtime cuda
       errors can also yield non-zero return values */
    if (gpuDeviceCount > 0)
        return 0; /* success */
    else
        return 1; /* failure */
}

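Notice that the return code is zero when a cuda-enabled GPU is found. This is because on one of my has-cuda-but-no-GPU machines, this program generates a runtime error with a non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".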
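The exit-code convention described above can be illustrated outside CMake with a minimal Python sketch. The function name gpu_usable is hypothetical, and the two "fake probe" commands are stand-ins for a compiled has_cuda_gpu binary. The point is only that a status of zero means "usable", while any other status, or a failure to start the program at all, means "not usable".

```python
import subprocess
import sys


def gpu_usable(command):
    """Return True only if the probe command runs and exits with status 0."""
    try:
        completed = subprocess.run(command,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
    except OSError:
        # The probe could not even be started (missing binary or library).
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-ins for a real probe: Python children that exit with a chosen status.
    fake_probe_ok = [sys.executable, "-c", "raise SystemExit(0)"]
    fake_probe_fail = [sys.executable, "-c", "raise SystemExit(1)"]
    print(gpu_usable(fake_probe_ok), gpu_usable(fake_probe_fail))
```

Treating a launch failure the same as a non-zero exit matches the reasoning above: a runtime error on a machine without a GPU should count as "no GPU", not as a separate case.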

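You might ask why I don't use cuda emulation mode on the non-GPU machines. It is because emulation mode is buggy. I only want to debug my own code and work around bugs in the cuda GPU code. I don't have time to debug the emulator.

The second part of the problem is the cmake code that uses this test program. After some struggle, I figured it out. The following block is part of a larger CMakeLists.txt file: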

find_package(CUDA)
if(CUDA_FOUND)
    try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
        ${CMAKE_BINARY_DIR} 
        ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
        CMAKE_FLAGS 
            -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
            -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
        COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
        RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
    message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
    # COMPILE_RESULT_VAR is TRUE when compile succeeds
    # RUN_RESULT_VAR is zero when a GPU is found
    if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
        set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
    else()
        set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
    endif()
endif(CUDA_FOUND)

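This sets a boolean variable CUDA_HAVE_GPU in cmake, which can subsequently be used to trigger conditional operations.

It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS stanza, and what the syntax should be. The try_run documentation is very light, but there is more detail in the documentation for try_compile, a closely related command. I still had to scour the web for examples of try_compile and try_run before I got this to work.

Another tricky but important detail is the third argument to try_run, the "bindir". You should probably always set it to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to reside naturally.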

score 9 · Answer

Write a simple program like

#include <cuda_runtime.h>

int main (){
    int deviceCount;
    cudaError_t e = cudaGetDeviceCount(&deviceCount);
    return e == cudaSuccess ? deviceCount : -1;
}

and check the return value.
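One thing to keep in mind when checking that value: this program returns the device count on success and -1 on failure, so a status of 0 means "no devices", and many systems keep only the low 8 bits of an exit status, so -1 typically shows up as 255. The following Python sketch shows one way the caller might decode the status under those assumptions. The name interpret_exit_status is hypothetical.

```python
def interpret_exit_status(status):
    """Map the probe's raw exit status to a device count, or None on error.

    Assumes the probe returns deviceCount on success and -1 on failure,
    and that only the low 8 bits of the exit status are meaningful.
    """
    low_byte = status & 0xFF   # -1 is reported as 255 on POSIX systems
    if low_byte == 0xFF:
        return None            # cudaGetDeviceCount reported an error
    return low_byte            # number of devices (0 means none found)


if __name__ == "__main__":
    for raw in (0, 2, 255, -1):
        print(raw, "->", interpret_exit_status(raw))
```

Because the meaning of zero is inverted relative to the usual "0 is success" convention, a build script that consumes this program needs to compare against the count explicitly rather than test for a zero status.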

score 4 · Answer

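I just wrote a pure Python script that does some of the things you seem to need (I took much of this from the pystream project). It is basically just a wrapper for some functions in the CUDA runtime library (it uses ctypes). Look at the main() function to see example usage. Also, be aware that I just wrote it, so it is likely to contain bugs. Use with caution.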

#!/usr/bin/env python

import sys
import platform
import ctypes

"""
cudart.py: used to access parts of the CUDA runtime library.
Most of this code was lifted from the pystream project (it's BSD licensed):
http://code.google.com/p/pystream

Note that this is likely to only work with CUDA 2.3
To extend to other versions, you may need to edit the DeviceProp Class
"""

cudaSuccess = 0
errorDict = {
    1: 'MissingConfigurationError',
    2: 'MemoryAllocationError',
    3: 'InitializationError',
    4: 'LaunchFailureError',
    5: 'PriorLaunchFailureError',
    6: 'LaunchTimeoutError',
    7: 'LaunchOutOfResourcesError',
    8: 'InvalidDeviceFunctionError',
    9: 'InvalidConfigurationError',
    10: 'InvalidDeviceError',
    11: 'InvalidValueError',
    12: 'InvalidPitchValueError',
    13: 'InvalidSymbolError',
    14: 'MapBufferObjectFailedError',
    15: 'UnmapBufferObjectFailedError',
    16: 'InvalidHostPointerError',
    17: 'InvalidDevicePointerError',
    18: 'InvalidTextureError',
    19: 'InvalidTextureBindingError',
    20: 'InvalidChannelDescriptorError',
    21: 'InvalidMemcpyDirectionError',
    22: 'AddressOfConstantError',
    23: 'TextureFetchFailedError',
    24: 'TextureNotBoundError',
    25: 'SynchronizationError',
    26: 'InvalidFilterSettingError',
    27: 'InvalidNormSettingError',
    28: 'MixedDeviceExecutionError',
    29: 'CudartUnloadingError',
    30: 'UnknownError',
    31: 'NotYetImplementedError',
    32: 'MemoryValueTooLargeError',
    33: 'InvalidResourceHandleError',
    34: 'NotReadyError',
    0x7f: 'StartupFailureError',
    10000: 'ApiFailureBaseError'}


try:
    # older Python versions reported "Microsoft" on some Windows releases
    if platform.system() in ("Windows", "Microsoft"):
        _libcudart = ctypes.windll.LoadLibrary('cudart.dll')
    elif platform.system()=="Darwin":
        _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib')
    else:
        _libcudart = ctypes.cdll.LoadLibrary('libcudart.so')
    _libcudart_error = None
except OSError as e:
    _libcudart_error = e
    _libcudart = None

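def _checkCudaStatus(status):
    if status != cudaSuccess:
        # The pystream exception classes are not defined in this module,
        # so raise a generic error carrying the symbolic name instead.
        name = errorDict.get(status, 'UnrecognizedError')
        raise RuntimeError("CUDA runtime error %d: %s" % (status, name))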

def _checkDeviceNumber(device):
    assert isinstance(device, int), "device number must be an int"
    assert device >= 0, "device number must be >= 0"
    assert device < 2**8-1, "device number must be < 255"


# cudaDeviceProp
class DeviceProp(ctypes.Structure):
    _fields_ = [
         ("name", 256*ctypes.c_char), #  < ASCII string identifying device
         ("totalGlobalMem", ctypes.c_size_t), #  < Global memory available on device in bytes
         ("sharedMemPerBlock", ctypes.c_size_t), #  < Shared memory available per block in bytes
         ("regsPerBlock", ctypes.c_int), #  < 32-bit registers available per block
         ("warpSize", ctypes.c_int), #  < Warp size in threads
         ("memPitch", ctypes.c_size_t), #  < Maximum pitch in bytes allowed by memory copies
         ("maxThreadsPerBlock", ctypes.c_int), #  < Maximum number of threads per block
         ("maxThreadsDim", 3*ctypes.c_int), #  < Maximum size of each dimension of a block
         ("maxGridSize", 3*ctypes.c_int), #  < Maximum size of each dimension of a grid
         ("clockRate", ctypes.c_int), #  < Clock frequency in kilohertz
         ("totalConstMem", ctypes.c_size_t), #  < Constant memory available on device in bytes
         ("major", ctypes.c_int), #  < Major compute capability
         ("minor", ctypes.c_int), #  < Minor compute capability
         ("textureAlignment", ctypes.c_size_t), #  < Alignment requirement for textures
         ("deviceOverlap", ctypes.c_int), #  < Device can concurrently copy memory and execute a kernel
         ("multiProcessorCount", ctypes.c_int), #  < Number of multiprocessors on device
         ("kernelExecTimeoutEnabled", ctypes.c_int), #  < Specified whether there is a run time limit on kernels
         ("integrated", ctypes.c_int), #  < Device is integrated as opposed to discrete
         ("canMapHostMemory", ctypes.c_int), #  < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer
         ("computeMode", ctypes.c_int), #  < Compute mode (See ::cudaComputeMode)
         ("__cudaReserved", 36*ctypes.c_int),
]

    def __str__(self):
        return """NVidia GPU Specifications:
    Name: %s
    Total global mem: %i
    Shared mem per block: %i
    Registers per block: %i
    Warp size: %i
    Mem pitch: %i
    Max threads per block: %i
    Max threads dim: (%i, %i, %i)
    Max grid size: (%i, %i, %i)
    Total const mem: %i
    Compute capability: %i.%i
    Clock Rate (GHz): %f
    Texture alignment: %i
""" % (self.name, self.totalGlobalMem, self.sharedMemPerBlock,
       self.regsPerBlock, self.warpSize, self.memPitch,
       self.maxThreadsPerBlock,
       self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2],
       self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2],
       self.totalConstMem, self.major, self.minor,
       float(self.clockRate)/1.0e6, self.textureAlignment)

def cudaGetDeviceCount():
    if _libcudart is None: return  0
    deviceCount = ctypes.c_int()
    status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount))
    _checkCudaStatus(status)
    return deviceCount.value

def getDeviceProperties(device):
    if _libcudart is None: return  None
    _checkDeviceNumber(device)
    props = DeviceProp()
    status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device)
    _checkCudaStatus(status)
    return props

def getDriverVersion():
    if _libcudart is None: return  None
    version = ctypes.c_int()
    _libcudart.cudaDriverGetVersion(ctypes.byref(version))
    # CUDA encodes versions as 1000*major + 10*minor
    v = "%d.%d" % (version.value//1000,
                   (version.value%1000)//10)
    return v

def getRuntimeVersion():
    if _libcudart is None: return  None
    version = ctypes.c_int()
    _libcudart.cudaRuntimeGetVersion(ctypes.byref(version))
    # CUDA encodes versions as 1000*major + 10*minor
    v = "%d.%d" % (version.value//1000,
                   (version.value%1000)//10)
    return v

def getGpuCount():
    count=0
    for ii in range(cudaGetDeviceCount()):
        props = getDeviceProperties(ii)
        if props.major!=9999: count+=1
    return count

def getLoadError():
    return _libcudart_error


version = getDriverVersion()
if version is not None and not version.startswith('2.3'):
    sys.stdout.write("WARNING: Driver version %s may not work with %s\n" %
                     (version, sys.argv[0]))

version = getRuntimeVersion()
if version is not None and not version.startswith('2.3'):
    sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" %
                     (version, sys.argv[0]))


def main():

    sys.stdout.write("Driver version: %s\n" % getDriverVersion())
    sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion())

    nn = cudaGetDeviceCount()
    sys.stdout.write("Device count: %s\n" % nn)

    for ii in range(nn):
        props = getDeviceProperties(ii)
        sys.stdout.write("\nDevice %d:\n" % ii)
        #sys.stdout.write("%s" % props)
        for f_name, f_type in props._fields_:
            attr = props.__getattribute__(f_name)
            sys.stdout.write( "  %s: %s\n" % (f_name, attr))

    gpuCount = getGpuCount()
    if gpuCount > 0:
        sys.stdout.write("\n")
    sys.stdout.write("GPU count: %d\n" % getGpuCount())
    e = getLoadError()
    if e is not None:
        sys.stdout.write("There was an error loading a library:\n%s\n\n" % e)

if __name__=="__main__":
    main()

score 4 · Answer

If cuda is found, you can compile a small GPU query program. Here is a simple one that you can adapt to your needs:

#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
  int ct = 0, dev;
  cudaError_t code;
  struct cudaDeviceProp prop;

  /* ct is initialized to 0 so a failed call is treated as "no device" */
  code = cudaGetDeviceCount(&ct);
  if (code != cudaSuccess) printf("%s\n", cudaGetErrorString(code));

  if (ct == 0) {
    printf("Cuda device not found.\n");
    exit(0);
  }
  printf("Found %i Cuda device(s).\n", ct);

  for (dev = 0; dev < ct; ++dev) {
    printf("Cuda device %i\n", dev);

    cudaGetDeviceProperties(&prop, dev);
    printf("\tname : %s\n", prop.name);
    /* size_t fields are cast to unsigned long to match %lu */
    printf("\ttotalGlobalMem: %lu\n", (unsigned long)prop.totalGlobalMem);
    printf("\tsharedMemPerBlock: %lu\n", (unsigned long)prop.sharedMemPerBlock);
    printf("\tregsPerBlock: %i\n", prop.regsPerBlock);
    printf("\twarpSize: %i\n", prop.warpSize);
    printf("\tmemPitch: %lu\n", (unsigned long)prop.memPitch);
    printf("\tmaxThreadsPerBlock: %i\n", prop.maxThreadsPerBlock);
    printf("\tmaxThreadsDim: %i, %i, %i\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("\tmaxGridSize: %i, %i, %i\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("\tclockRate: %i\n", prop.clockRate);
    printf("\ttotalConstMem: %lu\n", (unsigned long)prop.totalConstMem);
    printf("\tmajor: %i\n", prop.major);
    printf("\tminor: %i\n", prop.minor);
    printf("\ttextureAlignment: %lu\n", (unsigned long)prop.textureAlignment);
    printf("\tdeviceOverlap: %i\n", prop.deviceOverlap);
    printf("\tmultiProcessorCount: %i\n", prop.multiProcessorCount);
  }
  return 0;
}

score 1 · Answer

One useful approach is to run programs that CUDA has installed, such as nvidia-smi, and see what they return.

        find_program(_nvidia_smi "nvidia-smi")
        if (_nvidia_smi)
            set(DETECT_GPU_COUNT_NVIDIA_SMI 0)
            # execute nvidia-smi -L to get a short list of GPUs available
            exec_program(${_nvidia_smi} ARGS -L
                OUTPUT_VARIABLE _nvidia_smi_out
                RETURN_VALUE    _nvidia_smi_ret)
            # process the stdout of nvidia-smi
            if (_nvidia_smi_ret EQUAL 0)
                # convert string with newlines to list of strings
                string(REGEX REPLACE "\n" ";" _nvidia_smi_out "${_nvidia_smi_out}")
                foreach(_line ${_nvidia_smi_out})
                    if (_line MATCHES "^GPU [0-9]+:")
                        math(EXPR DETECT_GPU_COUNT_NVIDIA_SMI "${DETECT_GPU_COUNT_NVIDIA_SMI}+1")
                        # the UUID is not very useful for the user, remove it
                        string(REGEX REPLACE " \\(UUID:.*\\)" "" _gpu_info "${_line}")
                        if (NOT _gpu_info STREQUAL "")
                            list(APPEND DETECT_GPU_INFO "${_gpu_info}")
                        endif()
                    endif()
                endforeach()

                check_num_gpu_info(${DETECT_GPU_COUNT_NVIDIA_SMI} DETECT_GPU_INFO)
                set(DETECT_GPU_COUNT ${DETECT_GPU_COUNT_NVIDIA_SMI})
            endif()
        endif()

You could also query /proc on Linux, or lspci. See a fully worked CMake example at https://github.com/gromacs/gromacs/blob/master/cmake/gmxDetectGpu.cmake
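For readers who find the string handling easier to follow outside CMake, here is a rough Python sketch of the same parsing step: keep the lines that match "GPU <n>:" and drop the UUID suffix. The function name parse_gpu_list and the SAMPLE text are made up for illustration. The sample only imitates the shape of the lines the CMake regexes above expect and is not real output.

```python
import re

# Same pattern the CMake loop uses to recognize a GPU line.
GPU_LINE = re.compile(r"^GPU [0-9]+:")


def parse_gpu_list(text):
    """Return one description per GPU line, with the UUID suffix removed."""
    gpus = []
    for line in text.splitlines():
        if GPU_LINE.match(line):
            gpus.append(re.sub(r" \(UUID:.*\)", "", line))
    return gpus


# Hypothetical sample in the shape the regexes expect; not real output.
SAMPLE = (
    "GPU 0: Example Device A (UUID: GPU-00000000-aaaa)\n"
    "GPU 1: Example Device B (UUID: GPU-11111111-bbbb)\n"
)

if __name__ == "__main__":
    for gpu in parse_gpu_list(SAMPLE):
        print(gpu)
```

In a real script the text would presumably come from running nvidia-smi -L, for example through subprocess.run with capture_output=True and text=True, and the length of the returned list would play the role of DETECT_GPU_COUNT.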
