c++ - C++11 alias templates in CUDA

Question

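The crucial question is whether alias templates are supported by the CUDA compiler.

I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit during compilation.

I have a simple cuda_array class that provides a thin wrapper around a std::vector. It is essentially a very simple version of thrust::host_vector combined with thrust::device_vector. Its declaration is: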

template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};

To create a matrix, I wrote a quick alias template:

template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;

For type safety and ease of use, I want to map my matrix-vector multiplication CUDA kernel onto an overloaded operator* (it is left to the caller to ensure that push and pull are called correctly):

template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
     __shared__ T shared_b[cols];
    // rest of it
}

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}

In my "main.cpp", I then have:

cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;

The last line produces this compile error:

error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>

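I chased down all of the usual suspects for template type deduction errors that I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a class template: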

template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

And the compile error disappears! So it seems that CUDA does not yet support alias templates. Or did I do something silly that I can't figure out?

score 4 · Accepted Answer

You must remember that:

§ 14.5.7 [temp.alias]/p2:

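When a template-id refers to the specialization of an alias template, it is equivalent to the associated type obtained by substitution of its template-arguments for the template-parameters in the type-id of the alias template. [Note: An alias template name is never deduced. End note]

This means that deduction is not performed for: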

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)

but for:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^

And so:

§ 14.8.2.5 [temp.deduct.type]/p16:

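If, in the declaration of a function template with a non-type template parameter, the non-type template parameter is used in a subexpression in the function parameter list, the expression is a non-deduced context as specified above.

With M in a non-deduced context, this operator* is not considered a viable overload. The error therefore follows from the standard C++ deduction rules rather than from missing alias template support in the CUDA compiler.

As one workaround, you can instead deduce the size of cuda_array itself and validate the deduced value: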
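The deduction failure can be reproduced without any CUDA code. The following is a minimal host-only sketch, not the asker's actual classes: array_like, matrix_alias, can_multiply and explicit_arguments_work are hypothetical names introduced here for illustration. It checks at compile time that the alias names the same type as the underlying template, that `A * b` finds no viable operator*, and that supplying the template arguments explicitly still works.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical host-only stand-in for cuda_array (no CUDA calls).
template <typename T, std::size_t N>
struct array_like {};

// Alias template: matrix_alias<T, M, N> is exactly array_like<T, M * N>.
template <typename T, std::size_t M, std::size_t N>
using matrix_alias = array_like<T, M * N>;

// M appears only inside the expression M * N, so it cannot be deduced.
template <typename T, std::size_t M, std::size_t N>
array_like<T, M> operator*(matrix_alias<T, M, N>&, array_like<T, N>&) {
    return array_like<T, M>();
}

// Detection helper: true only when `A& * B&` is a valid expression.
template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};

template <typename A, typename B>
struct can_multiply<A, B, decltype(void(std::declval<A&>() * std::declval<B&>()))>
    : std::true_type {};

static_assert(std::is_same<matrix_alias<int, 16, 32>, array_like<int, 512>>::value,
              "the alias names the same type as array_like<int, 512>");
static_assert(!can_multiply<matrix_alias<int, 16, 32>, array_like<int, 32>>::value,
              "A * b finds no viable operator* because M cannot be deduced");

// Explicit template arguments bypass deduction, so this call compiles.
inline bool explicit_arguments_work() {
    matrix_alias<int, 16, 32> a;
    array_like<int, 32> b;
    auto r = operator*<int, 16, 32>(a, b);
    (void)r;
    bool ok = std::is_same<decltype(r), array_like<int, 16>>::value;
    assert(ok);
    return ok;
}
```

Because both static_asserts hold with an ordinary host compiler, the behaviour in the question is what the language rules predict for any conforming C++11 compiler.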

template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
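The declaration above can be fleshed out roughly as follows. This is a host-only sketch under stated assumptions: array_like stands in for cuda_array, the matrix is assumed to be stored row-major, and a plain loop replaces the kernel launch. The names array_like, matrix_alias and product_with_deduced_total_size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical host-only stand-in for cuda_array: N zero-initialised elements.
template <typename T, std::size_t N>
struct array_like {
    std::vector<T> host;
    array_like() : host(N) {}
};

template <typename T, std::size_t M, std::size_t N>
using matrix_alias = array_like<T, M * N>;

// MN and N are each deducible on their own; the row count MN / N is computed,
// and enable_if removes the overload when MN is not a multiple of N.
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const array_like<T, MN>& m, const array_like<T, N>& v)
    -> typename std::enable_if<(MN / N) * N == MN, array_like<T, MN / N>>::type {
    array_like<T, MN / N> result;
    for (std::size_t r = 0; r < MN / N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            result.host[r * 1 + 0] += m.host[r * N + c] * v.host[c];  // row-major (assumed)
    return result;
}

// Multiplies a 2 x 3 matrix by a vector of ones and returns the second entry.
inline int product_with_deduced_total_size() {
    matrix_alias<int, 2, 3> a;
    array_like<int, 3> x;
    a.host = {1, 2, 3, 4, 5, 6};
    x.host = {1, 1, 1};
    auto y = a * x;  // deduces T = int, MN = 6, N = 3
    static_assert(std::is_same<decltype(y), array_like<int, 2>>::value,
                  "the result has MN / N = 2 rows");
    assert(y.host.size() == 2);
    return y.host[1];
}
```

One trade-off of this approach: the overload accepts any pair of arrays whose sizes divide evenly, so the type system no longer distinguishes a matrix from a plain array of the same total size.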

Or use the inheritance trick you already have; then M and N are separate non-type template parameters of cuda_matrix, and each can be deduced directly.
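The inheritance variant can be sketched in the same host-only style. Again this is an illustration under assumptions rather than the asker's code: array_like and matrix_like are hypothetical stand-ins for cuda_array and the class-template version of cuda_matrix, storage is assumed to be row-major, and a loop replaces the kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical host-only stand-in for cuda_array: N zero-initialised elements.
template <typename T, std::size_t N>
struct array_like {
    std::vector<T> host;
    array_like() : host(N) {}
};

// A distinct class template: M and N are its own template parameters,
// so both appear in a deducible position in the operator below.
template <typename T, std::size_t M, std::size_t N>
struct matrix_like : array_like<T, M * N> {};

template <typename T, std::size_t M, std::size_t N>
array_like<T, M> operator*(const matrix_like<T, M, N>& m, const array_like<T, N>& v) {
    array_like<T, M> result;
    for (std::size_t r = 0; r < M; ++r)
        for (std::size_t c = 0; c < N; ++c)
            result.host[r] += m.host[r * N + c] * v.host[c];  // row-major (assumed)
    return result;
}

// Multiplies a 2 x 3 matrix by a vector of ones and returns the second entry.
inline int product_with_matrix_class() {
    matrix_like<int, 2, 3> a;
    array_like<int, 3> x;
    a.host = {1, 2, 3, 4, 5, 6};
    x.host = {1, 1, 1};
    auto y = a * x;  // deduces T = int, M = 2, N = 3
    static_assert(std::is_same<decltype(y), array_like<int, 2>>::value,
                  "the result has M = 2 rows");
    assert(y.host.size() == 2);
    return y.host[1];
}
```

Unlike the enable_if workaround, this keeps the matrix as its own type, so operator* only applies when the left operand really is a matrix_like.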
