c++ - Mixing custom memory management and Thrust in CUDA

Question

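In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". I also use custom kernels for basic array filling, arithmetic operations between arrays, and so on, and I would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is (for now) created and accessed through raw pointers. I would like to use device_vector and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, which clutters up my code.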

My rather vague question: how would you organize the use of custom memory management, Thrust's array methods, and calls to custom kernels in the most readable way?

score 10 · Accepted Answer

Like all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.
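To make the comparison with standard containers concrete, here is a minimal host-only sketch of the same idea using std::vector. The names counting_allocator, alloc_stats, and run_host_demo are hypothetical and chosen only for illustration; none of this is part of Thrust.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical counters, shared by every counting_allocator<T>.
struct alloc_stats
{
  static std::size_t& allocations()   { static std::size_t n = 0; return n; }
  static std::size_t& deallocations() { static std::size_t n = 0; return n; }
};

// Minimal C++11 allocator: value_type, allocate, deallocate,
// a converting constructor, and equality operators.
template<typename T>
struct counting_allocator
{
  typedef T value_type;

  counting_allocator() {}
  template<typename U> counting_allocator(const counting_allocator<U>&) {}

  T* allocate(std::size_t n)
  {
    ++alloc_stats::allocations();
    std::cout << "counting_allocator::allocate(" << n << ")" << std::endl;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t n)
  {
    ++alloc_stats::deallocations();
    std::cout << "counting_allocator::deallocate(" << n << ")" << std::endl;
    ::operator delete(p);
  }
};

// All instances are interchangeable, so they always compare equal.
template<typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template<typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

// The allocator is passed as the container's second template parameter.
inline void run_host_demo()
{
  std::vector<int, counting_allocator<int> > vec;
  vec.resize(10, 13);  // storage is requested through allocate()
  assert(vec.size() == 10 && vec[0] == 13);
}                      // vec is destroyed here; storage goes back through deallocate()
```

Calling run_host_demo() should print one message per allocate and deallocate call. The exact number of calls depends on the standard library implementation, but every allocation is matched by a deallocation.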

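Occasionally it is desirable to customize the way device_vector allocates memory, as in the OP's case, where storage should be sub-allocated within a single large allocation performed at program initialization. This avoids the overhead of many individual calls to the underlying allocation scheme (in this case, cudaMalloc).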
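The bookkeeping behind such sub-allocation can be sketched in a few lines of host code. The bump_arena class below is a hypothetical helper, not part of Thrust or CUDA. It assumes the simplest possible policy: storage is handed out front to back and released all at once. It also assumes the base pointer is itself aligned to the requested alignment, because offsets are aligned relative to the base.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: hands out aligned sub-ranges of one pre-allocated block.
// It only does pointer arithmetic and never dereferences base, so base may be
// a device pointer obtained from a single cudaMalloc at start-up.
class bump_arena
{
public:
  bump_arena(void* base, std::size_t capacity, std::size_t alignment = 256)
    : base_(static_cast<char*>(base)), capacity_(capacity),
      alignment_(alignment), offset_(0)
  {
    assert(alignment_ != 0);
  }

  // Return n_bytes of storage, or throw std::bad_alloc if the block is exhausted.
  void* allocate(std::size_t n_bytes)
  {
    // round the current offset up to the next multiple of alignment_
    std::size_t start = (offset_ + alignment_ - 1) / alignment_ * alignment_;
    if (start > capacity_ || n_bytes > capacity_ - start) throw std::bad_alloc();
    offset_ = start + n_bytes;
    return base_ + start;
  }

  // Individual frees are no-ops under this policy.
  void deallocate(void*, std::size_t) {}

  // Make the whole block available again (for example, once per iteration).
  void reset() { offset_ = 0; }

  // Number of bytes consumed so far, including alignment padding.
  std::size_t used() const { return offset_; }

private:
  char*       base_;
  std::size_t capacity_;
  std::size_t alignment_;
  std::size_t offset_;
};
```

In a custom device_vector allocator, allocate(n) could forward to such an arena with n * sizeof(T) bytes and wrap the result with thrust::device_pointer_cast to obtain the thrust::device_ptr<T> that the allocator must return. A real application would likely need a policy that can reuse individual blocks, which this sketch deliberately leaves out.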

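A simple way to give device_vector a custom allocator is to inherit from device_malloc_allocator. One could in principle write an entire allocator from scratch, but with the inheritance approach only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be passed to device_vector as its second template parameter.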

This example code demonstrates how to provide a custom allocator that prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
  struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs

  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;

  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p,n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}

Here's the output:

$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!

In this example, note that we hear from my_allocator::allocate() once, upon vec.resize(10,13). my_allocator::deallocate() is called once, when vec goes out of scope and destroys its elements.
