cuda - How to get the element index inside thrust::for_each

Question

I am trying to use thrust::for_each to assign specific values to a device vector. Here is the code:

const uint N = 222222; 
struct assign_functor
{
  template <typename Tuple>
  __device__ 
  void operator()(Tuple t)
  {  
    uint x = threadIdx.x + blockIdx.x * blockDim.x;
    uint y = threadIdx.y + blockIdx.y * blockDim.y;
    uint offset = x + y * blockDim.x * gridDim.x; 

    thrust::get<0>(t) = offset; 
  }
};
int main(int argc, char** argv)
{ 

  thrust::device_vector <float> d_float_vec(N);  

  thrust::for_each(
    thrust::make_zip_iterator( 
      thrust::make_tuple(d_float_vec.begin()) 
    ), 
    thrust::make_zip_iterator( 
      thrust::make_tuple(d_float_vec.end())
    ), 
    assign_functor()
  );

  std::cout<<d_float_vec[10]<<" "<<d_float_vec[N-2]<<std::endl;
}

The output for d_float_vec[N-2] should be 222220, but it turns out to be 1036. What is wrong with my code?

I know I could use thrust::sequence to fill the vector with a sequence of values. I just want to know how to get the real index inside the Thrust for_each function. Thanks!

score 3 · Accepted Answer

As noted in the comments, your approach is unlikely ever to work, because it assumes a number of things about how thrust::for_each works internally that are probably not true, including:

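You implicitly assume that for_each uses a single thread to process each input element. This is almost certainly not the case; it is much more likely that Thrust processes multiple elements per thread during the operation.
You also assume that execution happens in order, so that the Nth thread processes the Nth array element. That may not be the case, and execution may occur in an order that cannot be known a priori.
You assume that for_each processes the whole input data set in a single kernel launch.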
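
To make the first assumption concrete, here is a minimal sketch of a hand-written kernel in which each thread handles several elements through a grid-stride loop. It is not Thrust's actual implementation, which is unspecified; the names write_thread_offset and count_mismatches are hypothetical, and error checking is omitted for brevity. It only shows that once a thread processes more than one element, an offset computed from threadIdx and blockIdx no longer equals the element index.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel, NOT Thrust's real implementation: every thread walks
// the array with a grid-stride loop, so one thread writes several elements.
__global__ void write_thread_offset(unsigned int* out, unsigned int n)
{
    unsigned int tid    = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int stride = blockDim.x * gridDim.x;
    for (unsigned int i = tid; i < n; i += stride) {
        out[i] = tid;  // thread-derived offset, which is not the element index i
    }
}

// Launches the kernel and returns how many elements ended up holding a value
// different from their own index. Error checking is omitted in this sketch.
unsigned int count_mismatches(unsigned int n, unsigned int blocks, unsigned int threads)
{
    unsigned int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(unsigned int));
    write_thread_offset<<<blocks, threads>>>(d_out, n);

    std::vector<unsigned int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    unsigned int mismatches = 0;
    for (unsigned int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++mismatches;
    }
    return mismatches;
}
```

With 2 blocks of 128 threads and 1024 elements, only the first 256 elements receive their own index; every later element receives the offset of whichever thread wrapped around to it.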

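Thrust algorithms should be treated as black boxes whose internal operation is undefined; no knowledge of their internals should be required to implement a user-defined functor. In your example, if you need a sequential index inside the functor, pass in a counting iterator. One way to rewrite your example is: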

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <thrust/tuple.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>

typedef unsigned int uint;
const uint N = 222222; 
struct assign_functor
{
  template <typename Tuple>
  __device__ 
  void operator()(Tuple t)
  {  
    thrust::get<1>(t) = (float)thrust::get<0>(t);
  }
};

int main(int argc, char** argv)
{ 
  thrust::device_vector <float> d_float_vec(N);  
  thrust::counting_iterator<uint> first(0);
  thrust::counting_iterator<uint> last = first + N;

  thrust::for_each(
    thrust::make_zip_iterator( 
      thrust::make_tuple(first, d_float_vec.begin()) 
    ), 
    thrust::make_zip_iterator( 
      thrust::make_tuple(last, d_float_vec.end())
    ), 
    assign_functor()
  );

  std::cout<<d_float_vec[10]<<" "<<d_float_vec[N-2]<<std::endl; 
}

Here the counting iterator is passed in a tuple along with the data array, which gives the functor access to a sequential index corresponding to the data array entry it is processing.
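
If the functor only needs the index to compute the output value, the same counting-iterator idea can probably be expressed without a zip iterator by using thrust::transform. The following is a sketch under that assumption, not part of the original answer; the names index_to_float and fill_with_indices are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <vector>

// Hypothetical functor: receives the sequential index and returns the value
// to store at that position.
struct index_to_float
{
    __host__ __device__
    float operator()(unsigned int i) const
    {
        return static_cast<float>(i);
    }
};

// Fills a device vector so that element i holds the value i, then copies the
// result back to the host for inspection.
std::vector<float> fill_with_indices(unsigned int n)
{
    thrust::device_vector<float> d_vec(n);
    thrust::counting_iterator<unsigned int> first(0);

    // The counting iterator supplies 0, 1, ..., n-1 as input; Thrust decides
    // how those indices are distributed across threads.
    thrust::transform(first, first + n, d_vec.begin(), index_to_float());

    std::vector<float> h_vec(n);
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return h_vec;
}
```

Calling fill_with_indices(222222) and reading element N-2 should give 222220, the value the question expected. The index comes from the iterator and not from thread coordinates, so the result does not depend on how Thrust schedules the work.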
