opencl - OpenCL - Large strings and memory performance?

Question

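I'm new to OpenCL, but I've gotten along well with the many examples I've found online. My application is very basic bioinformatics, so I'm interested in very long strings like "ACTCGAAAGGTA....", perhaps 1 to 3 MB in length. A simple calculation would be counting the number of A, T, C, and G elements, which seems well suited to OpenCL. So, starting from the standard "add two vectors" example: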

__kernel void vector_add_gpu(__global const float* src_a,
                             __global const float* src_b,
                             __global float* res,
                             const int num)
{
    const int idx = get_global_id(0);
    if (idx < num)
        res[idx] = src_a[idx] + src_b[idx];
}

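My question is whether I need to concern myself in some way with partitioning the original string between work groups or work items. I understand the idx < num check, so I get the notion that a work item is "aware" of whether it is in bounds. Is it OpenCL's job to manage this? Are there conditions under which I need to explicitly manage the partitioning of global data into chunks? Perhaps if they exceed a certain size (probably determined by the limits of my hardware)?

I want to get the basic OpenCL concepts right before I waste an enormous amount of time :)

TL;DR: Do I need to split up my "big" string of input data, or will OpenCL do all the magic for me?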

score 2 · Accepted Answer

You can access these arrays just like in pure C code; there's not necessarily any explicit need to partition the input arrays in any way. If you index the array by the global work-item ID (get_global_id(0)), each work item gets a unique index from the global work pool, which effectively handles the partitioning for you.
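As a rough illustration of that indexing, here is a minimal plain-C sketch that emulates the work items with an ordinary loop. The names count_base_item and count_bases are hypothetical, and gid stands in for what get_global_id(0) would return inside a kernel. On a real device many work items run concurrently, so the shared counters would need atomic increments or a per-work-group reduction; this serial version only shows the one-index-per-item mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item body: work item `gid` inspects one character.
   In a real kernel `gid` would come from get_global_id(0), and the
   increments would have to be atomic (or reduced per work group). */
static void count_base_item(size_t gid, const char *seq, size_t num,
                            unsigned counts[4])
{
    if (gid >= num)          /* same guard as `idx < num` in the question */
        return;
    switch (seq[gid]) {
    case 'A': counts[0]++; break;
    case 'C': counts[1]++; break;
    case 'G': counts[2]++; break;
    case 'T': counts[3]++; break;
    default:  break;         /* ignore any other symbol */
    }
}

/* Serial stand-in for the runtime launching `global_size` work items. */
static void count_bases(const char *seq, size_t num, size_t global_size,
                        unsigned counts[4])
{
    assert(seq != NULL);
    for (int i = 0; i < 4; i++)
        counts[i] = 0;
    for (size_t gid = 0; gid < global_size; gid++)
        count_base_item(gid, seq, num, counts);
}
```

Calling count_bases("ACTCGAAAGGTA", 12, 16, counts) leaves counts as {5, 2, 3, 2} for A, C, G, T; the four extra work items fall through the bounds check without touching the string.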

The thing to be careful about is making sure that you don't read past the end of your array, since you may have to pad the final work group so that the global size is a whole number of work groups, but you seem to understand that part just fine.
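The padding mentioned above usually amounts to rounding the element count up to a multiple of the work-group size on the host. A small C sketch, assuming an explicit local size is passed to clEnqueueNDRangeKernel (the helper name padded_global_size is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Round the element count up to the next multiple of the work-group size.
   The result is what would be passed as the global work size; `num` itself
   is still passed to the kernel so the bounds check can reject the extras. */
static size_t padded_global_size(size_t num, size_t local_size)
{
    assert(local_size > 0);
    return ((num + local_size - 1) / local_size) * local_size;
}
```

For a 3,000,000-character string and a work-group size of 256 this gives 3,000,064, so the last 64 work items rely on the idx < num check to do nothing.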
