
I'm new to CUDA parallel programming. Right now I'm confused about global memory access on the device, specifically regarding the warp model and coalescing.

There are a few points:

  1. It's said that the threads in one block are divided into warps. Each warp has at most 32 threads, meaning all the threads of the same warp execute simultaneously on the same processor. So what is the meaning of a half-warp?

  2. As for the shared memory of one block, it is divided into 16 banks. To avoid bank conflicts, multiple threads can read from one bank at the same time rather than writing to the same bank. Is this the correct interpretation?

Thanks in advance!


1 Answer

  1. The term "half-warp" principally applied to CUDA processors prior to the Fermi generation (e.g. the "Tesla" or GT200 generation, and the original G80/G92 generation). These GPUs were architected with an SM (streaming multiprocessor -- a HW block inside the GPU) that had fewer than 32 thread processors. The definition of warp was still the same, but the actual HW execution took place one "half-warp" at a time. Actually the granular details are more complicated than this, but suffice it to say that the execution model caused memory requests to be issued according to the needs of a half-warp, i.e. 16 threads within the warp. A memory access by a full warp would thus generate a total of two requests for that transaction.

    Fermi and newer GPUs have at least 32 thread processors per SM. Therefore a memory transaction is immediately visible across a full warp. As a result, memory requests are issued at the per-warp level, rather than per-half-warp. However, a full memory request can only retrieve 128 bytes at a time. Therefore, for data sizes larger than 32 bits per thread per transaction, the memory controller may still break the request down into a half-warp size.

    My view is that, especially for a beginner, it's not necessary to have a detailed understanding of half-warp. It's generally sufficient to understand that it refers to a group of 16 threads executing together and it has implications for memory requests.
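    The coalescing implication is easy to see in code. As a sketch (the kernel names and access patterns below are my own, not from the question): with 4-byte elements, a warp whose 32 threads touch 32 consecutive addresses can be serviced by a single 128-byte transaction, while a strided pattern scatters the warp across many 128-byte segments and multiplies the number of transactions.

    ```cuda
    // Illustrative kernels contrasting coalesced and strided global loads.
    __global__ void coalesced_copy(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x; // consecutive threads -> consecutive addresses
        if (i < n)
            out[i] = in[i]; // the 32 threads of a warp cover one 128-byte segment
    }

    __global__ void strided_copy(const float *in, float *out, int n, int stride)
    {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n)
            out[i] = in[i]; // stride > 1 spreads the warp over multiple 128-byte segments
    }
    ```

    Both kernels do the same logical work; the first simply arranges for adjacent threads to touch adjacent memory, which is the basic coalescing strategy referred to above.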

  2. Shared memory on Fermi-class GPUs, for example, is broken into 32 banks. On previous GPUs it was broken into 16 banks. Bank conflicts occur any time an individual bank is accessed by more than one thread in the same memory request (i.e. originating from the same code instruction). To avoid bank conflicts, the basic strategies are very similar to the strategies for coalescing memory requests, e.g. for global memory. On Fermi and newer GPUs, multiple threads can read the same address without causing a bank conflict, but in general a bank conflict occurs when multiple threads access different words that reside in the same bank. For further understanding of shared memory and how to avoid bank conflicts, I would recommend the NVIDIA webinar on this topic.
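    A common fix for bank conflicts is padding. As an illustrative sketch (assuming a Fermi-style layout of 32 banks of 4-byte words; the kernel and names are my own): when a warp reads down a column of a 32x32 shared-memory tile, every thread would hit the same bank, so the row length is padded by one word to spread the column across all 32 banks.

    ```cuda
    // Illustrative transpose of one 32x32 tile, launched with a dim3(32, 32) block.
    __global__ void transpose_tile(const float *in, float *out)
    {
        // With tile[32][32], the column read below would put all 32 threads of a
        // warp in the same bank (a 32-way conflict); padding the row to 33 words
        // shifts each row by one bank, making the column read conflict-free.
        __shared__ float tile[32][32 + 1];

        int x = threadIdx.x, y = threadIdx.y;
        tile[y][x] = in[y * 32 + x];  // coalesced load, conflict-free row store
        __syncthreads();
        out[y * 32 + x] = tile[x][y]; // column read: conflict-free because of the padding
    }
    ```

    The padding wastes one word per row of shared memory but turns a serialized 32-way conflict into a single conflict-free access.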
answered 2013-02-18T00:03:12.980