cuda - Discovering GPU capabilities

Question

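I am trying to understand how the memory organization of my GPU works.

According to the technical specifications listed below, my GPU can have 8 active blocks per SM and 768 threads per SM. Based on that, I assumed that to take full advantage of these limits, each block should have 96 (= 768/8) threads. The closest block shape to that thread count, I think, is a 9x9 block with 81 threads. Given that 8 blocks can run simultaneously on one SM, that gives 648 threads. What happens to the remaining 120 (= 768 - 648)?

I know something is wrong with this reasoning. A simple example explaining the relationship between the maximum number of threads per SM, the maximum number of threads per block, and the warp size, based on my GPU's specifications, would be very helpful.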

Device 0: "GeForce 9600 GT"
      CUDA Driver Version / Runtime Version          5.5 / 5.0
      CUDA Capability Major/Minor version number:    1.1
      Total amount of global memory:                 512 MBytes (536870912 bytes)
      ( 8) Multiprocessors x (  8) CUDA Cores/MP:    64 CUDA Cores
      GPU Clock rate:                                1680 MHz (1.68 GHz)
      Memory Clock rate:                             700 Mhz
      Memory Bus Width:                              256-bit
      Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
      Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       16384 bytes
      Total number of registers available per block: 8192
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  768
      Maximum number of threads per block:           512
      Maximum sizes of each dimension of a block:    512 x 512 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             256 bytes
      Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Concurrent kernel execution:                   No
      Device supports Unified Addressing (UVA):      No
      Device PCI Bus ID / PCI location ID:           1 / 0
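
The arithmetic in the question can be checked with a small host-side sketch. It is not CUDA code and does not query the device. It hard-codes the per-SM limits from the output above, plus the 8-blocks-per-SM figure quoted in the question, and it assumes that threads are scheduled in whole warps, so a partly filled warp still occupies a full warp slot. The function names (`warps_per_block`, `resident_blocks`, `summarize`) are made up for illustration, and register and shared-memory limits are deliberately ignored, so real occupancy may be lower.

```python
# Per-SM limits for compute capability 1.1, taken from the deviceQuery
# output above plus the 8-blocks-per-SM figure quoted in the question.
WARP_SIZE = 32
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_BLOCK = 512
MAX_WARPS_PER_SM = MAX_THREADS_PER_SM // WARP_SIZE  # 24 warp slots


def warps_per_block(threads_per_block):
    """Warps needed for one block; a partly filled warp still counts as one."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division


def resident_blocks(threads_per_block):
    """Blocks that fit on one SM (assumption: registers/shared memory ignored)."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size outside the device limit")
    by_warps = MAX_WARPS_PER_SM // warps_per_block(threads_per_block)
    return min(MAX_BLOCKS_PER_SM, by_warps)


def summarize(threads_per_block):
    """Return (blocks, warp slots used, useful threads, idle lanes) per SM."""
    blocks = resident_blocks(threads_per_block)
    warps = blocks * warps_per_block(threads_per_block)
    useful = blocks * threads_per_block
    idle = warps * WARP_SIZE - useful
    return blocks, warps, useful, idle


if __name__ == "__main__":
    # Block shapes to compare: the 9x9 block from the question and a few others.
    for dims in [(9, 9), (8, 8), (16, 6), (16, 16), (32, 16)]:
        n = dims[0] * dims[1]
        print(dims, n, summarize(n))
```

Under these assumptions, a 9x9 block (81 threads) needs 3 warps, so 8 such blocks already fill all 24 warp slots. The "remaining 120" threads are not free capacity. They are the idle lanes in the third, partly filled warp of each block (8 blocks x 15 lanes). A 16x6 block (96 threads) fills the same 24 warps with no idle lanes.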

