私が持っている場合、ある種のマルチ GPU 設定でトレーニングを実行することはまだ可能Peer access not supported between device ordinals
ですか?(GPU が「接続されていない」ことを理解しているため)、たとえば、GPU で各バッチを個別に計算し、CPU でマージすることで、これが方法であることを理解しています。 Caffe バックエンドを使用した DIGITS での「バッチ蓄積」作業。
生の出力:
2017-05-10 15:27:54.360688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 1
2017-05-10 15:27:54.360949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 2
2017-05-10 15:27:54.361504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 0 and 3
2017-05-10 15:27:54.361738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 0
2017-05-10 15:27:54.361892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 2
2017-05-10 15:27:54.362065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 1 and 3
2017-05-10 15:27:54.362263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 0
2017-05-10 15:27:54.362485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 1
2017-05-10 15:27:54.362693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 2 and 3
2017-05-10 15:27:54.362885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 0
2017-05-10 15:27:54.362927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 1
2017-05-10 15:27:54.362967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:779] Peer access not supported between device ordinals 3 and 2
2017-05-10 15:27:54.364638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2 3
2017-05-10 15:27:54.364668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y N N N
2017-05-10 15:27:54.364687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1: N Y N N
2017-05-10 15:27:54.364702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2: N N Y N
2017-05-10 15:27:54.364717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 3: N N N Y