tensorflow - TensorFlow-Slim multi-GPU training

Question

I am using TensorFlow-Slim. My goal is to run one of the standard scripts (located in /models/slim/scripts) in multi-GPU mode. I tested the finetune_resnet_v1_50_on_flowers.sh script (cloned on 12.04.2017). I appended --num_clones=2 to the end of the training command (inspired by /slim/deployment/model_deploy_test.py and an earlier StackOverflow answer):

python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=flowers \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt \
  --checkpoint_exclude_scopes=resnet_v1_50/logits \
  --trainable_scopes=resnet_v1_50/logits \
  --max_number_of_steps=3000 \
  --batch_size=32 \
  --learning_rate=0.01 \
  --save_interval_secs=60 \
  --save_summaries_secs=60 \
  --log_every_n_steps=100 \
  --optimizer=rmsprop \
  --weight_decay=0.00004 \
  --num_clones=2

Code from deployment/model_deploy_test.py:

def testMultiGPU(self):
    deploy_config = model_deploy.DeploymentConfig(num_clones=2)

There is one warning ("Ignoring device specification"):

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:86:00.0)
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
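To see at a glance which nodes had their requested device overridden, the placement messages can be reduced to (node, requested device, actual device) triples. The following is a minimal sketch; the regular expression is written against the two "Ignoring device specification" lines above, and the name parse_ignored_placements is a hypothetical helper, not part of TensorFlow.

```python
import re

# Pattern derived from the two log lines quoted above (an assumption:
# other TensorFlow versions may word this message differently).
PATTERN = re.compile(
    r"Ignoring device specification (?P<requested>\S+) "
    r"for node '(?P<node>[^']+)'.*"
    r"already has a device field set to (?P<actual>\S+)"
)


def parse_ignored_placements(log_text):
    """Return a list of (node, requested_device, actual_device) tuples."""
    results = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            results.append(
                (match.group("node"),
                 match.group("requested"),
                 match.group("actual"))
            )
    return results
```

Applied to the log above, this would list only the two clone_*/fifo_queue_Dequeue nodes, both kept on /CPU:0; the messages do not mention any other op.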

Both GPUs are running fine (judging by memory usage and GPU utilization), but training is no faster than with a single GPU.
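One way to check this observation is to compare examples per second instead of steps per second. The sketch below assumes that --batch_size applies to each clone, so that one step with --num_clones=2 consumes 2 x 32 images; this is an assumption about train_image_classifier.py and is not confirmed in this post. The function name and the step timings are hypothetical and serve only as an illustration.

```python
def examples_per_second(sec_per_step, batch_size, num_clones=1):
    """Return the number of training examples processed per second.

    Assumes every clone processes its own batch of `batch_size`
    examples in each step (an assumption about the training script).
    """
    if sec_per_step <= 0:
        raise ValueError("sec_per_step must be positive")
    return batch_size * num_clones / sec_per_step


# Hypothetical timings, not measurements from this post.
single = examples_per_second(sec_per_step=0.5, batch_size=32, num_clones=1)
dual = examples_per_second(sec_per_step=0.5, batch_size=32, num_clones=2)
print(single, dual)
```

If the time per step is about the same in both runs, then under this assumption the two-clone run would be processing twice as many images per step, even though the step counter advances at the same rate.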

This problem may be related to https://github.com/tensorflow/tensorflow/issues/8061.

I would appreciate any answers, comments, or concrete suggestions regarding this problem.

CUDA version: release 8.0, V8.0.53

Tested versions of TensorFlow, installed from binaries: 1.0.1 and 1.1.0rc

GPU: NVIDIA Tesla P100 (SXM2)
