deep-learning - Caffe HDF5 pixel-wise classification

Question

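I am trying to implement pixel-wise binary classification of images using caffe. For each image of dimension 3x256x256, I have a 256x256 label array in which each entry is marked as either 0 or 1. When I read my HDF5 file using the following code,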

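import os
import h5py
import numpy as np

dirname = "examples/hdf5_classification/data"

# Open the training file read-only and load both datasets into memory
with h5py.File(os.path.join(dirname, "train.h5"), "r") as f:
    ks = list(f.keys())  # keys() is not indexable on Python 3
    data = np.array(f[ks[0]])
    label = np.array(f[ks[1]])
print("Data dimension from HDF5", np.shape(data))
print("Label dimension from HDF5", np.shape(label))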

I get the dimensions of the data and labels as

Data dimension from HDF5 (402, 3, 256, 256)
Label dimension from HDF5 (402, 256, 256)

I am trying to feed this data into the given hdf5 classification network, and during training I get the following output (using the default solver, but in GPU mode).

!cd /home/unni/MTPMain/caffe-master/ && ./build/tools/caffe train -solver examples/hdf5_classification/solver.prototxt

which gives

I1119 01:29:02.222512 11910 caffe.cpp:184] Using GPUs 0
I1119 01:29:02.509752 11910 solver.cpp:47] Initializing solver from parameters: 
train_net: "examples/hdf5_classification/train_val.prototxt"
test_net: "examples/hdf5_classification/train_val.prototxt"
test_iter: 250
test_interval: 1000
base_lr: 0.01
display: 1000
max_iter: 10000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 5000
snapshot: 10000
snapshot_prefix: "examples/hdf5_classification/data/train"
solver_mode: GPU
device_id: 0
I1119 01:29:02.519805 11910 solver.cpp:80] Creating training net from train_net file: examples/hdf5_classification/train_val.prototxt
I1119 01:29:02.520031 11910 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I1119 01:29:02.520053 11910 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I1119 01:29:02.520104 11910 net.cpp:49] Initializing net from parameters: 
name: "LogisticRegressionNet"
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "examples/hdf5_classification/data/train.txt"
    batch_size: 10
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc1"
  bottom: "label"
  top: "loss"
}
I1119 01:29:02.520256 11910 layer_factory.hpp:76] Creating layer data
I1119 01:29:02.520277 11910 net.cpp:106] Creating Layer data
I1119 01:29:02.520290 11910 net.cpp:411] data -> data
I1119 01:29:02.520331 11910 net.cpp:411] data -> label
I1119 01:29:02.520352 11910 hdf5_data_layer.cpp:80] Loading list of HDF5 filenames from: examples/hdf5_classification/data/train.txt
I1119 01:29:02.529341 11910 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I1119 01:29:02.542645 11910 hdf5.cpp:32] Datatype class: H5T_FLOAT
I1119 01:29:10.601307 11910 net.cpp:150] Setting up data
I1119 01:29:10.612926 11910 net.cpp:157] Top shape: 10 3 256 256 (1966080)
I1119 01:29:10.612963 11910 net.cpp:157] Top shape: 10 256 256 (655360)
I1119 01:29:10.612969 11910 net.cpp:165] Memory required for data: 10485760
I1119 01:29:10.612983 11910 layer_factory.hpp:76] Creating layer fc1
I1119 01:29:10.624948 11910 net.cpp:106] Creating Layer fc1
I1119 01:29:10.625015 11910 net.cpp:454] fc1 <- data
I1119 01:29:10.625039 11910 net.cpp:411] fc1 -> fc1
I1119 01:29:10.645814 11910 net.cpp:150] Setting up fc1
I1119 01:29:10.645864 11910 net.cpp:157] Top shape: 10 2 (20)
I1119 01:29:10.645875 11910 net.cpp:165] Memory required for data: 10485840
I1119 01:29:10.645912 11910 layer_factory.hpp:76] Creating layer loss
I1119 01:29:10.657094 11910 net.cpp:106] Creating Layer loss
I1119 01:29:10.657133 11910 net.cpp:454] loss <- fc1
I1119 01:29:10.657147 11910 net.cpp:454] loss <- label
I1119 01:29:10.657163 11910 net.cpp:411] loss -> loss
I1119 01:29:10.657189 11910 layer_factory.hpp:76] Creating layer loss
F1119 01:29:14.883095 11910 softmax_loss_layer.cpp:42] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (10 vs. 655360) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
    @     0x7f0652e1adaa  (unknown)
    @     0x7f0652e1ace4  (unknown)
    @     0x7f0652e1a6e6  (unknown)
    @     0x7f0652e1d687  (unknown)
    @     0x7f0653494219  caffe::SoftmaxWithLossLayer<>::Reshape()
    @     0x7f065353f50f  caffe::Net<>::Init()
    @     0x7f0653541f05  caffe::Net<>::Net()
    @     0x7f06535776cf  caffe::Solver<>::InitTrainNet()
    @     0x7f0653577beb  caffe::Solver<>::Init()
    @     0x7f0653578007  caffe::Solver<>::Solver()
    @     0x7f06535278b3  caffe::Creator_SGDSolver<>()
    @           0x410831  caffe::SolverRegistry<>::CreateSolver()
    @           0x40a16b  train()
    @           0x406908  main
    @     0x7f065232cec5  (unknown)
    @           0x406e28  (unknown)
    @              (nil)  (unknown)
Aborted

Basically, the error is

softmax_loss_layer.cpp:42] Check failed: 
outer_num_ * inner_num_ == bottom[1]->count() (10 vs. 655360) 
Number of labels must match number of predictions; 
e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), 
label count (number of labels) must be N*H*W, 
with integer values in {0, 1, ..., C-1}.

I cannot understand why the expected number of labels is exactly the same as my batch size. How should I tackle this problem? Is this a problem with my labelling method?

score 2 · Accepted Answer

Your problem is that the "SoftmaxWithLoss" layer tries to compare a 2-element prediction vector per input image with a label of size 256-by-256 per image.
This makes no sense.

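Root cause of the error: I assume you are trying to apply a binary classifier to each pixel of the image. To that end you defined "fc1" as an "InnerProduct" layer with num_output=2. However, caffe interprets this as a single binary classifier applied to the entire image, so it gives you a single binary prediction for the whole image.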
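To make the numbers in the failed check concrete, here is a minimal sketch that reproduces the comparison in plain Python. The helper name softmax_loss_counts is hypothetical (it is not part of caffe), and it assumes the default softmax axis of 1, with outer_num_ taken as the product of the prediction dimensions before that axis and inner_num_ as the product of those after it.

```python
from math import prod

def softmax_loss_counts(pred_shape, label_shape, axis=1):
    """Return (expected, actual) label counts, mirroring the failed check.

    Hypothetical helper for illustration only; assumes softmax axis == 1.
    """
    outer_num = prod(pred_shape[:axis])      # dims before the softmax axis
    inner_num = prod(pred_shape[axis + 1:])  # dims after the softmax axis (1 if none)
    return outer_num * inner_num, prod(label_shape)

# "fc1" produces shape (10, 2); the HDF5 label blob has shape (10, 256, 256)
expected, actual = softmax_loss_counts((10, 2), (10, 256, 256))
print(expected, actual)  # 10 655360
```

With an "InnerProduct" output there are no spatial dimensions left, so the expected label count collapses to the batch size, which is where the "10 vs. 655360" in the log comes from.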

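How to fix it: when making pixel-wise predictions you should stop using "InnerProduct" layers, which leaves you with a "fully convolutional net". If you replace "fc1" with a conv layer (for example, a kernel that examines the 5-by-5 neighborhood of each pixel and makes its decision based on that patch):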

layer {
  name: "bin_class"
  type: "Convolution"
  bottom: "data"
  top: "bin_class"
  convolution_param {
    num_output: 2 # binary class output
    kernel_size: 5 # 5-by-5 patch for prediction
    pad: 2 # make sure spatial output size equals size of label 
  }
}

Applying "SoftmaxWithLoss" with bottom: "bin_class" and bottom: "label" should now work.
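As a quick sanity check of why pad: 2 keeps the spatial size, the following sketch computes the convolution output size with the usual formula and compares the resulting label count with the label blob. The function conv_output_size is a hypothetical helper, and a stride of 1 is assumed because the layer above does not set one.

```python
from math import prod

def conv_output_size(in_size, kernel_size, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension (illustrative helper)."""
    return (in_size + 2 * pad - kernel_size) // stride + 1

# kernel_size: 5 with pad: 2 on a 256x256 input, stride assumed to be 1
h = conv_output_size(256, kernel_size=5, pad=2)
w = conv_output_size(256, kernel_size=5, pad=2)

pred_shape = (10, 2, h, w)    # (N, C, H, W) output of "bin_class"
label_shape = (10, 256, 256)  # label blob from the HDF5 file

expected = pred_shape[0] * prod(pred_shape[2:])  # N * H * W
actual = prod(label_shape)
print(h, w, expected, actual)  # 256 256 655360 655360
```

Under these assumptions the prediction has one 2-class score vector per pixel, so the number of labels the loss layer expects matches the 10 x 256 x 256 labels supplied.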
