r - MSE for anomaly detection using h2o

Question

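I was using the example provided by h2o for ECG anomaly detection. When I tried to compute the MSE manually, I got different results. To show the difference I used the last test case, but all 23 cases differ. The full code is attached.

Thanks, Eli.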

suppressMessages(library(h2o))
localH2O = h2o.init(max_mem_size = '6g', # use 6GB of RAM of *GB available
                nthreads = -1) # use all CPUs (8 on my personal computer :3)

# Download and import ECG train and test data into the H2O cluster
train_ecg <- h2o.importFile(path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv",
                          header = FALSE,
                          sep = ",")
test_ecg <- h2o.importFile(path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv",
                         header = FALSE,
                         sep = ",")
# Train deep autoencoder learning model on "normal"
# training data, y ignored
anomaly_model <- h2o.deeplearning(x = names(train_ecg),
                                 training_frame = train_ecg,
                                 activation = "Tanh",
                                 autoencoder = TRUE,
                                 hidden = c(50,20,50),
                                 l1 = 1e-4,
                                 epochs = 100)

# Compute reconstruction error with the Anomaly
# detection app (MSE between output layer and input layer)
recon_error <- h2o.anomaly(anomaly_model, test_ecg)

# Pull reconstruction error data into R and
# plot to find outliers (last 3 heartbeats)
recon_error <- as.data.frame(recon_error)
recon_error
plot.ts(recon_error)
test_recon <- h2o.predict(anomaly_model, test_ecg)

t <- as.vector(test_ecg[23,])
r <- as.vector(test_recon[23,])
mse.23 <- sum((t-r)^2)/length(t)
mse.23
recon_error[23,]

> mse.23
[1] 2.607374
> recon_error[23,]
[1] 8.264768

score 0 · Accepted Answer

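For H2O's autoencoder, the MSE is computed in the normalized space to avoid numerical scaling issues. For example, if you have categorical features or very large numbers, the neural network autoencoder cannot operate on those values directly. Instead, it first applies dummy one-hot encoding and normalizes the numeric features, and then does the forward/backward propagation and the computation of the reconstruction error in that normalized and expanded space. For purely numeric data, you can first manually divide each column by its range (max - min), and the results should then match.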
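The scaling effect described above can be sketched in base R without a running H2O cluster. This is only an illustration: the numbers are hypothetical toy values, not the ECG data, and it assumes the per-column range is taken from the training data, which the answer does not state explicitly.

```r
# Hypothetical toy values (not the ECG data): 4 training rows, 3 columns
train <- rbind(c(0, 10, 100),
               c(1, 20, 300),
               c(2, 30, 500),
               c(4, 50, 900))

actual        <- c(1.0, 25, 400)  # one test row
reconstructed <- c(1.5, 20, 500)  # its hypothetical autoencoder reconstruction

# MSE on the raw scale, as computed by hand in the question
mse_raw <- mean((actual - reconstructed)^2)

# Per-column range (max - min); assumed here to come from the training data
col_range <- apply(train, 2, function(col) diff(range(col)))

# MSE in the normalized space: scale each column's error by its range first
mse_norm <- mean(((actual - reconstructed) / col_range)^2)

print(c(raw = mse_raw, normalized = mse_norm))
```

The two values differ because each column's error is weighted by a different scale factor. In the question's setting, `actual` and `reconstructed` would correspond to row 23 of `test_ecg` and `test_recon` once pulled into R with `as.data.frame`.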

Here is a JUnit test that does this check explicitly (on that very dataset): https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/test/java/hex/deeplearning/DeepLearningAutoEncoderTest.java#L86-L104

See also https://0xdata.atlassian.net/browse/PUBDEV-2078 for more information.
