python - この TensorFlow の実装が、Matlab の NN よりもはるかに成功していないのはなぜですか?

Question

f(x) = 1/xおもちゃの例として、 100 のノイズのないデータポイントから関数を当てはめようとしています。matlab の既定の実装は、平均二乗差 ~10^-10 で驚異的に成功し、完全に補間されます。

10 個のシグモイドニューロンの 1 つの隠れ層を持つニューラルネットワークを実装します。私はニューラルネットワークの初心者なので、愚かなコードに気をつけてください。

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

#Can't make tensorflow consume ordinary lists unless they're parsed to ndarray
def toNd(lst):
    lgt = len(lst)
    x = np.zeros((1, lgt), dtype='float32')
    for i in range(0, lgt):
        x[0,i] = lst[i]
    return x

xBasic = np.linspace(0.2, 0.8, 101)
xTrain = toNd(xBasic)
yTrain = toNd(map(lambda x: 1/x, xBasic))

x = tf.placeholder("float", [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - yTrain))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

for step in xrange(0, 4001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)

平均二乗差は ~2*10^-3 で終わるため、matlab よりも約 7 桁悪くなります。視覚化

xTest = np.linspace(0.2, 0.8, 1001)
yTest = y.eval({x:toNd(xTest)}, sess)  
import matplotlib.pyplot as plt
plt.plot(xTest,yTest.transpose().tolist())
plt.plot(xTest,map(lambda x: 1/x, xTest))
plt.show()

適合が体系的に不完全であることがわかります: matlab のものは肉眼では完全に見えますが、違いは均一に < 10^-5 です: TensorFlow を使用して、Matlab ネットワークの図を複製しようとしました:

ちなみに、この図はシグモイド活性化関数ではなく tanh を暗示しているようです。確かにドキュメントのどこにも見つかりません。しかし、TensorFlow で tanh ニューロンを使用しようとするとnan、変数のフィッティングがすぐに失敗します。何故かはわからない。

Matlab は、Levenberg–Marquardt トレーニングアルゴリズムを使用します。ベイジアン正則化は、10^-12 の平均二乗でさらに成功します (おそらく、浮動小数点演算の蒸気の領域にいます)。

TensorFlow の実装がこれほど悪いのはなぜですか? 改善するにはどうすればよいですか?

score 16 · Accepted Answer

ところで、これは上記のわずかにクリーンアップされたバージョンで、形状の問題の一部と tf と np の間の不要なバウンスをクリーンアップします。40k ステップ後に 3e-08、または 4000 ステップ後に約 1.5e-5 を達成します。

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

xTrain = np.linspace(0.2, 0.8, 101).reshape([1, -1])
yTrain = (1/xTrain)

x = tf.placeholder(tf.float32, [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.                                                                
loss = tf.reduce_mean(tf.square(y - yTrain))
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.15, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)
init = tf.initialize_all_variables()

# Launch the graph                                                                            
sess = tf.Session()
sess.run(init)

for step in xrange(0, 40001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)

とはいえ、LMA が 2D 曲線のフィッティングに関して、より一般的な DNN スタイルのオプティマイザーよりも優れていることは、おそらくそれほど驚くべきことではありません。Adam とその他のチームは、非常に高次元の問題をターゲットにしており、非常に大規模なネットワークでは LMA が非常に遅くなり始めています(12-15 を参照)。

python - この TensorFlow の実装が、Matlab の NN よりもはるかに成功していないのはなぜですか?

2 に答える 2

Related

Reference