machine-learning - What is the difference between these two ways of updating the hypothesis in stochastic gradient descent?

Question

I have a question about updating theta during stochastic GD. There are two ways to update theta:

1) Use the previous theta to compute the hypotheses for all samples up front, then update theta sample by sample. For example:

hypothese = np.dot(X, theta)
for i in range(0, m):
    theta = theta + alpha * (y[i] - hypothese[i]) * X[i]

2) The other way: while scanning the samples, compute hypothese[i] using the latest theta. For example:

for i in range(0, m):
    h = np.dot(X[i], theta)
    theta = theta + alpha * (y[i] - h) * X[i]
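In method 1, `hypothese` is computed once before the loop, so every update in the pass is evaluated at the same starting theta and the m updates simply add up. One pass is therefore exactly one full-batch gradient step, theta + alpha * X^T (y - X theta), on the summed squared error. Method 2 evaluates each update at the latest theta, which is per-sample (stochastic) gradient descent. A minimal NumPy check of this, assuming random data and the hypothetical helper names `method1_pass`, `method2_pass`, and `batch_step`:

```python
import numpy as np

def method1_pass(X, y, theta, alpha):
    """One pass of method 1: all hypotheses use the theta from the start of the pass."""
    hypothese = np.dot(X, theta)
    for i in range(X.shape[0]):
        theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
    return theta

def method2_pass(X, y, theta, alpha):
    """One pass of method 2: each hypothesis uses the most recently updated theta."""
    for i in range(X.shape[0]):
        h = np.dot(X[i], theta)
        theta = theta + alpha * (y[i] - h) * X[i]
    return theta

def batch_step(X, y, theta, alpha):
    """One full-batch gradient step on the summed squared error (no 1/m factor)."""
    return theta + alpha * X.T @ (y - X @ theta)

# Random data, assumed only for this check.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
theta0 = rng.normal(size=3)
alpha = 0.01

print(np.allclose(method1_pass(X, y, theta0, alpha), batch_step(X, y, theta0, alpha)))  # True
print(np.allclose(method2_pass(X, y, theta0, alpha), batch_step(X, y, theta0, alpha)))
```

The first check holds up to floating-point rounding for any data. The second is not expected to hold, because each method 2 update sees the changes made by the earlier updates in the same pass.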

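I checked some SGD code, and the second method seems to be the correct one. But in my code, the first one converges faster and gives better results than the second. Why does the "wrong" method perform better than the correct one?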

I have also attached the complete code below:

import numpy as np

def SGD_method1(X, y):
    maxIter = 100  # max iterations
    alpha = 1e4    # learning rate
    m, n = np.shape(X)  # X[m,n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for _ in range(0, maxIter):
        hypothese = np.dot(X, theta)  # compute all the hypotheses using the same theta
        for i in range(0, m):
            theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
    return theta

def SGD_method2(X, y):
    maxIter = 100  # max iterations
    alpha = 1e4    # learning rate
    m, n = np.shape(X)  # X[m,n], m: #samples, n: #features
    theta = np.zeros(n)  # initial theta
    for _ in range(0, maxIter):
        for i in range(0, m):
            h = np.dot(X[i], theta)  # compute the hypothesis using the latest theta
            theta = theta + alpha * (y[i] - h) * X[i]
    return theta
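To compare the two loops empirically, here is a self-contained sketch on synthetic linear-regression data, checked against the exact least-squares solution. The function names `sgd_method1`, `sgd_method2`, and `mse`, the data, and alpha = 0.001 are all assumptions. The question's alpha = 1e4 is not used, because with unit-scale features a step that large makes both loops diverge. Which method appears to converge faster depends on alpha, the feature scale, and the number of passes, so this sketch alone does not settle the question.

```python
import numpy as np

def sgd_method1(X, y, alpha, max_iter):
    """Method 1: hypotheses computed once per pass from the pass's starting theta."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        hypothese = np.dot(X, theta)
        for i in range(X.shape[0]):
            theta = theta + alpha * (y[i] - hypothese[i]) * X[i]
    return theta

def sgd_method2(X, y, alpha, max_iter):
    """Method 2: each hypothesis uses the latest theta (per-sample SGD)."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        for i in range(X.shape[0]):
            h = np.dot(X[i], theta)
            theta = theta + alpha * (y[i] - h) * X[i]
    return theta

def mse(X, y, theta):
    """Mean squared error of the linear model X @ theta."""
    return np.mean((y - X @ theta) ** 2)

# Synthetic data (assumed for illustration): y = X @ true_theta + small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

# Exact least-squares solution as a reference point.
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

for name, fn in [("method 1", sgd_method1), ("method 2", sgd_method2)]:
    theta = fn(X, y, alpha=0.001, max_iter=200)
    print(name, np.round(theta, 3), "MSE:", round(mse(X, y, theta), 5))
print("least squares", np.round(theta_ls, 3), "MSE:", round(mse(X, y, theta_ls), 5))
```

With a small enough alpha both loops end close to the least-squares solution. Method 1 behaves like full-batch gradient descent, while method 2 keeps a small step-size-dependent offset because it updates after every sample.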
