java - Calculating the gradient output for the Theta update rule

Question

Since this uses a sigmoid function instead of a zero/one activation function, I think this is the right way to calculate gradient descent. Is that right?

  static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
  {
     //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
     double sum = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         sum += ( weights[i] * feature_matrix[file_index][i] );
     }
     //bias
     sum += weights[ globo_dict_size ];

     return sigmoid(sum);
  }

  private static double sigmoid(double x)
  {
      return 1 / (1 + Math.exp(-x));
  }

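In this next piece of code I'm trying to update the Θ values (these are equivalent to the weights in a perceptron, aren't they?). In a related question I was given the formula `LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]` for that purpose. I have commented out the weight update from my perceptron.

Is this new update rule the correct approach?

What is meant by `output_gradient`? Is it equivalent to the sum I calculate in my `calculateOutput` method?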

      //LEARNING WEIGHTS
      double localError, globalError, output; // output must be a double: calculateOutput returns sigmoid(sum)
      int p, iteration;

      iteration = 0;
      do 
      {
          iteration++;
          globalError = 0;
          //loop through all instances (complete one epoch)
          for (p = 0; p < number_of_files__train; p++) 
          {
              // calculate predicted class
              output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
              // difference between predicted and actual class values
              localError = outputs__train[p] - output;
              //update weights and bias
              for (int i = 0; i < globo_dict_size; i++) 
              {
                  //weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );

                  weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i];

              }
              weights[ globo_dict_size ] += ( LEARNING_RATE * localError );

              //summation of squared error (error value for all instances)
              globalError += (localError*localError);
          }

          /* Root Mean Squared Error */
          if (iteration < 10) 
              System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
          else
              System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
          //System.out.println( Arrays.toString( weights ) );
      } 
      while(globalError != 0 && iteration<=MAX_ITER);

UPDATE: I have updated the code, and it now looks like this:

  double loss, cost, hypothesis, gradient;
  int p, iteration;

  iteration = 0;
  do 
  {
    iteration++;
    cost = 0.0;
    loss = 0.0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++) 
    {

      // 1. Calculate the hypothesis h = X * theta
      hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

      // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
      loss = hypothesis - outputs__train[p];

      // 3. Calculate the gradient = X' * loss / m
      gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, loss );

      // 4. Update the parameters theta = theta - alpha * gradient
      for (int i = 0; i < globo_dict_size; i++) 
      {
          theta[i] = theta[i] - (LEARNING_RATE * gradient);
      }

      //summation of squared error (error value for all instances)
      cost += (loss*loss);
    }


  /* Root Mean Squared Error */
  if (iteration < 10) 
      System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
  else
      System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
  //System.out.println( Arrays.toString( weights ) );

  } 
  while(cost != 0 && iteration<=MAX_ITER);


}

static double calculateHypothesis( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         hypothesis += ( theta[i] * feature_matrix[file_index][i] );
     }
     //bias
     hypothesis += theta[ globo_dict_size ];

     return hypothesis;
}

static double calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double loss )
{
    double gradient = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         gradient += ( feature_matrix[file_index][i] * loss);
     }

     return gradient;
}

public static double hingeLoss()
{
    // l(y, f(x)) = max(0, 1 − y · f(x))

    return HINGE;
}

score 1 · Accepted Answer

Your `calculateOutput` method looks correct. I don't think the same is true of your next piece of code:

weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]

Take a look at the image you posted in your other question:

[Image: update rule for Theta]

Let's try to identify each part of these rules in your code.

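`Theta0` and `Theta1`: these look like `weights[i]` in your code. I hope `globo_dict_size = 2`;
`alpha`: this seems to be your `LEARNING_RATE`;
`1 / m`: I can't find this anywhere in your update rule. `m` is the number of training instances in Andrew Ng's videos. In your case it should be `1 / number_of_files__train`, I think. It's not very important, though: things should work fine without it.
The sum: you do this with the `calculateOutput` function, whose result you use in the `localError` variable, which you then multiply by `feature_matrix__train[p][i]` (the equivalent of `x(i)` in Andrew Ng's notation).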
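
To make that mapping concrete, here is a minimal sketch of one batch update for a linear hypothesis, assuming the squared-error cost from Andrew Ng's course. The names `BatchUpdateSketch`, `hypothesis` and `batchStep` and the toy data are hypothetical and not taken from your code; the bias is stored in the last slot, as in your `weights` array.

```java
class BatchUpdateSketch {
    // Linear hypothesis: theta[0..n-1] are the feature weights, theta[n] is the bias.
    static double hypothesis(double[] theta, double[] x) {
        double sum = theta[x.length];
        for (int i = 0; i < x.length; i++) {
            sum += theta[i] * x[i];
        }
        return sum;
    }

    // One batch step: theta_j := theta_j - alpha * (1/m) * sum_p (h(x_p) - y_p) * x_p[j]
    static double[] batchStep(double[] theta, double[][] X, double[] y, double alpha) {
        int m = X.length;      // number of training instances
        int n = X[0].length;   // number of features
        double[] gradient = new double[n + 1];
        for (int p = 0; p < m; p++) {
            double error = hypothesis(theta, X[p]) - y[p];
            for (int j = 0; j < n; j++) {
                gradient[j] += error * X[p][j];
            }
            gradient[n] += error; // bias term: its input is the constant 1
        }
        double[] updated = new double[n + 1];
        for (int j = 0; j <= n; j++) {
            updated[j] = theta[j] - alpha * gradient[j] / m;
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] X = {{1.0}, {2.0}, {3.0}};
        double[] y = {3.0, 5.0, 7.0}; // generated from y = 2x + 1
        double[] theta = new double[2];
        for (int it = 0; it < 5000; it++) {
            theta = batchStep(theta, X, y, 0.1);
        }
        System.out.println(theta[0] + " " + theta[1]);
    }
}
```

With this toy data the two printed values should approach 2 and 1. Note that every parameter gets its own gradient component; a single scalar gradient shared by all parameters would not implement the rule.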

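This product, `localError * feature_matrix__train[p][i]`, is the partial derivative, which is part of the gradient!

Why? Because the partial derivative of `[h_theta(x(i)) - y(i)]^2` with respect to `Theta0` is: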
```
2*[h_theta(x(i)) - y(i)] * derivative[h_theta(x(i)) - y(i)]
derivative[h_theta(x(i)) - y(i)] =
derivative[Theta0 * x(i, 1) + Theta1*x(i, 2) - y(i)] =
x(i, 1)
```
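Of course, you should differentiate the entire sum. This is also why Andrew Ng used `1 / (2m)` in the cost function: the `2` cancels with the `2` that comes out of the differentiation.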
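
Written out over the whole training set, and assuming the linear hypothesis and the squared-error cost with the $1/(2m)$ factor from Andrew Ng's course, the derivation reads as follows. Here $x^{(i)}_j$ denotes feature $j$ of training instance $i$, which is `x(i, j)` in the notation above.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
     \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_j \\
\theta_j &:= \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}
\end{aligned}
```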

Remember that `x(i, 1)`, or simply `x(1)`, should consist of all ones. In your code, you should make sure that:
```
feature_matrix__train[p][0] == 1
```
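
If your feature matrix does not already start with a column of ones, one way to add it is sketched below. `withBiasColumn` is a hypothetical helper, not something from your code. Also note that your `calculateOutput` already handles the bias through the separate `weights[globo_dict_size]` term, so you would pick one convention or the other, not both.

```java
import java.util.Arrays;

class BiasColumnSketch {
    // Returns a copy of X with a constant 1.0 prepended to every row,
    // so that theta[0] can act as the intercept.
    static double[][] withBiasColumn(double[][] X) {
        double[][] out = new double[X.length][];
        for (int p = 0; p < X.length; p++) {
            out[p] = new double[X[p].length + 1];
            out[p][0] = 1.0;
            System.arraycopy(X[p], 0, out[p], 1, X[p].length);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] X = {{0.5, 2.0}, {1.5, 3.0}};
        // prints [[1.0, 0.5, 2.0], [1.0, 1.5, 3.0]]
        System.out.println(Arrays.deepToString(withBiasColumn(X)));
    }
}
```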
That's it! I don't know what `output_gradient[i]` is supposed to be in your code; you don't define it anywhere.
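
One possible reading, and this is only a guess at what the other question intended: if you keep the squared error but pass the sum through a sigmoid, the chain rule adds the sigmoid's derivative, `h * (1 - h)`, as an extra factor. That factor is computed once per training instance, not per feature, so it would not be indexed by `i`. A minimal sketch with hypothetical names:

```java
import java.util.Arrays;

class SigmoidGradientSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Derivative of the sigmoid, expressed through its output h = sigmoid(z).
    static double sigmoidDerivative(double h) {
        return h * (1.0 - h);
    }

    // One per-instance update for squared error with a sigmoid output.
    // weights[n] is the bias, as in the question's calculateOutput.
    static void update(double[] weights, double[] x, double y, double learningRate) {
        int n = x.length;
        double z = weights[n];
        for (int i = 0; i < n; i++) {
            z += weights[i] * x[i];
        }
        double h = sigmoid(z);
        // (y - h) plays the role of localError; the second factor is the assumed "output gradient"
        double delta = (y - h) * sigmoidDerivative(h);
        for (int i = 0; i < n; i++) {
            weights[i] += learningRate * delta * x[i];
        }
        weights[n] += learningRate * delta;
    }

    public static void main(String[] args) {
        double[] w = new double[3];
        update(w, new double[]{1.0, 2.0}, 1.0, 0.5);
        // prints [0.0625, 0.125, 0.0625]
        System.out.println(Arrays.toString(w));
    }
}
```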

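I suggest you take a look at this tutorial to get a better understanding of the algorithm you have used. Since you use the sigmoid function, it seems you want to do classification, but in that case you should use a different cost function. That document deals with logistic regression as well.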
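
As a rough illustration of the "different cost function" idea: logistic regression usually pairs the sigmoid with the cross-entropy (log loss) cost, and for that cost the gradient per instance works out to `(h - y) * x`, with no extra sigmoid-derivative factor. The sketch below is a toy under stated assumptions (hypothetical names, made-up data, labels in {0, 1}, bias stored last), not a drop-in replacement for your training loop.

```java
class LogisticRegressionSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Predicted probability of class 1; theta[x.length] is the bias.
    static double predict(double[] theta, double[] x) {
        double z = theta[x.length];
        for (int i = 0; i < x.length; i++) {
            z += theta[i] * x[i];
        }
        return sigmoid(z);
    }

    // Mean cross-entropy: -(1/m) * sum(y*log(h) + (1-y)*log(1-h)).
    // No clamping is done here, so h must stay strictly between 0 and 1.
    static double cost(double[] theta, double[][] X, double[] y) {
        double total = 0.0;
        for (int p = 0; p < X.length; p++) {
            double h = predict(theta, X[p]);
            total -= y[p] * Math.log(h) + (1.0 - y[p]) * Math.log(1.0 - h);
        }
        return total / X.length;
    }

    // One pass over the data, updating theta in place after every instance.
    static void epoch(double[] theta, double[][] X, double[] y, double alpha) {
        for (int p = 0; p < X.length; p++) {
            double error = predict(theta, X[p]) - y[p];
            for (int i = 0; i < X[p].length; i++) {
                theta[i] -= alpha * error * X[p][i];
            }
            theta[X[p].length] -= alpha * error;
        }
    }

    public static void main(String[] args) {
        double[][] X = {{0.0}, {1.0}, {3.0}, {4.0}};
        double[] y = {0.0, 0.0, 1.0, 1.0};
        double[] theta = new double[2];
        System.out.println("cost before: " + cost(theta, X, y));
        for (int e = 0; e < 1000; e++) {
            epoch(theta, X, y, 0.1);
        }
        System.out.println("cost after:  " + cost(theta, X, y));
    }
}
```

Monitoring this cost instead of the RMSE also gives a more meaningful stopping signal for a classifier, since an exact error of zero is rarely reached with a sigmoid output.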
