tensorflow - What are softmax_w and softmax_b in this document?

Question

I am new to TensorFlow and need to train a language model, but I ran into some difficulties while reading the documentation, as shown below.

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

I don't understand why this line is needed:

logits = tf.matmul(output, softmax_w) + softmax_b

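Once output has been computed and target_words is known, it seems the loss could be calculated directly, so the pseudocode appears to add an extra layer. Also, what are softmax_w and softmax_b, which are not mentioned anywhere earlier? Having to ask such a basic question makes me think I have overlooked something important.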

Please point me in the right direction. Any suggestions are welcome. Thank you very much.

score 3 · Accepted Answer

All that code is doing is adding an extra linear transformation before computing the softmax. softmax_w should be a tf.Variable containing a matrix of weights. softmax_b should be a tf.Variable containing the bias vector.

For more details, take a look at the softmax example in this tutorial: https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html#softmax-regressions
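To make the role of the extra line concrete, here is a minimal sketch in plain Python rather than TensorFlow, so it does not depend on any particular TensorFlow version. It assumes softmax_w has shape [lstm_size, vocab_size] and softmax_b has shape [vocab_size]; those shapes, the tiny sizes, the random values, and the target word ids are illustrative assumptions, not taken from the question. The point it tries to show is that output has one value per LSTM unit, while the loss needs one score per vocabulary word, so the linear transformation is what maps between the two.

```python
import math
import random

random.seed(0)

# Illustrative sizes (assumptions, not from the question).
batch_size, lstm_size, vocab_size = 2, 3, 5

# Stand-in for the LSTM output: one vector of length lstm_size per example.
output = [[random.uniform(-1, 1) for _ in range(lstm_size)]
          for _ in range(batch_size)]

# softmax_w: weight matrix, assumed shape [lstm_size, vocab_size]
# (a tf.Variable in TensorFlow).
softmax_w = [[random.uniform(-0.1, 0.1) for _ in range(vocab_size)]
             for _ in range(lstm_size)]
# softmax_b: bias vector, assumed shape [vocab_size] (also a tf.Variable).
softmax_b = [0.0] * vocab_size


def matmul(a, b):
    """Plain-Python matrix product, playing the role of tf.matmul."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax of one row, like tf.nn.softmax."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


# Equivalent of: logits = tf.matmul(output, softmax_w) + softmax_b
logits = [[v + b for v, b in zip(row, softmax_b)]
          for row in matmul(output, softmax_w)]
probabilities = [softmax(row) for row in logits]

# Hypothetical target word ids, one per example in the batch.
target_words = [1, 4]
# Average cross-entropy: negative log-probability of each correct word.
loss = -sum(math.log(p[t])
            for p, t in zip(probabilities, target_words)) / batch_size

print(len(output[0]), len(logits[0]))  # width before and after the projection
```

Without the projection, each row of output would have lstm_size entries and could not be indexed by a word id from a vocabulary of vocab_size words. In a real model the two variables are trained together with the LSTM weights.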
