tensorflow - TensorFlow using LSTMs for generating text

Question

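I would like to use TensorFlow to generate text and have been modifying the LSTM tutorial ( https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks ) code to do this. However, my initial solution seems to generate nonsense, and it does not improve even after training for a long time. I cannot see why. The idea is to start with a zero matrix and then generate one word at a time.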

This is the code, to which I have added the two functions below: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py

The generator looks as follows:

def generate_text(session,m,eval_op):

    state = m.initial_state.eval()

    x = np.zeros((m.batch_size,m.num_steps), dtype=np.int32)

    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch 
                # targets have to be set, but m is the validation model, so it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})

                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0,:])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id

            except ValueError as e:
                print("ValueError")

    print(output)

I have added the variable "probabilities" to the ptb_model; it is simply a softmax over the logits.

self._probabilities = tf.nn.softmax(logits)
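
For reference, a softmax turns a vector of logits into a probability distribution. The following is a minimal, dependency-free sketch of that computation for a single vector; the function name `softmax` and the example logits are illustrative assumptions and are not part of the tutorial code.

```python
import math

def softmax(logits):
    """Return a probability list for one vector of logits (illustrative helper)."""
    # Subtract the largest logit first so math.exp cannot overflow.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Larger logits receive larger probabilities, and the result sums to about 1.
probs = softmax([2.0, 1.0, 0.1])
```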

And the sampling:

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))

score 18 · Accepted Answer

I have been working toward exactly the same goal and just got it to work. You have many of the right modifications here, but I think you missed a few steps.

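First, to generate text you need to create a different version of the model that represents only a single time step. The reason is that each output y must be sampled before it can be fed into the next step of the model. I did this by creating a new config that sets both num_steps and batch_size to 1.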

class SmallGenConfig(object):
  """Small config. for generation"""
  init_scale = 0.1
  learning_rate = 1.0
  max_grad_norm = 5
  num_layers = 2
  num_steps = 1 # this is the main difference
  hidden_size = 200
  max_epoch = 4
  max_max_epoch = 13
  keep_prob = 1.0
  lr_decay = 0.5
  batch_size = 1
  vocab_size = 10000

I also added the probabilities to the model with these lines:

self._output_probs = tf.nn.softmax(logits)

and

@property
def output_probs(self):
  return self._output_probs

Then there are a few differences in my generate_text() function. The first is that I load the saved model parameters from disk using a tf.train.Saver() object. Note that this is done after instantiating the PTBModel with the new config from above.

def generate_text(train_path, model_path, num_sentences):
  gen_config = SmallGenConfig()

  with tf.Graph().as_default(), tf.Session() as session:
    initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                gen_config.init_scale)    
    with tf.variable_scope("model", reuse=None, initializer=initializer):
      m = PTBModel(is_training=False, config=gen_config)

    # Restore variables from disk.
    saver = tf.train.Saver() 
    saver.restore(session, model_path)
    print("Model restored from file " + model_path)

The second difference is that I get a lookup table from ids to word strings (I had to write this function; see the code below).

    words = reader.get_vocab(train_path)

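I set up the initial state the same way you do, but I set up the initial token differently. I want to use the "end of sentence" token so that the sentence starts with the right kinds of words. I looked through the word index and found that <eos> happens to have index 2 (deterministically), so I hard-coded that in.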

    state = m.initial_state.eval()
    x = 2 # the id for '<eos>' from the training set
    input = np.matrix([[x]])  # a 2D numpy matrix

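Finally, here is the part that generates sentences. Note that we tell session.run() to compute output_probs and final_state, and we give it the input and the state. In the first iteration the input is <eos> and the state is initial_state, but in subsequent iterations we give the last sampled output as input and pass along the state from the previous iteration. Note also that we use the words list to look up the word string from the output index.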

    text = ""
    count = 0
    while count < num_sentences:
      output_probs, state = session.run([m.output_probs, m.final_state],
                                   {m.input_data: input,
                                    m.initial_state: state})
      x = sample(output_probs[0], 0.9)
      if words[x]=="<eos>":
        text += ".\n\n"
        count += 1
      else:
        text += " " + words[x]
      # now feed this new word as input into the next iteration
      input = np.matrix([[x]])

All that remains is to print the text we accumulated.

    print(text)
  return

That is it for the generate_text() function.
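
The feedback loop described above can be tried without TensorFlow. The following is a toy sketch, not the PTB model: `toy_step` is a hypothetical stand-in for the `session.run()` call, its integer "state" is only a word counter (a real LSTM state is a set of tensors), and the five-word vocabulary is made up, with `<eos>` placed at id 2 to mirror the answer.

```python
import random

VOCAB = ["the", "cat", "<eos>", "sat", "down"]  # made-up vocabulary
EOS_ID = 2  # mirrors the hard-coded '<eos>' id in the answer

def toy_step(word_id, state):
    """Hypothetical stand-in for one model step: returns (probs, new_state)."""
    if word_id == EOS_ID:
        state = 0  # a new sentence starts after <eos>
    if state >= 3:
        # After three words, put all probability mass on <eos>.
        probs = [1.0 if i == EOS_ID else 0.0 for i in range(len(VOCAB))]
    else:
        # Otherwise spread the mass evenly over the non-<eos> words.
        probs = [0.0 if i == EOS_ID else 0.25 for i in range(len(VOCAB))]
    return probs, state + 1

def sample(probs, rng):
    """Return the first index whose cumulative probability exceeds a random draw."""
    r = rng.random()
    total = 0.0
    for i, p in enumerate(probs):
        total += p
        if total > r:
            return i
    return len(probs) - 1

def generate(num_sentences, seed=0):
    rng = random.Random(seed)
    state = 0
    word_id = EOS_ID  # start from <eos>, as in the answer
    text = ""
    count = 0
    while count < num_sentences:
        probs, state = toy_step(word_id, state)  # feed last word and last state
        word_id = sample(probs, rng)             # sample before the next step
        if VOCAB[word_id] == "<eos>":
            text += ".\n\n"
            count += 1
        else:
            text += " " + VOCAB[word_id]
    return text

text = generate(2)
```

The point of the sketch is the data flow: each step receives the previously sampled word and the previous state, which is why the generation model uses num_steps = 1 and batch_size = 1.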

Finally, here is the function definition for get_vocab(), which I put in reader.py.

def get_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))

  return words
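
As a hedged illustration of why the `<eos>` index is deterministic, the sketch below applies the same sort (count descending, then word ascending) to a tiny made-up token list and looks the id up instead of hard-coding it. The name `build_vocab` and the sample sentence are assumptions for illustration; the real function reads its words from the training file.

```python
import collections

def build_vocab(tokens):
    """Order words by descending count, breaking ties alphabetically."""
    counter = collections.Counter(tokens)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = zip(*count_pairs)
    return words

# Made-up data; in the answer the tokens come from the PTB training file.
tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
words = build_vocab(tokens)

# Look up the id of '<eos>' rather than hard-coding it.
eos_id = words.index("<eos>")
```

Because the sort key is fully determined by the counts and the words themselves, the same training file always yields the same ids.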

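The last thing you need to do is make sure the model is saved after training. Assuming saver is a tf.train.Saver() object created for the training graph, that looks like this: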

save_path = saver.save(session, "/tmp/model.ckpt")

That is the model you will load from disk later when generating text.

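There was one more problem: I found that the probability distribution produced by TensorFlow's softmax function sometimes does not sum to exactly 1.0. When the sum is greater than 1.0, np.random.multinomial() throws an error. So I had to write my own sampling function, which looks like this: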

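import random

import numpy as np

def sample(a, temperature=1.0):
  # rescale the distribution by the temperature, then renormalize
  a = np.log(a) / temperature
  a = np.exp(a) / np.sum(np.exp(a))
  r = random.random() # range: [0,1)
  total = 0.0
  # walk the cumulative sum and return the first index that exceeds r
  for i in range(len(a)):
    total += a[i]
    if total>r:
      return i
  # fall back to the last index if rounding keeps the sum below r
  return len(a)-1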
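
To see what the temperature does, and how cumulative sampling tolerates a sum that is not exactly 1.0, here is a dependency-free sketch of the same idea. The names `apply_temperature` and `sample_index` are illustrative and not from the answer, and the code assumes at least one probability is positive.

```python
import bisect
import itertools
import math
import random

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability list; temperature < 1 sharpens it, > 1 flattens it."""
    # Work in log space; zero entries map to -inf so math.log never sees 0.
    logs = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # exp(-inf) is 0.0
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(probs, rng=random):
    """Draw an index using the running total, scaled by the actual sum."""
    cumulative = list(itertools.accumulate(probs))
    r = rng.random() * cumulative[-1]  # the sum does not need to be exactly 1.0
    # First position whose cumulative value exceeds r, clamped to a valid index.
    return min(bisect.bisect_right(cumulative, r), len(probs) - 1)

p = [0.1, 0.2, 0.7]
sharp = apply_temperature(p, 0.5)  # the most likely entry gains probability mass
same = apply_temperature(p, 1.0)   # temperature 1.0 leaves the distribution unchanged
```

Scaling the random draw by the actual total is one way to sidestep the strict sum check that np.random.multinomial() applies; it is a design choice for this sketch, not the answer's exact method.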

Putting all of this together, the small model was able to generate some cool sentences for me. Good luck.

score 0 · Answer

I used your code, but it does not seem to be right, so I modified it slightly and it seems to work. Here is my code, though I am not sure it is correct:

def generate_text(session, m, eval_op, word_list):
    output = []
    for i in xrange(20):
        state = m.initial_state.eval()
        x = np.zeros((1,1), dtype=np.int32)
        y = np.zeros((1,1), dtype=np.int32)
        output_str = ""
        for step in xrange(100):
            # Run the batch
            # targets have to be set, but m is the validation model, so it should not train the neural network
            cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                        {m.input_data: x, m.targets: y, m.initial_state: state})
            # Sample a word-id and add it to the matrix and output
            word_id = sample(probabilities[0,:])
            # skip ids outside the vocabulary (valid ids are 0 .. len(word_list) - 1)
            if (word_id<0) or (word_id >= len(word_list)):
                continue
            #print(word_id)
            output_str = output_str + " " + word_list[word_id]
            x[0][0] = word_id
        print(output_str)
        output.append(output_str)
    return output
