python - Char-RNN in TensorFlow

Question

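I am trying to get a simple RNN running in TensorFlow, but I have run into a few problems.

For now, all I want to do is run the forward pass of an RNN that uses LSTM as its cell type.

I scraped some news articles and want to feed them into the RNN. I split the string formed by concatenating all the articles into characters and mapped each character to an integer. I then one-hot encoded those integers.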

import numpy as np

# article is the string made by concatenating all scraped articles
data = [c for c in article]
chars = list(set(data))
idx_chars = {i:ch for i,ch in enumerate(chars)}
chars_idx = {ch:i for i,ch in enumerate(chars)}
int_data = [chars_idx[ch] for ch in data]

# config values
vocab_size = len(chars)
hidden_size = 100
seq_length = 25

# helper function to get one-hot encoding

def onehot(value):
    result = np.zeros(vocab_size)
    result[value] = 1
    return result

def vectorize_input(inputs):
    result = [onehot(x) for x in inputs]
    return result

input = vectorize_input(int_data[:25])
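
For comparison, the same one-hot encoding can be written in vectorized form by indexing into an identity matrix. This is only a minimal sketch: the sample string, the sorted vocabulary, and the float32 dtype are illustrative assumptions rather than part of the original setup.

```python
import numpy as np

# Hypothetical sample text standing in for the concatenated articles.
article = "hello world"

data = list(article)
chars = sorted(set(data))  # sorted so the mapping is deterministic
chars_idx = {ch: i for i, ch in enumerate(chars)}
int_data = [chars_idx[ch] for ch in data]

vocab_size = len(chars)
seq_length = 5  # shortened for the toy string


def vectorize_input(indices):
    """Return a float32 array of shape (len(indices), vocab_size)."""
    # Row i of the identity matrix is the one-hot vector for index i.
    return np.eye(vocab_size, dtype=np.float32)[indices]


encoded = vectorize_input(int_data[:seq_length])
print(encoded.shape)
```

Each row of `encoded` contains a single 1 at the position of the character's integer index.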

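Next comes the TensorFlow code. I want to run through all the characters in the data, using 25 characters per forward pass. My first question concerns the batch size. If I do it the way I just described, my batch size is 1, right? So each vector corresponding to one character of the input has shape [1, vocab_size], and the input contains 25 of these vectors. I therefore used the following tensors: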

seq_input = tf.placeholder(tf.int32, shape = [seq_length, 1, vocab_size])
targets = tf.placeholder(tf.int32, shape = [seq_length, 1, vocab_size])
inputs = [tf.reshape(i,(1,vocab_size)) for i in tf.split(0,seq_length,seq_input)]

I had to create the last tensor because that is the format the rnn function expects.
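
The shape handling described here (one [seq_length, 1, vocab_size] block turned into a list of seq_length arrays of shape [1, vocab_size]) can be checked in plain NumPy. This is a sketch of the reshaping only, not of the TensorFlow graph; the value of `vocab_size` and the dummy one-hot pattern are arbitrary assumptions.

```python
import numpy as np

# vocab_size is an arbitrary example value; batch_size is 1 as in the question.
seq_length, batch_size, vocab_size = 25, 1, 30

# One sequence of one-hot vectors, laid out as [time, batch, vocab].
seq_input = np.zeros((seq_length, batch_size, vocab_size), dtype=np.float32)
# Dummy data: time step t gets a 1 at position t % vocab_size.
seq_input[np.arange(seq_length), 0, np.arange(seq_length) % vocab_size] = 1.0

# Split along the time axis, then drop that axis,
# giving one [batch, vocab] array per time step.
inputs = [step.reshape(batch_size, vocab_size)
          for step in np.split(seq_input, seq_length, axis=0)]

print(len(inputs), inputs[0].shape)
```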

Next, I ran into a problem with variable scoping. The following code produces the error shown below it:

cell = rnn_cell.BasicLSTMCell(hidden_size, input_size = vocab_size)
# note: first argument of zero_state is the batch_size
initial_state = cell.zero_state(1, tf.float32)
outputs, state = rnn.rnn(cell, inputs, initial_state= initial_state)
sess = tf.Session()
sess.run([outputs, state], feed_dict = {inputs:input})

ValueError                                Traceback (most recent call last)
<ipython-input-90-449af38c387d> in <module>()
      7     # note: first argument of zero_state is supposed to be batch_size
      8     initial_state = cell.zero_state(1, tf.float32)
----> 9     outputs, state = rnn.rnn(cell, inputs, initial_state= initial_state)
     10 
     11 sess = tf.Session()

/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn.pyc in rnn(cell, inputs, initial_state, dtype, sequence_length, scope)
    124             zero_output, state, call_cell)
    125       else:
--> 126         (output, state) = call_cell()
    127 
    128       outputs.append(output)

/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn.pyc in <lambda>()
    117       if time > 0: vs.get_variable_scope().reuse_variables()
    118       # pylint: disable=cell-var-from-loop
--> 119       call_cell = lambda: cell(input_, state)
    120       # pylint: enable=cell-var-from-loop
    121       if sequence_length:

/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn_cell.pyc in __call__(self, inputs, state, scope)
    200       # Parameters of gates are concatenated into one multiply for efficiency.
    201       c, h = array_ops.split(1, 2, state)
--> 202       concat = linear([inputs, h], 4 * self._num_units, True)
    203 
    204       # i = input_gate, j = new_input, f = forget_gate, o = output_gate

/Library/Python/2.7/site-packages/tensorflow/python/ops/rnn_cell.pyc in linear(args, output_size, bias, bias_start, scope)
    700   # Now the computation.
    701   with vs.variable_scope(scope or "Linear"):
--> 702     matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
    703     if len(args) == 1:
    704       res = math_ops.matmul(args[0], matrix)

/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(name, shape, dtype, initializer, trainable, collections)
    254   return get_variable_scope().get_variable(_get_default_variable_store(), name,
    255                                            shape, dtype, initializer,
--> 256                                            trainable, collections)
    257 
    258 

/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, trainable, collections)
    186     with ops.name_scope(None):
    187       return var_store.get_variable(full_name, shape, dtype, initializer,
--> 188                                     self.reuse, trainable, collections)
    189 
    190 

/Library/Python/2.7/site-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, reuse, trainable, collections)
     99       if should_check and not reuse:
    100         raise ValueError("Over-sharing: Variable %s already exists, disallowed."
--> 101                          " Did you mean to set reuse=True in VarScope?" % name)
    102       found_var = self._vars[name]
    103       if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Over-sharing: Variable forward/RNN/BasicLSTMCell/Linear/Matrix already exists, disallowed. Did you mean to set reuse=True in VarScope?

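I don't understand why this error occurs, because I never define any variables in my own code. Variables are only created inside the rnn and rnn_cell functions. Can someone tell me how to fix this error?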

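Another error I am currently getting is a type error: my input is of type tf.int32, but the hidden layer created inside the LSTM is of type tf.float32, and the linear function in the rnn_cell.py code concatenates these two tensors and multiplies them by a weight matrix. Why is this not possible? I would think it is fairly common for the input to be one-hot encoded and therefore of type int32.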
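
The computation named in the traceback, `linear([inputs, h], 4 * self._num_units, True)`, concatenates the input with the hidden state and multiplies the result by a single weight matrix. The NumPy sketch below illustrates that step with the input cast to float32 first so that all operands share one dtype. It is an illustration under assumptions, not TensorFlow's implementation: the weights are random, the biases are zero, no forget bias is added, and the gate order simply follows the `i, j, f, o` comment in the traceback.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size, batch_size = 30, 100, 1  # vocab_size is an example value

# A one-hot input stored as int32, as in the placeholder above.
x_int = np.zeros((batch_size, vocab_size), dtype=np.int32)
x_int[0, 7] = 1

# Cast once so the input matches the float32 hidden state and weights.
x = x_int.astype(np.float32)

h = np.zeros((batch_size, hidden_size), dtype=np.float32)  # hidden state
c = np.zeros((batch_size, hidden_size), dtype=np.float32)  # cell state

# One matrix for all four gates: [vocab_size + hidden_size, 4 * hidden_size].
W = (rng.standard_normal((vocab_size + hidden_size, 4 * hidden_size)) * 0.01
     ).astype(np.float32)
b = np.zeros(4 * hidden_size, dtype=np.float32)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Concatenate input and hidden state, then apply one affine map.
concat = np.concatenate([x, h], axis=1) @ W + b
i, j, f, o = np.split(concat, 4, axis=1)  # input gate, new input, forget gate, output gate

new_c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
new_h = sigmoid(o) * np.tanh(new_c)
print(new_h.shape, new_h.dtype)
```

In TensorFlow, declaring the placeholder as tf.float32 or casting with `tf.cast` would be the analogous step.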

In general, is using a batch size of 1 a standard approach when training char-rnns? I have seen the code by Andrej Karpathy, where he trains a char-rnn in basic numpy and uses the same procedure: he simply goes through the entire text in sequences of length 25. Here is the code: https://gist.github.com/karpathy/d4dee566867f8291f086
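
The procedure described here, stepping through the text in windows of 25 characters with a batch size of 1, can be sketched as a small generator. This is written in the spirit of that description rather than copied from the gist; the function name `iter_sequences` and the toy data are hypothetical, and pairing each input with the next character as its target is an assumption about the training setup.

```python
def iter_sequences(int_data, seq_length):
    """Yield (inputs, targets) index lists; targets are inputs shifted by one."""
    p = 0
    # Stop when no full window with a next-character target is left.
    while p + seq_length + 1 <= len(int_data):
        inputs = int_data[p:p + seq_length]
        targets = int_data[p + 1:p + seq_length + 1]
        yield inputs, targets
        p += seq_length  # move to the next non-overlapping window


# Hypothetical toy data: 60 integer-encoded characters.
toy_data = list(range(60))
pairs = list(iter_sequences(toy_data, seq_length=25))
print(len(pairs))
```

Each yielded pair corresponds to one forward pass over a single sequence, which is what a batch size of 1 means here.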
