tensorflow - Tensorflow - TextSum model: How to create my own training data

Question

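I am trying to create my own training data for the TextSum model. As I understand it, the articles and summaries need to go into a binary file (TFRecords). However, I have not been able to create my own training data from raw text files. I don't fully understand the format, so I am trying to create a very simple binary file with the following code: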

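import os
import tensorflow as tf

# `path` is the directory holding the raw text files (defined elsewhere).
files = os.listdir(path)
writer = tf.python_io.TFRecordWriter("test_data")
for i, file in enumerate(files):
    # Read as bytes, since BytesList only accepts byte strings.
    with open(os.path.join(path, file), "rb") as f:
        content = f.read()
    example = tf.train.Example(
        features = tf.train.Features(
            feature = {
                'content': tf.train.Feature(bytes_list=tf.train.BytesList(value=[content]))
            }
        )
    )

    serialized = example.SerializeToString()
    writer.write(serialized)
writer.close()  # flush all records to disk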

Then I tried to read the values back from this test_data file with the following code:

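import struct
from tensorflow.core.example import example_pb2

reader = open("test_data", 'rb')
len_bytes = reader.read(8)  # treat the first 8 bytes as the record length
str_len = struct.unpack('q', len_bytes)[0]
example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]
example_pb2.Example.FromString(example_str)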

But I always get the following error:

  File "dailymail_corpus_to_tfrecords.py", line 34, in check_file
    example_pb2.Example.FromString(example_str)
  File "/home/s1510032/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 770, in FromString
    message.MergeFromString(s)
  File "/home/s1510032/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 1091, in MergeFromString
    if self._InternalParse(serialized, 0, length) != length:
  File "/home/s1510032/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 1117, in InternalParse
    new_pos = local_SkipField(buffer, new_pos, end, tag_bytes)
  File "/home/s1510032/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 850, in SkipField
    return WIRETYPE_TO_SKIPPER[wire_type](buffer, pos, end)
  File "/home/s1510032/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 791, in _SkipLengthDelimited
    raise _DecodeError('Truncated message.')
google.protobuf.message.DecodeError: Truncated message.

I don't know what is wrong. Please let me know if you have any suggestions for solving this problem.
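
One possible explanation, offered as a sketch rather than a confirmed diagnosis: `TFRecordWriter` frames each record as an 8-byte length, a 4-byte CRC of the length, the data, and a 4-byte CRC of the data, whereas the reading code above assumes the payload starts right after the 8-byte length. If that is the cause, the payload is read 4 bytes too early, which would be consistent with `Truncated message`. The example below uses only `struct` and plain byte strings in place of `example.SerializeToString()` so that it runs without TensorFlow. The helper names, the temporary files, and the zeroed CRC placeholders are assumptions made for illustration.

```python
import os
import struct
import tempfile


def write_length_prefixed(path, payloads):
    """Write each payload as an 8-byte length followed by the raw bytes."""
    with open(path, "wb") as f:
        for payload in payloads:
            f.write(struct.pack("q", len(payload)))
            f.write(payload)


def read_length_prefixed(path):
    """Read records framed as <8-byte length><payload>, like the reader in the question."""
    records = []
    with open(path, "rb") as f:
        while True:
            len_bytes = f.read(8)
            if not len_bytes:
                break
            str_len = struct.unpack("q", len_bytes)[0]
            records.append(f.read(str_len))
    return records


def read_tfrecord_framed(path):
    """Read records framed as <length><CRC><payload><CRC>; CRCs are skipped, not verified."""
    records = []
    with open(path, "rb") as f:
        while True:
            len_bytes = f.read(8)
            if not len_bytes:
                break
            str_len = struct.unpack("<Q", len_bytes)[0]
            f.read(4)  # skip the CRC of the length field
            records.append(f.read(str_len))
            f.read(4)  # skip the CRC of the payload
    return records


# Stand-ins for the output of example.SerializeToString().
payloads = [b"first article", b"second article"]
tmp_dir = tempfile.mkdtemp()

# Round trip with the framing that the question's reader expects.
plain_path = os.path.join(tmp_dir, "plain_data")
write_length_prefixed(plain_path, payloads)
restored_plain = read_length_prefixed(plain_path)

# Simulated TFRecord-style file; the CRC bytes are zero placeholders, not real checksums.
framed_path = os.path.join(tmp_dir, "framed_data")
with open(framed_path, "wb") as f:
    for payload in payloads:
        f.write(struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4)
restored_framed = read_tfrecord_framed(framed_path)

# Reading the framed file with the plain reader starts 4 bytes too early.
misread = read_length_prefixed(framed_path)[0]
```

With real data, each payload would be the serialized `tf.train.Example`. Alternatively, a file written by `TFRecordWriter` can be read back with `tf.python_io.tf_record_iterator`, which handles the record framing itself.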
