c++ - CTF Reader throws an error for large files in CNTK

Question

I am following the CNTK tutorials on GitHub and using the CTF reader function:

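from cntk.io import (MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,
                     INFINITELY_REPEAT, FULL_DATA_SWEEP)

def create_reader(path, is_training, input_dim, label_dim):
    # 'x' is read as a sparse stream, 'y' as a dense stream.
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features = StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels = StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training, epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)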

This works perfectly well until the input file grows beyond a certain size (I don't know exactly what size). At that point it throws an error like this:

WARNING: Sparse index value (269) at offset 8923303 in the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt) exceeds the maximum expected value (268).
attempt: Reached the maximum number of allowed errors while reading the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt)., retrying 2-th time out of 5...
.
.
.

RuntimeError: Reached the maximum number of allowed errors while reading the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt).

I found that this kind of error is thrown in the file TextParser.cpp: https://github.com/Microsoft/CNTK/blob/5633e79febe1dc5147149af9190ad1944742328a/Source/Readers/CNTKTextFormatReader/TextParser.cpp

What is the solution or workaround for this?

score 2 · Accepted Answer

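You need to know the dimension of your input, and you also need to know that indices start at 0. So if you created an input file that maps your vocabulary to the range 1 to 20000, the dimension is 20001. The warning above says the same thing: index 269 exceeds the maximum expected value of 268, so the reader was given a dimension of 269, while the data needs at least 270. File size is probably not the cause in itself; a larger file is simply more likely to contain an index beyond the declared dimension.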
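One way to check this before training is to scan the file for the largest sparse index and add 1. The following is a minimal sketch, not part of CNTK: the helper name required_sparse_dim is hypothetical, and it assumes a simplified view of the CTF format in which each stream starts with |name and sparse entries are written as index:value.

```python
def required_sparse_dim(lines, field):
    """Return (largest sparse index + 1) for stream `field`, or 0 if none is found."""
    max_index = -1
    for line in lines:
        # Streams are separated by '|'; anything before the first '|'
        # (an optional sequence id) is skipped.
        for chunk in line.split('|')[1:]:
            tokens = chunk.split()
            if not tokens or tokens[0] != field:
                continue
            for token in tokens[1:]:
                index, sep, _value = token.partition(':')
                if sep:  # only index:value pairs count as sparse entries
                    max_index = max(max_index, int(index))
    # Indices are 0-based, so the dimension is the largest index plus one.
    return max_index + 1


# Made-up sample lines that mirror the index from the warning above.
sample = [
    "|x 1:1 268:1 |y 1 0",
    "|x 269:1 |y 0 1",
]
print(required_sparse_dim(sample, "x"))  # 270
```

For a real file, an open file handle can be passed in place of the list, and the result used as input_dim when calling create_reader.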
