string - KMP の前処理テーブルに関する混乱

Question

KMP アルゴリズムを調べて、接尾辞と接頭辞のカウントのテーブルを計算する KMP の特定の行に関して混乱しています。

アルゴリズム kmp_table: 入力: 文字の配列、W (分析される単語) 整数の配列、T (入力されるテーブル) 出力: なし (ただし、操作中にテーブルにデータが入力されます)

define variables:
    an integer, pos ← 2 (the current position we are computing in T)
    an integer, cnd ← 0 (the zero-based index in W of the next 
    character of the current candidate substring)

(the first few values are fixed but different from what the algorithm 
might suggest)
let T[0] ← -1, T[1] ← 0

while pos is less than the length of W, do:
    (first case: the substring continues)
    if W[pos - 1] = W[cnd], 
      let cnd ← cnd + 1, T[pos] ← cnd, pos ← pos + 1

    (second case: it doesn't, but we can fall back)
    otherwise, if cnd > 0, let cnd ← T[cnd]

    (third case: we have run out of candidates.  Note cnd = 0)
    otherwise, let T[pos] ← 0, pos ← pos + 1

上記はウィキペディアから直接取得したものです。cnd > 0なぜ set なのか、少し混乱しcnd := T[cnd]ています。もう一度やり直すかのように cnd を 0 にリセットしてはいけませんか?

score 0 · Accepted Answer

明らかに、T[0] = -1に設定cndすると、次の反復でT[cnd = 0] = -1読み取ることW[cnd = -1]になり、それは文字列の外側にあります。少なくともその理由から、cnd > 0vsの特別な扱いが必要cnd == 0です。

cnd0 と比較する本当の理由は、の左側に部分的な文字列の一致がある場合に巻き戻すT[cnd]位置を与えるはずだからです。ただし、の左側には何もないため、この目的には使用できません。W[]W[cnd]T[0]W[0]

cnd := T[cnd] を設定するのはなぜですか?

アルゴリズムの要点全体が欠けています。部分一致の後で位置 0 から再開すると、ナイーブアルゴリズムに戻ります。には巻き戻し位置が含まれており、とのすぐ下T[]のサンプルテーブルからわかるように、常に 0 であるとは限りません。そのため、位置 0 までずっと戻るのではなく、他の位置に移動してそこからマッチングを続けることがあります。そしてそれが、単純なアルゴリズムよりもスケーラブルなアルゴリズムになっています。W[]T[]

string - KMP の前処理テーブルに関する混乱

1 に答える 1

Related

Reference