algorithm - LCP 配列を使用したパターンマッチングのアルゴリズムを理解する

Question

序文: 私の質問は主にアルゴリズムに関する質問なので、接尾辞と LCP 配列に慣れていなくても、おそらく助けてくれるでしょう。

このホワイトペーパーでは、文字列パターンマッチングにサフィックスと LCP 配列を効率的に使用する方法について説明します。

SA と LCP の動作と、アルゴリズムの実行時間を (パターンの長さと文字列の長さ) から (Chris Eelmaa の回答 here と jogojapans answer here のおかげで) に改善O(P*log(N))するP方法NをO(P+log(N))理解しました。

LLcpとの使用法を説明する図 4 のアルゴリズムを試してみましたRLcp。しかし、それがどのように機能するかを理解するのに問題があります。

アルゴリズム（ソースから取得）：

パターンマッチングアルゴリズム

使用される変数名の説明:

lcp(v,w) : Length of the longest common prefix of v and w
W = w0..wP-1 : pattern of length P
A = a0..aN-1 : the text (length N)
Pos[0..N-1] : suffix array
L_W : index (in A) of first occurrence of the matched pattern
M : middle index of current substring
L : lower bound
R : upper bound
Lcp : array of size N-2 such that Lcp[M] = lcp(A_Pos[L_M], A_pos[M]) where L_M is the lower bound of the unique interval with M in the middle
Rcp : array of size N-2 such that Rcp[M] = lcp(A_Pos[R_M], A_pos[M]) where R_M is the upper bound of the unique interval with M in the middle

ここで、次の例を使用してアルゴリズムを試してみたいと思います (一部はここから取得)。

 SA | LCP | Suffix entry
 -----------------------
 5  | N/A | a  
 3  | 1   | ana  
 1  | 3   | anana  
 0  | 0   | banana  
 4  | 0   | na  
 2  | 2   | nana 

A = "banana" ; N = 6
W = "ban" ; P = 3

たとえば、文字列を一致させようとbanすると、アルゴリズムが 0 をとして返すことが期待されL_Wます。

アルゴリズムをステップ実行する方法は次のとおりです。

l = lcp("a", "ban") = 0
r = lcp("nana", "ban") = 0
if 0 = 3 or 'b' =< 'a' then // which is NOT the case for both conditions
    L_W = 0
else if 0 < 3 or 'b' =< 'n' then // which is the case for both conditions 
    L_W = 6 // which means 'not found'
...
...

何かが欠けているような気がしますが、何がわかりません。また、を呼び出す代わりに、事前計算された LCP 配列をどのように使用できるか疑問に思っていますlcp(v,w)。

score 1 · Accepted Answer

エラーがあったと思います。

最初の条件はかなり理解しやすいです。LCPの長さ==パターンの長さになったら完了です。パターンが最小のものよりもさらに小さいか等しい場合、唯一の選択肢は最小のものです。

2 番目の条件は間違っています。矛盾によって証明できます。r < P || Wr <= a... つまり r >= P && Wr > a... r >= P の場合、既に r の長さの共通プレフィックスがあるため、どうすれば Lw = N(見つかりません) を得ることができますか?

algorithm - LCP 配列を使用したパターン マッチングのアルゴリズムを理解する

1 に答える 1

Related

Reference

algorithm - LCP 配列を使用したパターンマッチングのアルゴリズムを理解する