string - 最長共通回文サブシーケンス

Question

指定された 2 つの文字列の最長共通回文サブシーケンスの長さをカウントする効率的なアルゴリズムはありますか?

例えば：

文字列 1。 afbcdfca

文字列 2。 bcadfcgyfka

LCPS は 5 で、LCPS 文字列はafcfaです。

score 5 · Accepted Answer

はい。

3 つ以上のシーケンスに対して LCS のアルゴリズムを使用するだけです。

私が間違っていなければ、

 LCS( afbcdfca, acfdcbfa, bcadfcgyfka, akfygcfdacb) = afcfa

文字列 #2 と #4 を理解するのはあなた次第です。

更新: いいえ、反例があります: LCS( aba, aba, bab, bab ) = ab, ba. LCS は、サブシーケンスが回文になることを保証しません。おそらく、この制約を追加する必要があります。とにかく、LCS の再帰方程式は良い出発点です。

ジェネレータスタイルで LCS を実装し、長さ n、n-1、n-2 などのすべての LCS を生成する場合、LCS-gen(string1, string1,逆文字列 1)、LCS 世代 (文字列 2、逆文字列 2)。しかし、非常に効率的なLCS-genがあるかどうかはまだ確認していません。

score 0 · Accepted Answer

これは非常に一般的であり、ほとんどの場合、人々は動的プログラミングの部分の 70% を説明し、厄介な詳細で停止するため、これは私の簡単なウォークスルーです。

最適部分構造:X[0..n-1]を長さ n の入力シーケンスとL(0, n-1)し、最長回文サブシーケンスの長さをX[0..n-1].

X の最後の文字と最初の文字が同じ場合、L(0, n-1) = L(1, n-2) + 2. なぜ、2 番目と最後から 2 番目の文字が同じでない場合、最後と最初の bing が同じでは役に立たないのではないでしょうか。いいえ、この「サブシーケンス」は連続している必要はありません。

/* Driver program to test above functions */
int main()
{
    char seq[] = "panamamanap";
    int n = strlen(seq);
    printf ("The lnegth of the LPS is %d", lps(seq, 0, n-1));
    getchar();
    return 0;
}

int lps(char *seq, int i, int j)
{   
    // Base Case 1: If there is only 1 character   
    if (i == j)     
        return 1;    
    
    // Base Case 2: If there are only 2 characters and both are same   
    if (seq[i] == seq[j] && i + 1 == j)     
        return 2;    
    
    // If the first and last characters match   
    if (seq[i] == seq[j])      
        return lps (seq, i+1, j-1) + 2;    
    
    // If the first and last characters do not match   
    else return max( lps(seq, i, j-1), lps(seq, i+1, j) );
}

上記の実装を考慮して、すべて異なる文字を持つ長さ 6 のシーケンスの部分再帰ツリーを次に示します。

               L(0, 5)
             /        \ 
            /          \  
        L(1,5)          L(0,4)
       /    \            /    \
      /      \          /      \
  L(2,5)    L(1,4)  L(1,4)  L(0,3)

上記の部分再帰ツリーでL(1, 4)は、が 2 回解決されています。完全な再帰木を描くと、何度も何度も解かれる部分問題がたくさんあることがわかります。他の典型的な動的計画法 (DP) の問題と同様に、一時的な配列L[][]をボトムアップ方式で構築することにより、同じサブ問題の再計算を回避できます。

// Returns the length of the longest palindromic subsequence in seq
int lps(char *str)
{
   int n = strlen(str);
   int i, j, cl;
   int L[n][n];  // Create a table to store results of subproblems
 
 
   // Strings of length 1 are palindrome of length 1
   for (i = 0; i < n; i++)
      L[i][i] = 1;
 
    for (cl=2; cl<=n; cl++)                              //again this is the length of chain we are considering
    {
        for (i=0; i<n-cl+1; i++)                         //start at i
        {
            j = i+cl-1;                                  //end at j
            if (str[i] == str[j] && cl == 2)             //if only 2 characters and they are the same then set L[i][j] = 2
               L[i][j] = 2;
            else if (str[i] == str[j])                   //if greater than length 2 and first and last characters match, add 2 to the calculated value of the center stripped of both ends
               L[i][j] = L[i+1][j-1] + 2;
            else
               L[i][j] = max(L[i][j-1], L[i+1][j]);      //if not match, then take max of 2 possibilities
        }
    }
 
    return L[0][n-1];
}

これは論理的には非動的プログラミングと同じです。結果を配列に保存するだけなので、同じことを何度も計算する必要はありません。

score 0 · Accepted Answer

この問題と同じです: http://code.google.com/codejam/contest/1781488/dashboard#s=p2

http://code.google.com/codejam/contest/1781488/dashboard#s=a&a=2

以下のコードでは、cd(s) メソッドを提供します (これは、文字列内の次または前の文字がその文字と等しい場所を示す char dict を返します)。これを使用して、動的プログラミングソリューションを実装し、サンプルコードも提供します。

動的計画法では、len(s1)*len(s1)/2 の状態しかないため、order(N^2) が可能です。

アイデアは、s1 から char を取得するか、取得しないかのいずれかです。s1 からイワナを取り出す場合は、(s1 と s2 の両方の) 前と後ろから取る必要があります。取得しない場合は、1 に進みます。(これは、s2 の状態が s1 の状態と常に同期していることを意味します。これは、常に両方の外側から貪欲に取得するためです。つまり、s1 が持つことができる状態の数だけを心配する必要があります）。

このコードは、ほとんどの場合に役立ちます。cd1 (char dict 1) は、現在の文字に等しい s1 内の次の文字を見つけるのに役立ちます (前方と後方の両方)。

再帰的な solve() メソッドでは、start1、end1 などを決定する必要があります。キャラクターを取得するたびに合計に2を追加します（start1 == end1の場合を除き、1を追加します）

s1 = "afbcdfca"
s2 = "bcadfcgyfka"

def cd(s):
    """returns dictionary d where d[i] = j where j is the next occurrence of character i"""
    char_dict = {}
    last_pos = {}
    for i, char in enumerate(s):
        if char in char_dict:
            _, forward, backward = char_dict[char]
            pos = last_pos[char]
            forward[pos] = i
            backward[i] = pos
            last_pos[char] = i
        else:
            first, forward, backward = i, {}, {}
            char_dict[char] = (first, forward, backward)
            last_pos[char] = i
    return char_dict

print cd(s1)
"""{'a': ({0: 7}, {7: 0}), 'c': ({3: 6}, {6: 3}), 'b': ({}, {}), 'd': ({}, {}), 'f': ({1: 5}, {5: 1})}"""

cd1, cd2 = cd(s1), cd(s2)

cache = {}
def solve(start1, end1, start2, end2):
    state = (start1, end1)
    answer = cache.get(state, None)
    if answer:
        return answer
    
    if start1 < end1:
        return 0
    c1s, c1e = s1[start1], s1[end1]
    c2s, c2e = s2[start2], s2[end2]
    
    #if any of c1s, c1e, c2s, c2e are equal and you don't take, you must
    #skip over those characters too:
    dont_take_end1 = solve(start1, end1 - 1, start2, end2)
    
    do_take_end1 = 2
    if do_take_end1:
        end1_char = s1[end1]
        #start1 = next character along from start1 that equals end1_char
        #end1 = next char before end1 that equals end1_char
        #end2 = next char before end2 that..
        #start2 = next char after .. that ..
        do_take_end1 += solve(start1, end1, start2, end2)
    
    
    answer = cache[state] = max(dont_take_end1, do_take_end1)
    return answer
    
print solve(0, len(s1), 0, len(s2))

string - 最長共通回文サブシーケンス

3 に答える 3

Related

Reference