c++ - パターンによって制約される最長の一般的な部分文字列

Question

問題：

私は3つの文字列s1、s2、s3を持っています。それぞれの両側にガベージテキストが含まれ、中央に定義パターンがありますtext1+number1。number1文字列ごとに2ずつ増加します。抽出したいtext1+number1。

私はすでに見つけるためのコードを書いていますnumber1

LCS関数を拡張してtext1を取得するにはどうすればよいですか？

#include <iostream>

const std::string longestCommonSubstring(int, std::string const& s1, std::string const& s2, std::string const& s3);

int main(void) {
    std::string s1="hello 5", s2="bolo 7", s3="lo 9sdf";
    std::cout << "Trying to get \"lo 5\", actual result: \"" << longestCommonSubstring(5, s1, s2, s3) << '\"';
}

const std::string longestCommonSubstring(int must_include, std::string const& s1, std::string const& s2, std::string const& s3) {
    std::string longest;

    for(size_t start=0, length=1; start + length <= s1.size();) {
        std::string tmp = s1.substr(start, length);
        if (std::string::npos != s2.find(tmp) && std::string::npos != s3.find(tmp)) {
            tmp.swap(longest);
            ++length;
        } else ++start;
    }

    return longest;
}

例：

から"hello 5"、、取得"bolo 7"し"lo 9sdf"たい "lo 5"

コード：

簡単なLCS関数（テストケース）を書くことができましたが、この変更された関数を書くのに問題があります。

score 1 · Accepted Answer

パターン*n、* n + 2、* n + 4などを探しているとします。次の文字列があります：s1 = "hello 1、bye 2、ciao 1"、s2 = "hello 3、さようなら4、ciao 2"とs3="こんにちは5、さようなら6、ciao5"。次に、次のようにします。

//find all pattern sequences
N1 = findAllPatterns(s1, number);
 for i = 2 to n:
  for item in Ni-1:
   for match in findAllPatterns(si, nextPattern(item))
    Ni.add([item, (match, indexOf(match))]);

//for all pattern sequences identify the max common substring
maxCommonLength = 0; 
for sequence in Nn:
 temp = findLCS(sequence);
 if(length(temp[0]) > maxCommonLength):
  maxCommonLength = length(temp[0]);
  result = temp;

return result;

`アルゴリズムの最初の部分は、シーケンスを識別します：[（1、6）、（3、6）、（5、6）]、[（1、19）、（3、6）、（5、6） ]、[（2、12）、（4、12）、（6、12）]

2番目の部分では、パターンに一致する最長のサブストリングとして["hello 1"、 "hello 3"、"hello5"]を識別します。

アルゴリズムは、2つの部分を組み合わせて、パターンに一致するが最適ではない初期のシーケンスを破棄することでさらに最適化できますが、わかりやすくするために2つの部分に分けて表示することをお勧めします。

-固定コードブロックを編集します

score 0 · Accepted Answer

すでに知っnumber1ていて、これらの数字がすべて対応する文字列に1回だけ表示されることがわかっている場合は、次のように機能するはずです。

あなたの文字列を、、などと呼びますs[0]。s[1]セットlongest = INT_MAX。各文字列s[i]（i> = 0）に対して、次のようになります。

number1 + 2 * iで発生する場所を見つけますs[i]。位置で発生するとしますj。
If（i == 0）j0 = j; そうしないと
- for（k = 1; k <= j &&k<=最長&&s[i] [j --k] == s [0] [j0 --k]; ++ k）{}
- 最長=k;

最後に、longestは、すべての文字列に共通する最長の部分文字列の長さになります。

s1基本的に、番号を見つけたポイントから逆方向にスキャンし、（my ）内の対応する文字との不一致を探し、s[0]これまでに一致する最長のサブストリングを追跡します。longestこれは、私たちが見る新しい文字列ごとに同じか減少します。

score 0 · Accepted Answer

LCSアルゴリズムの内部を変更しようとするのではなく、その出力を取得してs1で見つけることができます。そこから、あなたの番号は出力の長さプラス1のオフセットに配置されます。

score 0 · Accepted Answer

私自身の解決策を書きました：

#include <iostream>
#include <string>
#include <sstream>
#include <vector>

typedef std::pair<std::pair<std::string, std::string>, std::pair<std::pair<std::string, std::string>, std::pair<std::string, std::string>>> pairStringTrio;
typedef std::pair<std::string,std::pair<std::string,std::string>> stringPairString;

stringPairString longestCommonSubstring(const pairStringTrio&);
std::string strFindReplace(const std::string&, const std::string&, const std::string&);

int main(void) {
        std::string s1= "6 HUMAN ACTIONb", s2="8 HUMAN ACTIONd", s3="10 HUMAN ACTIONf";
        pairStringTrio result = std::make_pair(std::make_pair(s1, "6"), std::make_pair(std::make_pair(s2, "8"), std::make_pair(s3, "10")));

        stringPairString answer = longestCommonSubstring(result);
        std::cout << '\"' << answer.first << "\"\t\"" << answer.second.first << "\"\t\"" << answer.second.second << '\"';
}


stringPairString longestCommonSubstring(const pairStringTrio &foo) {
        std::string longest;

        for(size_t start=0, length=foo.first.first.size()-1; start + length <= foo.first.first.size();) {
                std::string s1_tmp = foo.first.first.substr(start, length);
                std::string s2_tmp = strFindReplace(s1_tmp, foo.first.second, foo.second.first.second);
                std::string s3_tmp = strFindReplace(s1_tmp, foo.first.second, foo.second.second.second);

                if (std::string::npos != foo.second.first.first.find(s2_tmp) && std::string::npos != foo.second.second.first.find(s3_tmp)) {
                        s1_tmp.swap(longest);
                        ++length;
                } else ++start;
        }

        return std::make_pair(longest, std::make_pair(strFindReplace(longest, foo.first.second, foo.second.first.second), strFindReplace(longest, foo.first.second, foo.second.second.second)));
}

std::string strFindReplace(const std::string &original, const std::string& src, const std::string& dest) {
        std::string answer=original;
        for(std::size_t pos = 0; (pos = answer.find(src, pos)) != answer.npos;)
                answer.replace(pos, src.size(), dest);
        return answer;
}

c++ - パターンによって制約される最長の一般的な部分文字列

問題：

例：

コード：

4 に答える 4

Related

Reference