python - アルゴリズムの実行時間の複雑さ-これを計算してアルゴリズムをさらに最適化するにはどうすればよいですか？

Question

再帰的アルゴリズムを設計し、Pythonで書き留めました。さまざまなパラメータで実行時間を測定すると、指数関数的な時間がかかるようです。さらに; 50などの小さな数字で終了するのに30分以上かかります（終了するまで待ちませんでしたが、妥当な時間で終了しないようです。指数関数的だと思います）。

したがって、このアルゴリズムの実行時間の複雑さに興味があります。誰かが方程式T（n、m）を導き出すのを手伝ってくれませんか？または、ビッグオーを計算するには？

アルゴリズムは以下のとおりです。

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

最悪の場合の実行時間を導出するために問題とアルゴリズムを理解する必要があるかどうかはわかりませんが、これが私が解決しようとした問題です。

問題：

ソース文字列SRCと検索文字列SEAが与えられた場合、SRCでサブシーケンスSEAを見つけ、SRCでSEAの各文字が見つかった場所のインデックスを返します。SEAの文字がSRCの複数の場所にある可能性がある場合は、その文字の位置に-1を返します。

例えば; ソース文字列がBRRBRBR（N = 7）で、検索文字列がBBBの場合、「BBB」の最初の「B」は検索文字列のインデックス0に表示されます。2番目の「B」は検索文字列のインデックス3にあり、最後の「B」は5番目の位置にあります。さらに; 文字「BBB」の位置に他の選択肢はないため、アルゴリズムは[0,3,5]を返します。

別のケースでは、ソース文字列がBRRBRB（N = 6）で、検索文字列がRBRの場合、「RBR」の最初の「R」は位置1または2になります。これにより、「B」と位置の位置3のみが残ります。最後の「R」は4。次に、最初の「R」は複数の場所にある可能性があり、その場所はあいまいです。他の2つのキャラクター、BとRは1つの場所しかありません。したがって、アルゴリズムは[-1,4,5]を返します。

]（N = 58）であり、検索文字列はRBRRBRBBRBRRBBRRBBBRRBBBRRです。[-1、-1、-1、-1、-1、-1、-1、17、18、19、23、-1、-1、-1、-1、-1を返す必要があります、-1、-1、-1、-1、-1、-1、47、53]、しかし残念ながらそうではありません=（

最適化：

'indexes'リストが完全に-1でいっぱいになったときに、検索を停止することを考えました。しかし、それは最良の場合（またはおそらく平均的な場合）にのみ影響し、最悪の場合には影響しません。このアルゴリズムをさらに最適化するにはどうすればよいでしょうか。この問題には多項式の解が存在することを私は知っています。

最適化よりも重要なのは、実行時間のT（n、m）方程式に本当に興味があります。ここで、nとmはソース文字列と検索文字列の長さです。

ここまで読めた方、ありがとうございました！=）

編集-実装されたIVIadのソリューション：

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

score 4 · Accepted Answer

で実行できますO(n + m)。ここmで、はの文字数SEA、および：nの文字数です。SRC

last = 1
for i = 1 to m do
    while SRC[last] != SEA[i]
        ++last

    print last
    ++last (skip this match)

基本的に、の各文字についてSEA、SRCの位置を見つけますが、前の文字を見つけた位置の後にのみスキャンします。

例えば; ソース文字列がBRRBRBR（N = 7）で、検索文字列がBBBの場合

次に：find Bin SRC：found at last = 1 print 1、set last = 2。

検索：で検索B、印刷、設定。SRClast = 44last = 5

検索：で検索B、印刷、設定。終わり。SRClast = 66last = 7

あなたのアルゴリズムの複雑さに関しては、私は非常に正式な分析を提供することはできませんが、私がそれをどのように行うかを説明しようと思います。

SRCすべての文字が、およびそれらの間で等しいと仮定しSEAます。したがって、whileループの状態を排除できます。nまた、whileループの実行回数にも注意してください。

最初の文字については、を呼び出すことに注意してくださいfind(1, 1), ... find(m, n)。ただし、これらはwhileループを開始し、独自の再帰呼び出しを行います。それぞれfind(i, j)がO(m)、whileループでforの再帰呼び出しを行いi = 1 to nます。しかし、これらは順番に、より再帰的な呼び出しを行い、指数関数的な複雑さを引き起こす一種の「雪崩効果」をもたらします。

だからあなたは持っています：

i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
       find(2, 2) calls find(3, 3), ..., find(m, n)
       find(3, 3) calls find(4, 4), ..., find(m, n)
       find(4, 4) calls find(5, 5), ..., find(m, n)
       ...
       total calls: O(m^m)
i = 2: same, but start from find(2, 3).
...
i = n: same

したがって、全体の複雑さはのようになりますO(n*m^m)。これが理にかなっていることを願っています。間違いはありません。

score 3 · Accepted Answer

これは、最も長い一般的なサブシーケンスの問題です。これは動的計画法で実装でき、指数関数的な時間を大幅に短縮できます。LCSがSEAの長さを返す場合、シーケンスSEAがSRCに存在することがわかります。アルゴリズムを変更するときに、インデックスを保存するのは簡単です。ここに良い説明へのリンクがあります。http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

score 0 · Accepted Answer

簡単に見て、あなたは検索、再帰、バックトラックしていますか？

ソース文字列の接尾辞配列を作成するのは良い考えだと思います。接尾辞配列の作成には、O（nlogn）の複雑さがあります
。サブストリングの検索には、O（logn）時間計算量があります。

python - アルゴリズムの実行時間の複雑さ-これを計算してアルゴリズムをさらに最適化するにはどうすればよいですか？

3 に答える 3

Related

Reference