This can be done in quadratic time via dynamic programming.
Here's some Python code:
```python
import sys

import numpy as np

S = sys.argv[1]  # e.g. 'AAABBAAABBCECE'
N = len(S)
bignum = N + 3  # exceeds every achievable cost (each P[n, k] is at most n + 2)

# maxmatch[i, j] = length of the longest common prefix of S[i:] and S[j:] (for i < j)
maxmatch = np.zeros((N + 1, N + 1), dtype=int)
for i in range(N - 1, -1, -1):
    for j in range(i + 1, N):
        if S[i] == S[j]:
            maxmatch[i, j] = maxmatch[i + 1, j + 1] + 1

# P[n, k] = cost of encoding the first n characters, given that the last k form a block
P = np.zeros((N + 1, N + 1), dtype=int) + bignum
# Q[n] = cost of encoding the first n characters
Q = np.zeros(N + 1, dtype=int) + bignum
# Base case: the empty string costs nothing
P[0, 0] = 0
Q[0] = 0
for n in range(1, N + 1):
    for k in range(1, n + 1):
        # Same as S[n-2*k:n-k] == S[n-k:n], but O(1) thanks to maxmatch
        if n - 2 * k >= 0 and maxmatch[n - 2 * k, n - k] >= k:
            # Increment the count: C x_1...x_k -> C+1 x_1...x_k (cost unchanged)
            P[n, k] = min(P[n, k], P[n - k, k])
            print('P[%d,%d] = %d' % (n, k, P[n, k]))
        # Start a new block: 1 x_1...x_k (1 for the count, k for the characters)
        P[n, k] = min(P[n, k], Q[n - k] + 1 + k)
        print('P[%d,%d] = %d' % (n, k, P[n, k]))
    for k in range(1, n + 1):
        Q[n] = min(Q[n], P[n, k])
    print()
print(Q[N])
```
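The recurrences the code implements, written out (here $M$ stands for `maxmatch`, $S_i$ for `S[i]`, and a block is charged 1 for its count plus $k$ for its characters, as in the code):

$$
\begin{aligned}
M[i,j] &= \begin{cases} M[i+1,\,j+1] + 1 & \text{if } S_i = S_j \\ 0 & \text{otherwise} \end{cases} \\
P[n,k] &= \min\bigl(Q[n-k] + 1 + k,\; P[n-k,\,k]\bigr) \quad \text{(second term only if } n \ge 2k \text{ and } M[n-2k,\,n-k] \ge k\text{)} \\
Q[n] &= \min_{1 \le k \le n} P[n,k], \qquad Q[0] = 0
\end{aligned}
$$

Each of the $O(N^2)$ entries of $M$ and $P$ takes $O(1)$ work, and each $Q[n]$ is a minimum over $n$ entries, which is where the quadratic bound comes from.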
You can reconstruct the actual encoding by remembering which choice you made at each step along the way.
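A minimal sketch of that reconstruction, assuming the same cost model as the code above (1 for the count plus k for the block). It uses plain lists instead of numpy, and `compress`, `P_count`, and `Q_k` are hypothetical names introduced here: `P_count[n][k]` remembers the repetition count of the final block behind `P[n][k]`, and `Q_k[n]` remembers which block length won for `Q[n]`. Walking these backpointers from the end of the string yields the blocks in reverse.

```python
def compress(s):
    """Return (cost, blocks), where blocks is a list of (count, block) pairs.

    Hypothetical helper: same recurrences as the DP above, plus backpointers
    so the winning encoding can be read off at the end.
    """
    N = len(s)
    # maxmatch[i][j]: longest common prefix of s[i:] and s[j:], for i < j
    maxmatch = [[0] * (N + 1) for _ in range(N + 1)]
    for i in range(N - 1, -1, -1):
        for j in range(i + 1, N):
            if s[i] == s[j]:
                maxmatch[i][j] = maxmatch[i + 1][j + 1] + 1

    INF = float("inf")
    P = [[INF] * (N + 1) for _ in range(N + 1)]
    P_count = [[0] * (N + 1) for _ in range(N + 1)]  # count of the final block
    Q = [INF] * (N + 1)
    Q_k = [0] * (N + 1)  # block length that achieved Q[n]
    Q[0] = 0
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            # Choice 1: start a new block "1 x_1...x_k".
            P[n][k] = Q[n - k] + 1 + k
            P_count[n][k] = 1
            # Choice 2: the previous k characters are the same block; bump its count.
            if (n - 2 * k >= 0 and maxmatch[n - 2 * k][n - k] >= k
                    and P[n - k][k] <= P[n][k]):
                P[n][k] = P[n - k][k]
                P_count[n][k] = P_count[n - k][k] + 1
        Q_k[n] = min(range(1, n + 1), key=lambda k: P[n][k])
        Q[n] = P[n][Q_k[n]]

    # Walk the backpointers from the end of the string back to the start.
    blocks = []
    n = N
    while n > 0:
        k = Q_k[n]
        count = P_count[n][k]
        blocks.append((count, s[n - k:n]))
        n -= count * k
    blocks.reverse()
    return Q[N], blocks


cost, blocks = compress("ABCDABCDABCDBCD")
print(cost, blocks)  # 9 [(3, 'ABCD'), (1, 'BCD')]
```

Storing the count alongside each `P` entry is what lets the walk jump back over all `count * k` characters of a repeated block in one step.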
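I've glossed over one small wrinkle: if C is large, you may need an extra byte to hold C+1. If you're using 32-bit ints, this won't come up in any context where this algorithm's running time is feasible. If you're using shorter ints to save space, you'll need to think about it, and you may have to add another dimension to the table based on the size of the most recent C. In theory this could add a log(N) factor, but I don't think it would be noticeable in practice.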
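One way to account for the size of the count, sketched under two assumptions that go beyond the answer. First, a count c is charged `count_cost(c)`, the minimal number of bytes that can hold c (at least one); this helper is hypothetical. Second, instead of adding a dimension for the latest C, this variant enumerates the repetition count directly: Q[n] = min over k and c of Q[n - ck] + count_cost(c) + k, where the last ck characters are c copies of a k-character block. With a count cost of 1 this is equivalent to the P/Q recurrence above. For each n the inner loops run at most n/1 + n/2 + ... + n/n times, so the worst case (for example a single repeated letter) is O(N² log N).

```python
def count_cost(c):
    """Hypothetical cost model: bytes needed to store count c (at least 1)."""
    return max(1, (c.bit_length() + 7) // 8)


def min_cost_variable_count(s):
    """Minimum encoding cost when a block's count costs count_cost(count)."""
    N = len(s)
    # maxmatch[i][j]: longest common prefix of s[i:] and s[j:], for i < j
    maxmatch = [[0] * (N + 1) for _ in range(N + 1)]
    for i in range(N - 1, -1, -1):
        for j in range(i + 1, N):
            if s[i] == s[j]:
                maxmatch[i][j] = maxmatch[i + 1][j + 1] + 1

    Q = [0] + [float("inf")] * N
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            c = 1
            # Invariant: s[n - c*k:n] is c copies of the block s[n - k:n].
            while True:
                Q[n] = min(Q[n], Q[n - c * k] + count_cost(c) + k)
                start = n - (c + 1) * k  # where one more copy would begin
                if start < 0 or maxmatch[start][start + k] < k:
                    break
                c += 1
    return Q[N]


print(min_cost_variable_count("A" * 255))  # 2: one block, count fits in 1 byte
print(min_cost_variable_count("A" * 256))  # 3: count 256 needs 2 bytes
```

With 256 copies of one letter, both `256A` (2 + 1) and `128AA` (1 + 2) cost 3, so the wider count does not always force a different encoding, only a different cost.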
EDIT: For @Moron's benefit, here is the same code with more print statements, so you can more easily see what the algorithm is thinking:
```python
import sys

import numpy as np

S = sys.argv[1]  # e.g. 'AAABBAAABBCECE'
N = len(S)
bignum = N + 3  # exceeds every achievable cost (each P[n, k] is at most n + 2)

# maxmatch[i, j] = length of the longest common prefix of S[i:] and S[j:] (for i < j)
maxmatch = np.zeros((N + 1, N + 1), dtype=int)
for i in range(N - 1, -1, -1):
    for j in range(i + 1, N):
        if S[i] == S[j]:
            maxmatch[i, j] = maxmatch[i + 1, j + 1] + 1

# P[n, k] = cost of encoding the first n characters, given that the last k form a block
P = np.zeros((N + 1, N + 1), dtype=int) + bignum
# Q[n] = cost of encoding the first n characters
Q = np.zeros(N + 1, dtype=int) + bignum
# Base case: the empty string costs nothing
P[0, 0] = 0
Q[0] = 0
for n in range(1, N + 1):
    for k in range(1, n + 1):
        # Same as S[n-2*k:n-k] == S[n-k:n], but O(1) thanks to maxmatch
        if n - 2 * k >= 0 and maxmatch[n - 2 * k, n - k] >= k:
            # Increment the count: C x_1...x_k -> C+1 x_1...x_k (cost unchanged)
            P[n, k] = min(P[n, k], P[n - k, k])
            print("P[%d,%d] = %d\t I can encode first %d characters of S in only %d characters "
                  "if I use my solution for P[%d,%d] with %s's count incremented"
                  % (n, k, P[n, k], n, P[n - k, k], n - k, k, S[n - k:n]))
        # Start a new block: 1 x_1...x_k (1 for the count, k for the characters)
        P[n, k] = min(P[n, k], Q[n - k] + 1 + k)
        print("P[%d,%d] = %d\t I can encode first %d characters of S in only %d characters "
              "if I use my solution for Q[%d] with a new block 1%s"
              % (n, k, P[n, k], n, Q[n - k] + 1 + k, n - k, S[n - k:n]))
    for k in range(1, n + 1):
        Q[n] = min(Q[n], P[n, k])
    print()
    print("Q[%d] = %d\t I can encode first %d characters of S in only %d characters!"
          % (n, Q[n], n, Q[n]))
    print()
print(Q[N])
```
The last few lines of its output on `ABCDABCDABCDBCD` look like this:
```
Q[13] = 7 I can encode first 13 characters of S in only 7 characters!
P[14,1] = 9 I can encode first 14 characters of S in only 9 characters if I use my solution for Q[13] with a new block 1C
P[14,2] = 8 I can encode first 14 characters of S in only 8 characters if I use my solution for Q[12] with a new block 1BC
P[14,3] = 13 I can encode first 14 characters of S in only 13 characters if I use my solution for Q[11] with a new block 1DBC
P[14,4] = 13 I can encode first 14 characters of S in only 13 characters if I use my solution for Q[10] with a new block 1CDBC
P[14,5] = 13 I can encode first 14 characters of S in only 13 characters if I use my solution for Q[9] with a new block 1BCDBC
P[14,6] = 12 I can encode first 14 characters of S in only 12 characters if I use my solution for Q[8] with a new block 1ABCDBC
P[14,7] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[7] with a new block 1DABCDBC
P[14,8] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[6] with a new block 1CDABCDBC
P[14,9] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[5] with a new block 1BCDABCDBC
P[14,10] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[4] with a new block 1ABCDABCDBC
P[14,11] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[3] with a new block 1DABCDABCDBC
P[14,12] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[2] with a new block 1CDABCDABCDBC
P[14,13] = 16 I can encode first 14 characters of S in only 16 characters if I use my solution for Q[1] with a new block 1BCDABCDABCDBC
P[14,14] = 15 I can encode first 14 characters of S in only 15 characters if I use my solution for Q[0] with a new block 1ABCDABCDABCDBC
Q[14] = 8 I can encode first 14 characters of S in only 8 characters!
P[15,1] = 10 I can encode first 15 characters of S in only 10 characters if I use my solution for Q[14] with a new block 1D
P[15,2] = 10 I can encode first 15 characters of S in only 10 characters if I use my solution for Q[13] with a new block 1CD
P[15,3] = 11 I can encode first 15 characters of S in only 11 characters if I use my solution for P[12,3] with BCD's count incremented
P[15,3] = 9 I can encode first 15 characters of S in only 9 characters if I use my solution for Q[12] with a new block 1BCD
P[15,4] = 14 I can encode first 15 characters of S in only 14 characters if I use my solution for Q[11] with a new block 1DBCD
P[15,5] = 14 I can encode first 15 characters of S in only 14 characters if I use my solution for Q[10] with a new block 1CDBCD
P[15,6] = 14 I can encode first 15 characters of S in only 14 characters if I use my solution for Q[9] with a new block 1BCDBCD
P[15,7] = 13 I can encode first 15 characters of S in only 13 characters if I use my solution for Q[8] with a new block 1ABCDBCD
P[15,8] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[7] with a new block 1DABCDBCD
P[15,9] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[6] with a new block 1CDABCDBCD
P[15,10] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[5] with a new block 1BCDABCDBCD
P[15,11] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[4] with a new block 1ABCDABCDBCD
P[15,12] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[3] with a new block 1DABCDABCDBCD
P[15,13] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[2] with a new block 1CDABCDABCDBCD
P[15,14] = 17 I can encode first 15 characters of S in only 17 characters if I use my solution for Q[1] with a new block 1BCDABCDABCDBCD
P[15,15] = 16 I can encode first 15 characters of S in only 16 characters if I use my solution for Q[0] with a new block 1ABCDABCDABCDBCD
Q[15] = 9 I can encode first 15 characters of S in only 9 characters!
```
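Reading that output backwards recovers the optimal encoding of the 15-character string. The new-block line for P[15,3] gives 9 = Q[12] + 1 + 3, so Q[12] = 5. Tracing Q[12] by hand (these intermediate values are not in the excerpt, but follow from the same recurrences, with Q[4] = Q[8] = 5):

$$
\begin{aligned}
P[4,4] &= Q[0] + 1 + 4 = 5 \\
P[8,4] &= \min\bigl(P[4,4],\; Q[4] + 1 + 4\bigr) = \min(5, 10) = 5 \quad (S[0{:}4] = S[4{:}8] = \texttt{ABCD}) \\
Q[12] = P[12,4] &= \min\bigl(P[8,4],\; Q[8] + 1 + 4\bigr) = \min(5, 10) = 5 \quad (S[4{:}8] = S[8{:}12] = \texttt{ABCD}) \\
Q[15] = P[15,3] &= Q[12] + 1 + 3 = 9
\end{aligned}
$$

That corresponds to the encoding `3ABCD1BCD`, whose cost is (1 + 4) + (1 + 3) = 9.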