数独を解く Python プログラムを作成しましたが、それを高速化したいと考えています。コア操作は、再帰として実装する深さ優先検索関数です。深さ優先検索手順を、データ スタック (単なるリスト) を管理するループとして書き直すことで、再帰のオーバーヘッドをなくすことができると考えました。しかし、パフォーマンスはそれほど変わらないことがわかりました。ループ バージョンは手続き的に再帰バージョンと同等であると確信しています。私が使用しているデータ スタックのプッシュ/ポッピングが高速ではないためですか? それとも、Python インタープリターは本当にハードウェアをうまく活用しているのでしょうか? 誰かが何ができるかを理解するのを手伝ってもらえますか? プロファイリングを添付します。
search_naked は再帰関数です。
2446265 function calls (1813295 primitive calls) in 4.576 seconds
Ordered by: internal time, cumulative time
List reduced from 31 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
709255/79779 2.136 0.000 2.136 0.000 sdkx.py:251(search_naked)
3514 1.539 0.000 3.994 0.001 sdkx.py:106(propagate)
3514/20 0.239 0.000 4.389 0.219 sdkx.py:234(dfs)
20 0.135 0.007 4.528 0.226 sdkx.py:468(solvestring)
3901 0.098 0.000 0.113 0.000 sdkx.py:82(Assume)
467283 0.071 0.000 0.071 0.000 {method 'add' of 'set' objects}
3514 0.050 0.000 0.050 0.000 {method 'copy' of 'dict' objects}
378200 0.048 0.000 0.048 0.000 {method 'discard' of 'set' objects}
1 0.047 0.047 4.576 4.576 sdkx.py:487(solvefile)
そして、これが対応するループです - search_naked3
2825303 function calls (2821809 primitive calls) in 4.367 seconds
Ordered by: internal time, cumulative time
List reduced from 32 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
144657 1.967 0.000 2.154 0.000 sdkx.py:148(search_naked3)
3514 1.285 0.000 3.760 0.001 sdkx.py:106(propagate)
3514/20 0.240 0.000 4.156 0.208 sdkx.py:221(dfs)
443513 0.114 0.000 0.114 0.000 {method 'pop' of 'list' objects}
20 0.109 0.005 4.269 0.213 sdkx.py:455(solvestring)
1 0.097 0.097 4.367 4.367 sdkx.py:474(solvefile)
3901 0.096 0.000 0.112 0.000 sdkx.py:82(Assume)
467283 0.072 0.000 0.072 0.000 {method 'add' of 'set' objects}
[編集] ここに 2 つの関数を添付します。
def search_naked(Ls, LsL, Tmask, D, NLID,N, locs, masks): #Find the smallest naked subgroup.
#global naked, naked_mask, M_size
LsL1, K = LsL+1, LsL+N
for nextLID in xrange(NLID, N):
Tmask1 = Tmask|masks[nextLID]
D1=d_counter[Tmask1]
if LsL1==D1:
if D1 <N:
return Ls+locs[nextLID], Tmask1, D1
return 0,0,0
if LsL1>D1:
return False
if D1+nextLID > K:# or D1>=M_size:
continue
result = search_naked(Ls+locs[nextLID], LsL1, Tmask1, D1, nextLID+1,N, locs, masks)
if result!=(0,0,0):
return result
return 0,0,0
def search_naked3(locs, masks, N): #Find a subgroup in a deep first manner without recussion. counter is the stack for DFS purpose.
counter, i, Len, bm,K = [],0, 1,0,N-1
#i: the current index of a cell to be augmentally considered and checked with for subgroup.
#bm: base mask is the ORed value of previously selected masks, this is used with the current cell to generate a new mask for checking purpose
#counter: the stack structure storing (i,bm) pairs, which are used when returning from DFS
#Len: this is a derived value, as len(counter)+1, Len is maintained to avoid calling len() function.
#At each stage, there are len(counter)+1 cells are being combined consdered/checked as a subgroup, including len(counter) previous cells and
#the current cell indicated by i;
#K=N-1 does not change over the while loop; so save same redundant computation.
while True:
TM = bm|masks[i] #TM (total mask): the mask to check
D=d_counter[TM] #D the number of ones in TM.
if D == Len: #That is if a subgroup is formed if D==Len<N; or we are checking the entire group (D==Len==N)
return (0,0,0) if D==N else (locs[i]+sum([locs[c] for (c,m) in counter]), TM, D) #return a subgroup, else, return the flag that no subgroup is found
#From now on, D is assumed > Len. Situation of D<Len is impossible due to the treatment before the loop.
if D ==N or D - Len > K-i: #Current combination of cells is "blown": either N digits have be provided by less than N locations; or
#there is not enought number (K-i) of unconsidered cells that could possiblly accommodate D-Len digits.
#If blown, we need to check the next cell, unless cell i is already the last of the N cells.
if i<N-1: #If there is a next cell, simple increase i, and continue with while loop. Len, bm should not change,
i+=1
continue
else: #Now threat the sitation where i is already pointing at the last of N cells.
if Len-1: #If there is still something in the stack (indicated by Len-1),
i,bm =counter.pop() #pop from the stack to go back to one level higher, using the older base mask
i+=1 #and try the next cell at that higher level.
Len-=1 #Of course, the we are considering 1 less cells.
continue #continue with the while loop
break #If there is nothing left in the stack -> all combinations have been checked. The only exit of the while loop.
#From now on: Len cells contributed D digits, Len<D and not blown. Prepare to go one level deeper, unless the current cell is already the last of N cells.
if i< N-1: #this is the situation in which, the current cell indicated by i is not the last of N cells,
counter.append((i,bm)) # push to stack
bm=TM #base mask is updated as the current total mask
i+=1 #The firt cell to consider in the deeper level is the one just next to the current cell of this level.
Len+=1 #of course, we are considering a more cell.
continue #continue with the while loop.
#Now, we treat the situation in which the current combination of cells already include the last of N cells, ie, i==N-1.
#We cannot go deeper, therefore we return to a higher level.
i,bm =counter.pop() # Pop the stack, returning to the higher level, recovering base mask
i+=1 # and the next cell to consider in the higher level.
Len-=1 #Of course, we are considering 1 less cells in the combination.
return 0,0,0 #The loop must be bronken! That means all combinations have been considered without finding a subgroup. Return the flag