0

I am trying to implement q-learning for tictactoe. One of the steps in doing so involves enumerating all the possible states of the tictactoe board to form a state-value table. I have written a procedure to recursively generate all possible states starting from the empty board. To do this, I am performing implicitly the pre-order traversal of the search space tree. However, at the end of it all, I am getting only 707 unique states whereas the general consensus is that the number of legal states is around 5000.

Note: I am referring to the number of legal states. I am aware that the number of states is closer to 19,000 if either player is allowed to continue playing after a game is completed (what I mean by an illegal state).

CODE:

def generate_state_value_table(self, state, turn):

    winner = int(is_game_over(state)) #check if, for the current turn and state, game has finished and if so who won
    #print "\nWinner is ", winner
    #print "\nBoard at turn: ", turn
    #print_board(state) 
    self.add_state(state, winner/2 + 0.5) #add the current state with the appropriate value to the state table
    open_cells = open_spots(state) #find the index (from 0 to total no. of cells) of all the empty cells in the board 

    #check if there are any empty cells in the board
    if len(open_cells) > 0:
        for cell in open_cells:
            #pdb.set_trace()
            row, col = cell / len(state), cell % len(state)
            new_state = deepcopy(state) #make a copy of the current state 

            #check which player's turn it is 
            if turn % 2 == 0:
                new_state[row][col] = 1
            else:
                new_state[row][col] = -1        

            #using a try block because recursive depth may be exceeded
            try:
                #check if the new state has not been generated somewhere else in the search tree
                if not self.check_duplicates(new_state):
                    self.generate_state_value_table(new_state, turn+1)
                else:
                    return
            except:
                #print "Recursive depth exceeded"
                exit()

    else:
        return

You can look at the full code here if you want.

EDIT: I tidied the code up a bit both in the link and here with more comments to make things clearer. Hope that helps.

4

1 に答える 1

0

だから私は最終的に問題を解決し、同様の問題に直面している人のためにこの答えを出しています. バグは、重複した状態を処理する方法にありました。生成された新しい状態が検索ツリーのどこかより前に生成された場合、それは状態テーブルに追加されるべきではありませんが、私が犯した間違いは、重複する状態を見つける際に事前注文トラバーサルを短縮することでした。 1。

簡単に言えば、以下のコードからelse句を削除すると、州の数が6046になりました。

#check if the new state has not been generated somewhere else in the search tree
if not self.check_duplicates(new_state):
    self.generate_state_value_table(new_state, turn+1)
else:
    return

さらに、明確な勝者がいる状態に遭遇した時点で、検索ツリー ブランチの探索を停止しました。具体的には、 の後に次のコードを追加しましたself.add_state(state, winner/2 + 0.5)

#check if the winner returned is one of the players and go back to the previous state if so
if winner != 0:
    return

これにより、私が探していた州の数が 5762 になりました。

于 2016-11-21T02:49:37.480 に答える