
I am new to reinforcement learning and am doing a project on chess. I use a neural network and temporal-difference (TD) learning to train the engine to play the game.

The neural network has an input layer of 385 features, two hidden layers, and an output layer whose range is [-1, 1], where -1 means a loss, 1 a win, and 0 a draw. I use TD(λ) so that the engine learns chess through self-play; by default it only considers the next 10 moves. All weights are initialized in the range [-1, 1].
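The setup above can be sketched in plain Python as a forward pass plus the truncated TD(λ) target over the next (at most) 10 positions. This is a minimal sketch, not the code from the post. Everything beyond what the post states is an assumption: the hidden sizes (64 and 32), tanh on every layer (the post only says the output lies in [-1, 1]), zero reward until the game ends, no discounting, and all function names, which are hypothetical. Backpropagation and eligibility traces are left out.

```python
import math
import random

N_INPUTS = 385             # input size from the post
HIDDEN1, HIDDEN2 = 64, 32  # hypothetical: the post does not give the hidden sizes

def init_layer(n_in, n_out, rng):
    """Weights and biases drawn uniformly from [-1, 1] (the post's range)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases

def make_net(seed=0):
    """Fully connected 385 -> HIDDEN1 -> HIDDEN2 -> 1 network."""
    rng = random.Random(seed)
    return [init_layer(N_INPUTS, HIDDEN1, rng),
            init_layer(HIDDEN1, HIDDEN2, rng),
            init_layer(HIDDEN2, 1, rng)]

def evaluate(net, features):
    """Forward propagation with tanh on every layer (assumed), so V(s) is in [-1, 1]."""
    h = features
    for weights, biases in net:
        h = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h[0]

def truncated_lambda_return(future_values, lam, terminal_outcome=None):
    """Truncated lambda-return target for the current position s_t.

    future_values[k - 1] is V(s_{t+k}) for k = 1..n (n <= 10 here). If the game
    ended at s_{t+n}, terminal_outcome (-1, 0 or 1) replaces that last value.
    With zero reward before the end and no discounting, the k-step return G_k
    is just the k-th entry.
    """
    g = list(future_values)
    if terminal_outcome is not None:
        g[-1] = terminal_outcome
    n = len(g)
    # (1 - lam) * sum_{k=1}^{n-1} lam^(k-1) * G_k  +  lam^(n-1) * G_n
    target = sum((1 - lam) * lam ** (k - 1) * g[k - 1] for k in range(1, n))
    return target + lam ** (n - 1) * g[n - 1]

# Usage: the TD error that a gradient step on V(s_t) would try to shrink.
net = make_net()
features_t = [0.0] * N_INPUTS  # placeholder for a real 385-feature encoding
delta = truncated_lambda_return([0.1, -0.2, 0.3], lam=0.7) - evaluate(net, features_t)
print(f"TD(lambda) error: {delta:+.3f}")
```

Setting `lam=0` reduces the target to the one-step TD(0) target V(s_{t+1}); `lam=1` uses only the last position in the window, or the game result if the game ended inside it.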

I use forward propagation to estimate the value of a state, but most of the estimates are very close to either 1 or -1, even when the game ends in a draw, so I think the engine is not learning well. My guess is that a few large values dominate the output, so adjusting the small weights has no effect. I have tried changing the sizes of the two hidden layers, but it does not help. (On a toy example with a small network and low-dimensional input, training does converge, and the estimated value gets very close to the target after a few dozen iterations.) I don't know how to fix this; could someone give me some advice?
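The suspicion that large values dominate can be checked by counting saturated units. The sketch below measures how many first-hidden-layer tanh units have |tanh(z)| > 0.99 when weights and biases come from [-1, 1]. Because the derivative of tanh is 1 - tanh², such a unit passes back less than 0.02 of the gradient it would at z = 0, which would be consistent with small weight changes having little effect. Assumptions, none of them from the post: a hypothetical sparse 0/1 encoding with 32 of the 385 features set to 1, a first hidden layer of 64 units, and biases drawn from the same range as the weights. The second range, scaled by 1/sqrt(385), is a common fan-in heuristic included only for comparison.

```python
import math
import random

N_INPUTS = 385   # input size from the post
N_ACTIVE = 32    # hypothetical: features equal to 1 in a typical position
HIDDEN = 64      # hypothetical first hidden layer size
SAT = 0.99       # count a unit as saturated when |tanh(z)| > SAT

def saturated_fraction(limit, n_positions=50, seed=0):
    """Fraction of first-layer tanh units with |output| > SAT when weights and
    biases are drawn uniformly from [-limit, limit]."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-limit, limit) for _ in range(N_INPUTS)]
               for _ in range(HIDDEN)]
    biases = [rng.uniform(-limit, limit) for _ in range(HIDDEN)]
    saturated = 0
    for _ in range(n_positions):
        # Random sparse 0/1 input: only the active features contribute to z.
        active = rng.sample(range(N_INPUTS), N_ACTIVE)
        for row, b in zip(weights, biases):
            z = sum(row[i] for i in active) + b
            if abs(math.tanh(z)) > SAT:
                saturated += 1
    return saturated / (n_positions * HIDDEN)

wide = saturated_fraction(1.0)                          # range used in the post
scaled = saturated_fraction(1.0 / math.sqrt(N_INPUTS))  # fan-in-scaled comparison
print(f"saturated units: [-1, 1] -> {wide:.2f}, 1/sqrt(385) -> {scaled:.2f}")
```

With this encoding each first-layer pre-activation is a sum of 33 uniform terms of variance 1/3, so its standard deviation is about sqrt(11) ≈ 3.3 and roughly 40% of the units should come out saturated; the exact figure depends on the seed and on the real feature encoding.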

Thank you.

Some references are listed below:
