[1] L. P. Kaelbling, M. L. Littman and A. W. Moore, “Reinforcement Learning: A Survey,” Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 237-285.
[2] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction,” The MIT Press, Cambridge, MA, 1998.
[3] R. S. Sutton, “Learning to Predict by the Methods of Temporal Differences,” Machine Learning, Vol. 3, 1988, pp. 9-44.
[4] J. B. Pollack and A. D. Blair, “Why Did TD-Gammon Work,” In: D. S. Touretzky, M. C. Mozer and M. E. Hasselmo, Ed., Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996, pp. 10-16.
[5] D. B. Fogel, “Evolving a Checkers Player without Relying on Human Experience,” Intelligence, Vol. 11, No. 2, 2000, p. 217.
[6] D. E. Moriarty, “Symbiotic Evolution of Neural Networks in Sequential Decision Tasks,” PhD thesis, Department of Computer Sciences, The University of Texas at Austin, USA, 1997.
[7] G. Tesauro, “Practical Issues in Temporal Difference Learning,” In: D. S. Lippman, J. E. Moody and D. S. Touretzky, Ed., Advances in Neural Information Processing Systems 4, Morgan Kaufmann, San Mateo, CA, 1992, pp. 259-266.
[8] G. J. Tesauro, “Temporal Difference Learning and TD-Gammon,” Communications of the ACM, Vol. 38, 1995, pp. 58-68.
[9] S. Thrun, “Learning to Play the Game of Chess,” In: G. Tesauro, D. Touretzky and T. Leen, Ed., Advances in Neural Information Processing Systems 7, Morgan Kaufmann, San Francisco, CA, 1995, pp. 1069-1076.
[10] J. Baxter, A. Tridgell and L. Weaver, “KnightCap: A Chess Program that Learns by Combining TD(λ) with Minimax Search,” Technical report, Australian National University, Canberra, 1997.
[11] A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, Vol. 3, No. 3, 1959, pp. 210-229.
[12] A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers II—Recent Progress,” IBM Journal of Research and Development, Vol. 11, No. 6, 1967, pp. 601-617.
[13] J. Schaeffer, M. Hlynka and V. Hussila, “Temporal Difference Learning Applied to a High-Performance Game,” In Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 2001, pp. 529-534.
[14] N. N. Schraudolph, P. Dayan and T. J. Sejnowski, “Temporal Difference Learning of Position Evaluation in the Game of Go,” In: J. D. Cowan, G. Tesauro and J. Alspector, Ed., Advances in Neural Information Processing Systems, Morgan Kaufmann, San Francisco, CA, 1994, pp. 817-824.
[15] J. Fürnkranz, “Machine Learning in Games: A Survey,” In: J. Fürnkranz and M. Kubat, Ed., Machines That Learn to Play Games, Nova Science Publishers, Huntington, NY, 2001, pp. 11-59.
[16] A. Plaat, “Research Re: search and Re-search,” PhD thesis, Erasmus University Rotterdam, Holland, 1996.
[17] J. Schaeffer, “The Games Computers (and People) Play,” Advances in Computers, Vol. 50, 2000, pp. 189-266.
[18] J. Schaeffer and A. Plaat, “Kasparov versus Deep Blue: The Rematch,” International Computer Chess Association Journal, Vol. 20, No. 2, 1997, pp. 95-102.
[19] R. Coulom, “Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search,” Proceedings of the Fifth International Conference on Computers and Games, Turin, Italy, 2006, pp. 72-83.
[20] R. Bellman, “Dynamic Programming,” Princeton University Press, USA, 1957.
[21] J. N. Tsitsiklis, “Asynchronous Stochastic Approximation and Q-learning,” Machine Learning, Vol. 16, 1994, pp. 185-202.
[22] A. G. Barto, R. S. Sutton and C. W. Anderson, “Neuronlike Adaptive Elements that Can Solve Difficult Learning Control Problems,” IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, 1983, pp. 834-846.
[23] M. A. Wiering and J. H. Schmidhuber, “Fast Online Q(λ),” Machine Learning, Vol. 33, No. 1, 1998, pp. 105-116.
[24] C. M. Bishop, “Neural Networks for Pattern Recognition,” Oxford University Press, New York, 1995.
[25] D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning Internal Representations by Error Propagation,” In: D. E. Rumelhart and J. L. McClelland, Ed., Parallel Distributed Processing, MIT Press, USA, 1986, pp. 318-362.
[26] L.-J. Lin, “Reinforcement Learning for Robots Using Neural Networks,” PhD thesis, Carnegie Mellon University, Pittsburgh, 1993.
[27] H. Berliner, “Experiences in Evaluation with BKG—A Program that Plays Backgammon,” In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 1, 1977, pp. 428-433.
[28] J. A. Boyan, “Modular Neural Networks for Learning Context-Dependent Game Strategies,” Master’s thesis, University of Chicago, USA, 1992.
[29] R. Caruana and V. R. de Sa, “Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs,” In: M. C. Mozer, M. I. Jordan and T. Petsche, Ed., Advances in Neural Information Processing Systems 9, Morgan Kaufmann, San Mateo, CA, 1997, pp. 246-252.
[30] A. Sperduti and A. Starita, “Speed up Learning and Network Optimization with Extended Backpropagation,” Neural Networks, Vol. 6, 1993, pp. 365-383.
[31] X. Xu, D. Hu and X. Lu, “Kernel-Based Least Squares Policy Iteration for Reinforcement Learning,” IEEE Transactions on Neural Networks, Vol. 18, No. 4, 2007, pp. 973-992.
[32] W. Smart and L. Kaelbling, “Effective Reinforcement Learning for Mobile Robots,” Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, USA, 2002, pp. 3404-3410.