Reference:
L. Busoniu, B. De Schutter, and R. Babuska, "Approximate dynamic
programming and reinforcement learning," in Interactive Collaborative
Information Systems (R. Babuska and F.C.A. Groen, eds.), vol. 281 of
Studies in Computational Intelligence, Berlin, Germany: Springer,
ISBN 978-3-642-11687-2, pp. 3-44, 2010.
Abstract:
Dynamic Programming (DP) and Reinforcement Learning (RL) can be used
to address problems from a variety of fields, including automatic
control, artificial intelligence, operations research, and economics.
Many problems in these fields are described by continuous variables,
whereas DP and RL can find exact solutions only in the discrete case.
Therefore, approximation is essential in practical DP and RL. This
chapter provides an in-depth review of the literature on approximate
DP and RL in large or continuous-space, infinite-horizon problems.
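To make the need for approximation concrete, here is a minimal sketch of approximate Q-iteration on a hypothetical one-dimensional continuous-state problem. The dynamics, reward, action set, discount factor, grid of centers, and every function name (`step`, `basis`, `q_value`, `approximate_q_iteration`, `greedy_action`) are assumptions made for illustration and are not taken from the chapter. The Q-function is represented with triangular (linear-interpolation) basis functions over a grid, with one parameter per center and action; this is only one of many possible approximators.

```python
"""Approximate Q-iteration on a toy continuous-state problem (illustrative sketch).

Everything here (problem, parameters, names) is a hypothetical example.
"""

# Hypothetical toy problem: state x in [-1, 1], three discrete actions.
ACTIONS = (-0.2, 0.0, 0.2)
GAMMA = 0.9
CENTERS = [-1.0 + 0.25 * i for i in range(9)]  # grid of basis-function centers


def step(x, u):
    """Deterministic dynamics and reward: move by u, stay in [-1, 1],
    and penalize the squared distance of the current state from 0."""
    x_next = min(1.0, max(-1.0, x + u))
    reward = -x * x
    return x_next, reward


def basis(x):
    """Triangular (linear-interpolation) basis functions.

    On the uniform grid they are nonnegative and sum to 1 for x in [-1, 1].
    """
    width = CENTERS[1] - CENTERS[0]
    return [max(0.0, 1.0 - abs(x - c) / width) for c in CENTERS]


def q_value(theta, x, a):
    """Approximate Q(x, a) = sum_i phi_i(x) * theta[i][a]."""
    return sum(p * theta[i][a] for i, p in enumerate(basis(x)))


def approximate_q_iteration(n_iter=200):
    """Apply the Bellman update at each center and action, using the current
    approximate Q-function to evaluate the (generally off-grid) next state."""
    theta = [[0.0] * len(ACTIONS) for _ in CENTERS]
    for _ in range(n_iter):
        new_theta = [[0.0] * len(ACTIONS) for _ in CENTERS]
        for i, c in enumerate(CENTERS):
            for a, u in enumerate(ACTIONS):
                x_next, r = step(c, u)
                best_next = max(q_value(theta, x_next, b) for b in range(len(ACTIONS)))
                new_theta[i][a] = r + GAMMA * best_next
        theta = new_theta
    return theta


def greedy_action(theta, x):
    """Index of the action that is greedy in the approximate Q-function."""
    return max(range(len(ACTIONS)), key=lambda a: q_value(theta, x, a))


if __name__ == "__main__":
    theta = approximate_q_iteration()
    for x in (-0.5, 0.0, 0.5):
        print(x, ACTIONS[greedy_action(theta, x)])
```

With these assumptions, the greedy policy at x = 0.5 should pick u = -0.2 (index 0), steering the state toward the origin, and at x = 0 it should pick u = 0. Because these particular basis functions are nonnegative and sum to one, each update is a weighted average of previous parameters plus a reward, so the iteration remains a contraction with factor GAMMA and converges; more general linear approximators do not always share this property.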
Value iteration, policy iteration, and policy search approaches are
presented in turn. Model-based (DP) as well as online and batch
model-free (RL) algorithms are discussed. We review theoretical
guarantees on the approximate solutions produced by these algorithms.
Numerical examples illustrate the behavior of several representative
algorithms in practice. Techniques to automatically derive value
function approximators are discussed, and a comparison between value
iteration, policy iteration, and policy search is provided. The
chapter closes with a discussion of open issues and promising research
directions in approximate DP and RL.