Reference:
R.R. Negenborn,
B. De Schutter,
M.A. Wiering, and
H. Hellendoorn,
"Learning-based model predictive control for Markov decision
processes," Proceedings of the 16th IFAC World Congress,
Prague, Czech Republic, pp. 354-359, July 2005.
Abstract:
We propose the use of Model Predictive Control (MPC) for controlling
systems described by Markov decision processes. First, we consider a
straightforward MPC algorithm for Markov decision processes. Then, we
propose the use of value functions as a means to deal with issues
arising in conventional MPC, such as high computational requirements
and sub-optimality of actions. We use reinforcement learning to let an MPC agent learn a
value function incrementally. The agent incorporates experience from
the interaction with the system in its decision making. Our approach
initially relies on pure MPC. Over time, as experience increases, the
learned value function is taken more and more into account. This
speeds up the decision making, allows decisions to be made over an
infinite instead of a finite horizon, and provides adequate control
actions, even if the system and desired performance slowly vary over
time.
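The scheme the abstract describes, finite-horizon MPC whose horizon-end return is a learned value function that is weighted more heavily as experience accumulates, can be sketched roughly as follows. This is an illustrative assumption-laden sketch, not the paper's actual algorithm: the toy 2-state MDP, the 3-step horizon, the experience-based blend weight `w = n / (n + 50)`, and the TD(0) update are all hypothetical choices made here for concreteness.

```python
import random

# Hypothetical 2-state, 2-action MDP (dynamics invented for illustration):
# P[s][a] = list of (next_state, probability), R[s][a] = immediate reward.
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 0.8), (0, 0.2)]},
     1: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 0.9), (0, 0.1)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 2.0}}
GAMMA = 0.95

V = {0: 0.0, 1: 0.0}   # learned value function, updated incrementally by TD(0)
experience = 0          # interaction count; controls the MPC/value-function blend


def lookahead(s, depth, w):
    """Best expected return from s over `depth` remaining steps.

    At the end of the horizon the tail return is the learned value V,
    weighted by w; with w = 0 this reduces to plain finite-horizon MPC."""
    if depth == 0:
        return w * V[s]
    return max(
        R[s][a] + GAMMA * sum(p * lookahead(s2, depth - 1, w)
                              for s2, p in P[s][a])
        for a in (0, 1))


def mpc_action(s, horizon=3):
    """Choose the action maximizing the blended finite-horizon return."""
    # The weight on V grows from 0 toward 1 with experience, so the agent
    # starts with pure MPC and gradually trusts the learned values more.
    w = experience / (experience + 50.0)
    qs = {a: R[s][a] + GAMMA * sum(p * lookahead(s2, horizon - 1, w)
                                   for s2, p in P[s][a])
          for a in (0, 1)}
    return max(qs, key=qs.get)


def step(s, a, alpha=0.1):
    """Apply action a in state s, sample the next state, and TD(0)-update V."""
    global experience
    s2 = random.choices([ns for ns, _ in P[s][a]],
                        weights=[p for _, p in P[s][a]])[0]
    V[s] += alpha * (R[s][a] + GAMMA * V[s2] - V[s])  # incremental RL update
    experience += 1
    return s2


random.seed(0)
s = 0
for _ in range(200):
    s = step(s, mpc_action(s))
```

Because the blend weight depends only on accumulated experience, the same loop keeps working if the dynamics drift slowly: the value function continues to be updated from fresh interaction, while the MPC lookahead supplies reasonable actions whenever the learned values lag behind.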