R.R. Negenborn, B. De Schutter, M.A. Wiering, and J. Hellendoorn, "Experience-based model predictive control using reinforcement learning," Proceedings of the 8th TRAIL Congress 2004 - A World of Transport, Infrastructure and Logistics - CD-ROM, Rotterdam, The Netherlands, 18 pp., Nov. 2004.
Model predictive control (MPC) is becoming an increasingly popular method to select actions for controlling dynamic systems. Traditionally, MPC uses a model of the system to be controlled and a performance function to characterize the desired behavior of the system. The MPC agent finds actions over a finite horizon that steer the system in a desired direction. Two significant problems with conventional MPC are the amount of computation required and the suboptimality of the chosen actions. In this paper we propose the use of MPC to control systems that can be described as Markov decision processes. We discuss how a straightforward MPC algorithm for Markov decision processes can be implemented, and how it can be improved in terms of speed and decision quality by considering value functions. We propose the use of reinforcement learning techniques to let the agent incorporate experience from its interaction with the system into its decision making. This experience speeds up the decision making of the agent significantly. It also allows the agent to base its decisions on an infinite instead of a finite horizon. The proposed approach can be beneficial for any system that can be modeled as a Markov decision process, including systems found in areas like logistics, traffic control, and vehicle automation.
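The idea of combining finite-horizon lookahead with a value function can be illustrated with a minimal sketch. It is not the algorithm from the paper: the toy chain MDP, the names `make_chain_mdp`, `lookahead`, `q_value`, `mpc_action`, and `value_iteration`, and all numbers are assumptions made for illustration, and plain value iteration stands in for the reinforcement learning step that would estimate the value function from experience.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# P[s][a] is a list of (probability, next_state, reward) triples.
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[str, List[Transition]]]


def make_chain_mdp() -> MDP:
    """States 0..3 on a line; state 3 is an absorbing goal with zero cost."""
    P: MDP = {}
    for s in range(3):
        P[s] = {
            "left": [(1.0, max(s - 1, 0), -1.0)],
            # Moving right succeeds with probability 0.8, otherwise stays put.
            "right": [(0.8, s + 1, -1.0), (0.2, s, -1.0)],
        }
    P[3] = {"stay": [(1.0, 3, 0.0)]}
    return P


def lookahead(P: MDP, s: int, horizon: int, gamma: float,
              terminal_value: Callable[[int], float]) -> float:
    """Best expected return from s over `horizon` steps plus a terminal value."""
    if horizon == 0:
        return terminal_value(s)
    return max(q_value(P, s, a, horizon, gamma, terminal_value) for a in P[s])


def q_value(P: MDP, s: int, a: str, horizon: int, gamma: float,
            terminal_value: Callable[[int], float]) -> float:
    """Expected return of taking a in s, then acting optimally for horizon-1 steps."""
    return sum(
        p * (r + gamma * lookahead(P, s2, horizon - 1, gamma, terminal_value))
        for p, s2, r in P[s][a]
    )


def mpc_action(P: MDP, s: int, horizon: int, gamma: float,
               terminal_value: Callable[[int], float]) -> str:
    """Receding-horizon step: return the first action of the best lookahead plan."""
    return max(P[s], key=lambda a: q_value(P, s, a, horizon, gamma, terminal_value))


def value_iteration(P: MDP, gamma: float, sweeps: int = 200) -> Dict[int, float]:
    """Stand-in for a learned value function: synchronous Bellman backups."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {
            s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                   for a in P[s])
            for s in P
        }
    return V


if __name__ == "__main__":
    P = make_chain_mdp()
    V = value_iteration(P, gamma=0.95)
    # Deep lookahead with no terminal value versus one step with a value function.
    print(mpc_action(P, 0, 4, 0.95, lambda s: 0.0))
    print(mpc_action(P, 0, 1, 0.95, lambda s: V[s]))
```

In this sketch, a one-step lookahead with a zero terminal value cannot distinguish the two actions in state 0, because every step costs the same within the horizon. Supplying a value function as the terminal term lets the same one-step lookahead account for everything beyond the horizon, which is the sense in which the decision rests on an infinite rather than a finite horizon, and it avoids the exponential growth of the recursive search in the horizon length.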