Reference:
L. Busoniu, B. De Schutter, R. Babuska, and D. Ernst, "Using prior
knowledge to accelerate online least-squares policy iteration,"
Proceedings of the 2010 IEEE International Conference on Automation,
Quality and Testing, Robotics (AQTR 2010), Cluj-Napoca, Romania,
6 pp., May 2010. Paper A-S2-1/3005.
Abstract:
Reinforcement learning (RL) is a promising paradigm for learning
optimal control. Although RL is generally envisioned as working
without any prior knowledge about the system, such knowledge is often
available and can be exploited to great advantage. In this paper, we
consider prior knowledge about the monotonicity of the control policy
with respect to the system states, and we introduce an approach that
exploits this type of prior knowledge to accelerate a state-of-the-art
RL algorithm called online least-squares policy iteration (LSPI).
Monotonic policies are appropriate for important classes of systems
appearing in control applications. LSPI is a data-efficient RL
algorithm that we previously extended to online learning but that,
until now, provided no way to exploit prior knowledge about the policy.
In an empirical evaluation, online LSPI with prior knowledge learns
much faster and more reliably than the original online LSPI.
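
The abstract does not spell out the mechanism, but one way to picture a
monotonicity constraint on the policy is to restrict greedy action
selection so that the chosen action never decreases as the state grows.
The sketch below is illustrative only: the feature map, the state grid
STATES, the action set ACTIONS, and the projection rule are assumptions
for a 1-D example, not the paper's actual construction, and theta stands
in for weights that online LSPI would learn from data.

```python
import numpy as np

STATES = np.linspace(-1.0, 1.0, 21)      # 1-D state grid (assumed)
ACTIONS = np.array([-1.0, 0.0, 1.0])     # discrete action set (assumed)

def features(x, u):
    """Simple polynomial state-action features (assumed basis)."""
    return np.array([1.0, x, x**2, u, u * x, u * x**2])

def q_value(theta, x, u):
    """Linear Q-function approximation, Q(x, u) = theta^T phi(x, u)."""
    return theta @ features(x, u)

def greedy_policy(theta):
    """Unconstrained greedy policy: argmax_u Q(x, u) at each grid state."""
    return np.array([
        ACTIONS[np.argmax([q_value(theta, x, u) for u in ACTIONS])]
        for x in STATES
    ])

def monotonic_greedy_policy(theta):
    """Greedy policy constrained to be nondecreasing in the state:
    at each grid point, only actions >= the previous choice are allowed."""
    policy = []
    prev = ACTIONS.min()
    for x in STATES:
        allowed = ACTIONS[ACTIONS >= prev]
        best = allowed[np.argmax([q_value(theta, x, u) for u in allowed])]
        policy.append(best)
        prev = best
    return np.array(policy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=6)           # stand-in for LSPI-learned weights
    print("unconstrained:", greedy_policy(theta))
    print("monotonic:    ", monotonic_greedy_policy(theta))
```

The point of such a constraint is that it shrinks the set of candidate
policies the learner has to distinguish between, which is consistent
with the abstract's claim of faster and more reliable learning when a
monotonic policy is known to be appropriate for the system.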