Reinforcement learning (RL) is an extremely promising paradigm for learning optimal control by directly interacting with the system. In contrast to traditional control methods, RL does not require a model of the system and is effective for both linear and nonlinear, deterministic and stochastic systems. An RL controller receives a scalar reward as feedback on its immediate performance, and must learn to maximise the cumulative, long-term reward over the course of the interaction. It does so by learning a value function (describing the cumulative reward as a function of the system variables) and/or a control policy.
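To make the learning loop above concrete, the following is a minimal sketch of tabular Q-learning on a hypothetical toy chain problem (the chain MDP, function names, and all parameter values are illustrative assumptions, not part of the project): the controller receives a scalar reward at each step and updates a value table toward the cumulative, discounted long-term reward.

```python
import random

random.seed(0)  # for reproducibility of this illustration

def greedy(qrow):
    # argmax with random tie-breaking, so early (all-zero) tables explore
    m = max(qrow)
    return random.choice([i for i, v in enumerate(qrow) if v == m])

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a), an estimate of cumulative discounted reward."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the current value estimates
            a = random.randrange(n_actions) if random.random() < epsilon else greedy(Q[s])
            s2, r, done = step(s, a)
            # temporal-difference update toward reward + discounted future value
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical 5-state chain: action 1 moves right, action 0 moves left,
# reward 1 for reaching the rightmost (terminal) state.
def chain_step(s, a):
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = q_learning(5, 2, chain_step)
policy = [greedy(Q[s]) for s in range(4)]  # learned greedy policy
```

Because the reward lies to the right and the discount factor is below one, the learned greedy policy moves right in every non-terminal state.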
In order to fully realise the potential of RL, high-dimensional problems must be addressed. Unfortunately, many current algorithms are limited to low-dimensional problems (fewer than 10 dimensions). A core reason is that designing accurate value function or policy parameterisations a priori leads to a number of parameters that grows exponentially with the number of dimensions. Moreover, estimating the parameters of such a complex representation requires large amounts of data, thereby leading to slow learning.
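The exponential growth mentioned above is easy to see for the simplest a priori parameterisation, a uniform grid (or tiling) of basis functions over the state space; this short sketch (an illustrative assumption, not a method from the project) just counts the features:

```python
def grid_basis_count(bins_per_dim, dims):
    # A uniform grid parameterization with k bins per dimension
    # needs one basis function per cell: k**d in d dimensions.
    return bins_per_dim ** dims

# With a modest 10 bins per dimension:
counts = [grid_basis_count(10, d) for d in (2, 4, 10)]
# 2 dims -> 100 parameters, 4 dims -> 10_000, 10 dims -> 10_000_000_000
```

At ten dimensions the representation already has ten billion parameters, each of which would need data to estimate, which is why a priori designed representations break down well before the problem sizes this project targets.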
This project therefore aims to investigate RL methods that automatically derive computationally efficient, yet accurate, representations from the data, rather than relying on a priori design. Possibilities include nonparametric approximation (such as kernel-based methods and regression trees), hierarchical or optimised representations, and the adaptive insertion and removal of basis functions. The end result should be a class of RL methods - possibly actor-critic - that learn effectively in high-dimensional problems (tens of dimensions or more), preferably accompanied by performance guarantees, which are essential in real-life applications. Fundamental research will be combined with challenging case studies in, e.g., robotics, traffic control, and multi-agent systems.
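As one illustration of the nonparametric, data-derived representations mentioned above, the sketch below estimates a value function by Gaussian kernel regression (Nadaraya-Watson) over stored (state, return) samples; the representation grows with the data rather than being fixed in advance. All names, the bandwidth, and the toy samples are illustrative assumptions, not the project's chosen method.

```python
import math

def gaussian_kernel(x, c, bandwidth):
    # Gaussian similarity between a query state x and a stored sample state c
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))

def kernel_value(x, samples, bandwidth=0.5):
    """Nadaraya-Watson estimate of V(x) from (state, observed return) pairs:
    a weighted average of returns, weighted by similarity to the query state.
    The samples themselves are the representation - nothing is designed a priori."""
    weights = [gaussian_kernel(x, s, bandwidth) for s, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no nearby data: fall back to a default value
    return sum(w * g for w, (_, g) in zip(weights, samples)) / total

# Toy data: return 0 observed near the origin, return 1 observed near (1, 1).
samples = [((0.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
v = kernel_value((0.9, 0.9), samples)  # query close to the high-return sample
```

A query state close to the high-return sample receives a value estimate close to 1; the accuracy/complexity trade-off is then governed by which samples are kept, which is exactly where adaptive insertion and removal of basis elements enters.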