Reinforcement learning (RL) is an extremely promising paradigm for learning optimal control by directly interacting with the system. In contrast to traditional control methods, RL does not require a model of the system and is effective for linear and nonlinear, deterministic and stochastic systems alike. An RL controller receives a scalar reward as feedback on its immediate performance, and must learn to maximise the cumulative, long-term reward over the course of interaction. It does so by learning a value function (describing the cumulative reward as a function of the system variables) and/or a control policy.
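To make the value-function idea concrete, the sketch below shows tabular Q-learning on a hypothetical toy task (a five-state chain, not part of this proposal): the Q-table is the learned value function, and the control policy is obtained by acting greedily with respect to it.

```python
import random

random.seed(0)

# Hypothetical toy task: a 5-state chain. The agent starts in state 0,
# moves left or right, and earns reward 1 on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# The Q-table is the value function; the policy is derived from it
# by acting greedily (with random tie-breaking).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # step size, discount, exploration rate

def greedy(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(500):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        # Temporal-difference update toward the one-step bootstrap target.
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

policy = {s: greedy(s) for s in range(GOAL)}
print(policy)  # the learned policy should move right toward the goal
```

Note that the Q-table here has one entry per state-action pair, which is exactly the kind of representation that stops scaling once the state space becomes large or continuous.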
In order to fully realise the potential of RL, high-dimensional problems must be addressed, but unfortunately many current algorithms are limited to low-dimensional problems (fewer than 10 dimensions). A core reason is that designing accurate value-function or policy parameterisations a priori leads to complexity that grows exponentially with the number of dimensions. Moreover, estimating the parameters of such a complex representation requires large amounts of data, thereby leading to slow learning.
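The exponential growth is easy to see for a tensor-product (grid) parameterisation: with k basis functions per state dimension, a hypothetical resolution of k = 10 already yields ten billion weights in ten dimensions.

```python
# With k basis functions per state dimension, a tensor-product
# value-function parameterisation needs k**d weights in d dimensions.
k = 10  # hypothetical resolution per dimension
for d in (2, 4, 6, 10):
    print(f"d = {d:2d}: {k**d:>14,} parameters")
```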
This project therefore aims to investigate RL methods that automatically derive computationally efficient, yet accurate, representations from the data, rather than relying on a priori design. Possibilities include non-parametric approximation (such as kernel-based methods and regression trees), hierarchical or optimised representations, and adaptive removal and insertion of basis functions. The end result should be a class of RL methods (possibly actor-critic) that learn effectively in high-dimensional problems (tens of dimensions or more), preferably accompanied by performance guarantees, which are essential in real-life applications. Fundamental research will be combined with challenging case studies, e.g., in robotics, traffic control, and multi-agent systems.
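To illustrate the non-parametric direction, the sketch below (a hypothetical illustration, not a method proposed here) approximates a value function over a continuous one-dimensional state space with Nadaraya-Watson kernel regression fitted to noisy Monte Carlo returns; the stored samples themselves form the representation, so no basis functions are designed in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D task: state s in [0, 1], true value V(s) = sin(pi * s).
# We observe noisy Monte Carlo returns at randomly sampled states.
states = rng.uniform(0.0, 1.0, size=200)
returns = np.sin(np.pi * states) + rng.normal(0.0, 0.1, size=200)

def v_hat(query, bandwidth=0.05):
    """Nadaraya-Watson kernel estimate of the value at `query` states.

    The representation is the data itself: a Gaussian kernel weights
    each stored sample by its distance to the query, so the effective
    complexity adapts to where samples were collected.
    """
    d = query[:, None] - states[None, :]      # pairwise query-sample distances
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
    return (w * returns).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(0.05, 0.95, 19)
err = np.max(np.abs(v_hat(grid) - np.sin(np.pi * grid)))
print(f"max abs error on grid: {err:.3f}")
```

The price of such sample-based representations is that their cost grows with the amount of data, which is precisely why the project also considers adaptive insertion and removal of basis functions.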
