This MSc project will advance the class of online planning algorithms for nonlinear optimal control. At each discrete time step, these algorithms observe the current system state and use a model of the system to predict its response to various sequences of actions. We will approach the problem from the reinforcement learning perspective, where performance is measured by a cumulative reward signal that must be maximized (see figure).
In particular, we will consider optimistic planning algorithms, which explore the most promising action sequences first, so that a near-optimal action is found within a given budget of model predictions. Unfortunately, in their basic variants, optimistic planning algorithms discard the planning data as soon as it has been used to choose the current action, and must start over from scratch at the next step, thereby wasting large amounts of computation.
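The "most promising first" principle can be made concrete with a minimal sketch in the style of optimistic planning for deterministic systems. It assumes a deterministic model `step(state, action) -> (next_state, reward)` with rewards in [0, 1], a discount factor gamma, and a budget counted in node expansions. Each leaf at depth d receives the upper bound b = u + gamma^d / (1 - gamma), where u is the discounted reward collected along its action sequence, and the leaf with the largest b is expanded next. The names `optimistic_plan`, `Node` and the toy line-world model are hypothetical, and returning the first action of the leaf with the largest u is only one of several recommendation rules used in the literature.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    state: int
    depth: int
    u: float                     # discounted reward sum from the root to this node
    first_action: Optional[int]  # first action on the path (None at the root)


def optimistic_plan(state: int, step: Callable[[int, int], Tuple[int, float]],
                    actions: List[int], gamma: float, budget: int) -> Optional[int]:
    """Expand `budget` leaves, always the one with the largest upper bound b."""
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    root = Node(state, 0, 0.0, None)
    # heapq is a min-heap, so b-values are stored negated.
    leaves = [(-1.0 / (1.0 - gamma), next(tie), root)]
    for _ in range(budget):
        _, _, leaf = heapq.heappop(leaves)
        for a in actions:
            nxt, r = step(leaf.state, a)  # one model prediction
            u = leaf.u + gamma ** leaf.depth * r
            child = Node(nxt, leaf.depth + 1, u,
                         a if leaf.first_action is None else leaf.first_action)
            # Rewards in [0, 1] bound the unexplored tail by gamma^d / (1 - gamma).
            b = u + gamma ** child.depth / (1.0 - gamma)
            heapq.heappush(leaves, (-b, next(tie), child))
    # Recommend the first action of the leaf with the largest u.
    return max((node for _, _, node in leaves), key=lambda n: n.u).first_action


def step(x: int, a: int) -> Tuple[int, float]:
    # Hypothetical toy model: move along a line, reward 1 on reaching x == 3.
    nxt = x + a
    return nxt, 1.0 if nxt == 3 else 0.0


print(optimistic_plan(0, step, [-1, 1], gamma=0.9, budget=7))  # prints 1
```

All the nodes in `leaves`, and the model predictions that produced them, are thrown away when the function returns; this is the waste that the basic variants incur at every time step.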
The main goal of this project is to design sound and efficient ways of reusing knowledge in optimistic planning. Thus, rather than only planning, the resulting methods will also learn. The data can be reused either in an exhaustive, low-level form, or in the condensed form of a function approximator that synthesizes the reward knowledge obtained so far. The resulting method for knowledge reuse will have to satisfy two main requirements:
- Significantly lower the computational cost of the original method without sacrificing performance (and ideally even improving it). This is the main benefit of reusing knowledge instead of always planning from scratch.
- Preserve the theoretical guarantees of near-optimality of optimistic planning.
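As an illustration of reuse in the exhaustive, low-level form, the sketch below keeps the planning tree between time steps: once the chosen action has been applied, the subtree under it becomes the new root, and its already-predicted states and rewards are kept, so the next call spends its budget only on new expansions. It assumes a deterministic model with rewards in [0, 1]. Each node stores the reward of its incoming transition rather than a cumulative value, so the discounted sums relative to whichever node is currently the root can be recomputed. `ReusingPlanner`, `Node` and the toy model are hypothetical, and this naive re-rooting does not by itself show that the near-optimality guarantees are preserved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    state: int
    reward: float = 0.0  # reward of the transition that led into this node
    children: Dict[int, "Node"] = field(default_factory=dict)


class ReusingPlanner:
    """Hypothetical optimistic planner that keeps its tree between steps."""

    def __init__(self, step: Callable[[int, int], Tuple[int, float]],
                 actions: List[int], gamma: float):
        self.step, self.actions, self.gamma = step, actions, gamma
        self.model_calls = 0  # number of model predictions made so far

    def _leaves(self, node: Node, depth: int = 0, u: float = 0.0,
                first: Optional[int] = None
                ) -> Iterator[Tuple[Node, int, float, Optional[int]]]:
        # Yield (leaf, depth, u, first action), with u recomputed relative
        # to the current root from the stored transition rewards.
        if not node.children:
            yield node, depth, u, first
            return
        for a, child in node.children.items():
            yield from self._leaves(child, depth + 1,
                                    u + self.gamma ** depth * child.reward,
                                    a if first is None else first)

    def plan(self, root: Node, budget: int) -> Optional[int]:
        for _ in range(budget):
            # Expand the leaf with the largest b = u + gamma^d / (1 - gamma).
            leaf, _, _, _ = max(self._leaves(root),
                                key=lambda t: t[2] + self.gamma ** t[1] / (1 - self.gamma))
            for a in self.actions:
                nxt, r = self.step(leaf.state, a)
                self.model_calls += 1
                leaf.children[a] = Node(nxt, r)
        # Recommend the first action of the leaf with the largest u.
        return max(self._leaves(root), key=lambda t: t[2])[3]


def step(x: int, a: int) -> Tuple[int, float]:
    # Hypothetical toy model: move along a line, reward 1 on reaching x == 3.
    nxt = x + a
    return nxt, 1.0 if nxt == 3 else 0.0


planner = ReusingPlanner(step, [-1, 1], gamma=0.9)
root = Node(0)
a1 = planner.plan(root, budget=7)  # first step: plan from scratch
root = root.children[a1]           # apply a1 and keep the subtree below it
a2 = planner.plan(root, budget=3)  # next step: continue from the kept subtree
print(a1, a2, planner.model_calls)  # prints: 1 1 20
```

In this run, the second step adds only 6 model predictions (3 expansions) on top of the subtree kept from the first step. Walking the whole tree at every expansion keeps the sketch short but costs time linear in the tree size per expansion; a practical implementation would cache the b-values.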
Among other things, the reduced computational cost will make these methods applicable to real-time control. Possible applications include robotics (such as the robot arm pictured above), games, and advanced (simulated) applications in medicine.
This project takes place within an ongoing cooperation with the SequeL team at INRIA Lille (see https://sequel.lille.inria.fr/) and may include visits to Lille. Prof. R. Babuska will supervise the project in Delft, L. Busoniu will serve as the contact with Lille, and R. Munos will provide input from the INRIA side.