Reference:
L. Busoniu, D. Ernst, B. De Schutter, and R. Babuska, "Cross-entropy
optimization of control policies with adaptive basis functions," IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,
vol. 41, no. 1, pp. 196-209, Feb. 2011.
Abstract:
This paper introduces an algorithm for direct search of control
policies in continuous-state, discrete-action Markov decision
processes. The algorithm looks for the best closed-loop policy that
can be represented using a given number of basis functions (BFs),
where a discrete action is assigned to each BF. The type and number of
the BFs are specified in advance and determine the complexity of the
representation. Considerable flexibility is achieved by
optimizing the locations and shapes of the BFs, together with the
action assignments. The optimization is carried out with the
cross-entropy method and evaluates the policies by their empirical
return from a representative set of initial states. The return for
each representative state is estimated using Monte Carlo simulations.
The resulting algorithm for cross-entropy policy search with adaptive
BFs is extensively evaluated in problems with two to six state
variables, for which it reliably obtains good policies with only a
small number of BFs. In these experiments, cross-entropy policy search
requires vastly fewer BFs than value-function techniques with
equidistant BFs, and outperforms policy search with a competing
optimization algorithm called DIRECT.
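To make the described procedure concrete, below is a minimal Python sketch
of cross-entropy policy search with Gaussian BFs: a policy takes the action
assigned to the BF with the largest activation, candidates are scored by
their Monte Carlo return from representative initial states, and the
sampling distribution is refit to the elite samples. The toy dynamics in
env_step and all constants (STATE_DIM, N_BF, HORIZON, elite size, etc.) are
illustrative assumptions, not the paper's exact algorithm or benchmarks.

    import numpy as np

    rng = np.random.default_rng(0)

    STATE_DIM = 2          # continuous state dimension (assumed)
    N_BF = 5               # number of BFs, fixed in advance
    ACTIONS = [-1.0, 1.0]  # discrete action set (assumed)
    GAMMA = 0.98           # discount factor
    HORIZON = 100          # Monte Carlo rollout length

    def policy(theta, actions, x):
        """Take the action assigned to the BF with maximal activation at x."""
        centers = theta[:, :STATE_DIM]                # BF locations
        widths = np.abs(theta[:, STATE_DIM:]) + 1e-6  # BF shapes (axis widths)
        act = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
        return actions[np.argmax(act)]

    def env_step(x, u):
        """Placeholder dynamics and reward; stands in for the real simulator."""
        x_next = 0.9 * x + 0.1 * u
        return x_next, -np.sum(x_next ** 2)

    def score(theta, actions, X0):
        """Average empirical discounted return over representative states."""
        total = 0.0
        for x0 in X0:
            x, ret, disc = x0.copy(), 0.0, 1.0
            for _ in range(HORIZON):
                x, r = env_step(x, policy(theta, actions, x))
                ret += disc * r
                disc *= GAMMA
            total += ret
        return total / len(X0)

    # Cross-entropy loop: a Gaussian over the continuous BF parameters and
    # independent categoricals over the discrete action assignments, with a
    # plain (unsmoothed) update from the elite samples.
    X0 = [rng.uniform(-1, 1, STATE_DIM) for _ in range(10)]
    mu = rng.uniform(-1, 1, (N_BF, 2 * STATE_DIM))   # centers + widths
    sigma = np.full_like(mu, 1.0)
    p_act = np.full((N_BF, len(ACTIONS)), 1.0 / len(ACTIONS))

    N_SAMPLES, N_ELITE = 50, 5
    for it in range(30):
        samples = []
        for _ in range(N_SAMPLES):
            theta = rng.normal(mu, sigma)
            a_idx = [rng.choice(len(ACTIONS), p=p_act[i]) for i in range(N_BF)]
            acts = [ACTIONS[i] for i in a_idx]
            samples.append((score(theta, acts, X0), theta, a_idx))
        samples.sort(key=lambda s: s[0], reverse=True)
        elite = samples[:N_ELITE]
        # Refit the sampling distributions to the elite samples.
        mu = np.mean([s[1] for s in elite], axis=0)
        sigma = np.std([s[1] for s in elite], axis=0) + 1e-3
        counts = np.zeros_like(p_act)
        for _, _, a_idx in elite:
            for i, a in enumerate(a_idx):
                counts[i, a] += 1
        p_act = counts / N_ELITE
        print(f"iter {it}: best return {elite[0][0]:.3f}")

Replacing env_step with a real simulator and X0 with problem-specific
representative states recovers the overall structure the abstract outlines;
the paper's own experiments additionally tune CE hyperparameters per problem.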