Policy Search with Cross-Entropy Optimization of Basis Functions
Reference
L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Policy Search with Cross-Entropy Optimization of Basis Functions," Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, Tennessee, pp. 153-160, Mar.-Apr. 2009.
Abstract
This paper introduces a novel algorithm for approximate policy search
in continuous-state, discrete-action Markov decision processes (MDPs).
Previous policy search approaches have typically used ad hoc
parameterizations developed for specific MDPs. In contrast, the novel
algorithm employs a flexible policy parameterization, suitable for
solving general discrete-action MDPs. The algorithm looks for the best
closed-loop policy that can be represented using a given number of
basis functions, where a discrete action is assigned to each basis
function. The locations and shapes of the basis functions are
optimized, together with the action assignments. This allows a large
class of policies to be represented. The optimization is carried out
with the cross-entropy method and evaluates the policies by their
empirical return from a representative set of initial states. We
report simulation experiments in which the algorithm reliably obtains
good policies with only a small number of basis functions, albeit at
sizable computational costs.
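To make the approach described in the abstract more concrete, below is a minimal sketch of cross-entropy policy search over basis functions. It assumes Gaussian radial basis functions, a hypothetical env object with a step(state, action) -> (next_state, reward, done) interface, and illustrative parameter names throughout; it is a simplified illustration of the general technique, not the authors' implementation.

import numpy as np

def policy_action(state, centers, widths, actions):
    """Return the discrete action assigned to the most activated basis function."""
    # Gaussian RBF activations, one per basis function
    acts = np.exp(-np.sum(((state - centers) / widths) ** 2, axis=1))
    return actions[np.argmax(acts)]

def empirical_return(params, env, init_states, horizon, gamma):
    """Average discounted return over a representative set of initial states."""
    centers, widths, actions = params
    total = 0.0
    for s0 in init_states:
        s, ret, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = env.step(s, policy_action(s, centers, widths, actions))
            ret += disc * r
            disc *= gamma
            if done:
                break
        total += ret
    return total / len(init_states)

def ce_policy_search(env, init_states, n_bf, state_dim, n_actions,
                     n_samples=100, n_elite=10, n_iters=50,
                     horizon=200, gamma=0.95):
    # Gaussian sampling distribution over basis-function centers and
    # log-widths; categorical distributions over the action assignments.
    mu = np.zeros((n_bf, 2 * state_dim))
    sigma = np.ones((n_bf, 2 * state_dim))
    p_act = np.full((n_bf, n_actions), 1.0 / n_actions)

    for _ in range(n_iters):
        samples, scores = [], []
        for _ in range(n_samples):
            theta = mu + sigma * np.random.randn(*mu.shape)
            centers = theta[:, :state_dim]
            widths = np.exp(theta[:, state_dim:])
            actions = np.array([np.random.choice(n_actions, p=p) for p in p_act])
            score = empirical_return((centers, widths, actions),
                                     env, init_states, horizon, gamma)
            samples.append((theta, actions))
            scores.append(score)
        # Keep the elite fraction and refit both distributions to it.
        elite = np.argsort(scores)[-n_elite:]
        elite_theta = np.stack([samples[i][0] for i in elite])
        mu = elite_theta.mean(axis=0)
        sigma = elite_theta.std(axis=0) + 1e-6
        for j in range(n_bf):
            counts = np.bincount([samples[i][1][j] for i in elite],
                                 minlength=n_actions)
            p_act[j] = counts / counts.sum()
    return mu, p_act

The elite refitting step is the core of the cross-entropy method: each iteration draws candidate policies from the current sampling distribution, scores them by empirical return, and moves the distribution toward the best-scoring candidates, jointly over the basis-function locations, shapes, and action assignments.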
Downloads
- Corresponding technical report: pdf file (400 KB)
Bibtex entry
@inproceedings{BusErn:08-031,
author={L. Bu{\c{s}}oniu and D. Ernst and B. {D}e Schutter and R. Babu{\v{s}}ka},
title={Policy Search with Cross-Entropy Optimization of Basis Functions},
booktitle={Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic
Programming and Reinforcement Learning (ADPRL 2009)},
address={Nashville, Tennessee},
pages={153--160},
month=mar # {--} # apr,
year={2009}
}