Efficient Reinforcement Learning with Bayesian Optimization
- Author(s): Ganjali, Danyan
- Advisor(s): Sideris, Athanasios
- et al.
A probabilistic reinforcement learning algorithm is presented for finding control policies in continuous state and action spaces without a prior knowledge of the dynamics. The objective of this algorithm is to learn from minimal amount of interaction with the environment in order to maximize a notion of reward, i.e. a numerical measure of the quality of the resulting state trajectories. Experience from the interactions are used to construct a set of probabilistic Gaussian process (GP) models that predict the resulting state trajectories and the reward from executing a policy on the system. These predictions are used with a technique known as Bayesian optimization to search for policies that promise higher rewards. As more experience is gathered, predictions are made with more confidence and the search for better policies relies less on new interactions with the environment.
The computational demand of a GP makes it eventually impractical to use as the number of observations from interacting with the environment increase. Moreover, using a single GP to model different regions that may exhibit disparate behaviors can produce unsatisfactory representations and predictions. One way of mitigating these issues is by partitioning the observation points into different regions each represented by a local GP. With the sequential arrival of the observation points from new experiences, it is necessary to have an adaptive clustering method that can partition the data into an appropriate number of regions. This led to the development of EM+ algorithm presented in the second part of this work, which is an extension to the Expectation Maximization (EM) for the Gaussian mixture models, that assumes no prior knowledge of the number of components.
Lastly, an application of the EM+ algorithm to filtering problems is presented. We propose a filtering algorithm that combines the advantages of the well-known particle filter and the mixture of Gaussian filter, while avoiding their issues.