Sparse Bayesian reinforcement learning




Lee, Minwoo, author
Anderson, Charles W., advisor
Ben-Hur, Asa, committee member
Kirby, Michael, committee member
Young, Peter, committee member


This dissertation presents knowledge acquisition and retention methods for efficient and robust learning. We propose a framework for learning and memorizing, and we examine how the resulting memory can be used for efficient machine learning. Temporal difference (TD) learning is a core part of reinforcement learning, and it requires function approximation. However, with function approximation, the most popular TD methods, such as TD(λ), SARSA, and Q-learning, lose stability and diverge, especially when the complexity of the problem grows and the sampling distribution is biased. Biased samples cause function approximators such as neural networks to respond quickly to new data at the cost of losing what was previously learned. By systematically selecting the most significant experiences, our proposed approach gradually builds a snapshot memory. The memorized snapshots prevent important samples from being forgotten and increase learning stability. Our sparse Bayesian learning model keeps the snapshot memory sparse for efficiency in computation and memory. The Bayesian model extends and improves TD learning by using the state information in its hyperparameters to guide action selection and by filtering out insignificant experience to keep the snapshots sparse. The obtained memory can be used to further improve learning. First, with a radial basis function kernel, the snapshot memories are located at peaks of the value function approximation surface, which leads to an efficient way to search a continuous action space in practical applications that require fine motor control. Second, the memory is a knowledge representation for transfer learning. Transfer learning is a paradigm for generalizing knowledge in machine learning and reinforcement learning; it shortens training time by using knowledge gained from similar tasks.
The dissertation examines a practice approach that transfers snapshots from non-goal-directed random movements to goal-driven reinforcement learning tasks. Experiments are described that demonstrate the stability and efficiency of learning in 1) traditional benchmark problems and 2) the octopus arm control problem without limiting or discretizing the action space.


Zip file contains supplementary video.


continuous action space
sparse learning
knowledge retention
Bayesian learning
reinforcement learning

