Author: Elliott, Daniel L.
Advisor: Anderson, Charles W.
Committee members: Draper, Bruce; Kirby, Michael; Chong, Edwin
Date accessioned: 2018-09-10
Date available: 2018-09-10
Date issued: 2018
URI: https://hdl.handle.net/10217/191477

Abstract: Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning can be slow and unstable during training, exhibiting performance like that of a randomly initialized Q-function just a few parameter updates after solving the task. We explore the possibility that ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of the crowd by bagging Q-function approximator estimates. Our results show that the proposed approach improves performance on every task and reinforcement learning method attempted. We demonstrate that this improvement is a direct result of the increased stability of the action portion of the state-action-value function, which Q-learning uses to select actions and policy gradient methods use to train the policy. Recently developed methods attempt to solve these reinforcement learning challenges at the cost of increasing the number of interactions with the environment by several orders of magnitude. The proposed approach, by contrast, has little downside to its inclusion: it addresses these challenges while reducing the number of interactions with the environment.

Format: born digital
Type: doctoral dissertations; Text
Language: eng
Rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
Subjects: machine learning; Q-learning; ensemble; reinforcement learning; neural networks
Title: The wisdom of the crowd: reliable deep reinforcement learning through ensembles of Q-functions
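The core mechanism the abstract describes, selecting actions from the averaged estimates of a bagged ensemble of Q-function approximators, can be sketched roughly as follows. This is a minimal illustration under assumed names (QEnsemble, LinearQ, q_values), not the dissertation's actual implementation; in the bagging setting each member would be trained on its own bootstrap resample of the agent's experience.

```python
import numpy as np


class LinearQ:
    """Toy Q-function approximator: Q(s, a) = w[a] . s (illustrative stand-in
    for a neural-network member trained on a bootstrap sample of experience)."""

    def __init__(self, n_features, n_actions, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_actions, n_features))

    def q_values(self, state):
        # Returns an array of shape (n_actions,).
        return self.w @ state


class QEnsemble:
    """Bagged ensemble of Q-function approximators (assumed interface)."""

    def __init__(self, members):
        self.members = members

    def q_values(self, state):
        # Average the per-member estimates: the "wisdom of the crowd".
        estimates = np.stack([m.q_values(state) for m in self.members])
        return estimates.mean(axis=0)

    def act(self, state, epsilon=0.0):
        # Epsilon-greedy action selection on the averaged Q-values.
        q = self.q_values(state)
        if np.random.rand() < epsilon:
            return np.random.randint(len(q))
        return int(np.argmax(q))


# Usage sketch: ten members voting on a 4-feature state with 2 actions.
members = [LinearQ(n_features=4, n_actions=2, seed=i) for i in range(10)]
ensemble = QEnsemble(members)
state = np.ones(4)
print(ensemble.act(state, epsilon=0.1))
```

Averaging the members' estimates smooths out the fluctuations of any single approximator, which is the stability effect on action selection that the abstract credits for the improved results.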