Shixiang (Shane) Gu, University of Cambridge
May 28, 2018, Mon, 13:00-14:00
Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in various environments, from computer games and the game of Go to simulated humanoids. However, most successes have been exclusively in simulation, and results in real-world applications such as robotics are limited, largely due to the poor sample efficiency of typical deep RL algorithms and other challenges. In the first part of the talk, I present our work to improve the performance and sample efficiency of the core RL algorithms, blurring the boundaries among classic model-based RL, off-policy model-free RL, and on-policy model-free RL. I discuss Q-Prop and Interpolated Policy Gradient (IPG), control variate techniques that reduce the variance of policy gradient estimates, and temporal difference models (TDM), a model-based algorithm using a model-free generalized Q-function. In the latter part, I illustrate other practical challenges for enabling autonomous learning agents in the real world, particularly that current RL formulations require constant human intervention for safety, resets, and reward engineering, and do not scale to learning diverse skills. I present our recent work to address those challenges and show pathways to achieving continually learning robots.
Shixiang (Shane) Gu is a PhD candidate at the University of Cambridge and the Max Planck Institute for Intelligent Systems, where he is co-supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schoelkopf. He holds a BASc in Engineering Science from the University of Toronto, where he completed his thesis with Geoffrey Hinton. His research interests span deep reinforcement learning, deep learning, robotics, approximate inference, and causality, and his research has been featured by MIT Technology Review and the Google Research Blog. He also collaborates closely with Sergey Levine from UC Berkeley/Google Brain and Tim Lillicrap from DeepMind. He will start as a research scientist at Google Brain in summer 2018.