Non-asymptotic Analysis of Reinforcement Learning Algorithms with Function Approximation
Speaker
Shaofeng Zou, University at Buffalo
Time
2019-06-24 10:00:00 ~ 2019-06-24 11:30:00
Location
Room 3-412, SEIEE Building
Host
Haiming Jin, Assistant Professor, John Hopcroft Center for Computer Science
Abstract
Reinforcement learning has been successful in finding policies that attain as large a cumulative reward as possible over time. Even though the asymptotic convergence of many reinforcement learning algorithms, e.g., temporal difference (TD) learning, Q-learning, and SARSA, has been established, how fast these algorithms converge remains an open question. Such a non-asymptotic understanding reveals how the parameters of the algorithms and of the underlying Markov decision process affect the convergence rate, and further facilitates the design of faster reinforcement learning algorithms.
In this talk, we present our recent results on the non-asymptotic analysis of several reinforcement learning algorithms, including SARSA and TDC with linear function approximation. The major challenge in the analysis lies in characterizing the stochastic bias, which arises from the non-i.i.d. nature of the data. We design novel techniques that enable non-asymptotic analysis of a class of reinforcement learning algorithms with a dynamically changing behavior policy, which includes SARSA as a special case.
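To make the setting concrete, the following is a minimal sketch of SARSA(0) with linear function approximation on a toy random MDP. It is not the analysis from the talk; the MDP, feature map, step size, and exploration parameter are all illustrative assumptions. Note how the epsilon-greedy behavior policy depends on the current weight vector, so the policy changes dynamically during learning, which is exactly the feature that complicates the non-asymptotic analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (sizes are arbitrary assumptions for illustration).
n_states, n_actions, dim = 5, 2, 3
P = rng.random((n_states, n_actions, n_states))  # transition kernel
P /= P.sum(axis=2, keepdims=True)                # normalize rows to distributions
R = rng.random((n_states, n_actions))            # reward table
phi = rng.random((n_states, n_actions, dim))     # feature map phi(s, a)

gamma, alpha, eps = 0.9, 0.05, 0.1
theta = np.zeros(dim)  # linear weights: Q(s, a) is approximated by phi(s, a) @ theta

def q(s, a):
    return phi[s, a] @ theta

def policy(s):
    # Epsilon-greedy behavior policy: it changes as theta is updated.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

s = 0
a = policy(s)
for _ in range(5000):
    s_next = rng.choice(n_states, p=P[s, a])
    a_next = policy(s_next)  # on-policy: next action from the current policy
    td_error = R[s, a] + gamma * q(s_next, a_next) - q(s, a)
    theta += alpha * td_error * phi[s, a]  # semi-gradient SARSA update
    s, a = s_next, a_next
```

Because consecutive samples come from a single Markov trajectory rather than being drawn i.i.d., each update is correlated with the past iterates; the stochastic bias mentioned in the abstract is the gap this correlation creates relative to an i.i.d. sampling model.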
Bio
Dr. Shaofeng Zou received the Ph.D. degree in Electrical and Computer Engineering from Syracuse University in 2016. He received the B.E. degree (with honors) from Shanghai Jiao Tong University, Shanghai, China, in 2011. From 2016 to 2018, he was a postdoctoral research associate at the Coordinated Science Lab, University of Illinois at Urbana-Champaign.
He joined the Department of Electrical Engineering, University at Buffalo in 2018, where he is currently an Assistant Professor. Dr. Zou's research interests include statistical signal processing, machine learning, and information theory.