
A Definition of Non-Stationary Bandits


Speaker

Yueyang Liu, Stanford University

Time

2023-07-03, 15:00 to 16:30

Location

Meeting Room 3-404, Electronic Information and Electrical Engineering Building Complex

Host

Shuai Li

Abstract

Although non-stationary bandit learning has attracted much recent attention, the field still lacks a formal definition of non-stationarity that consistently distinguishes non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits whose reward distributions change over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; the ambiguity arises from the definition's dependence on the latent sequence of reward distributions. The definition has also given rise to two widely used notions of regret: dynamic regret and weak regret. These notions do not reflect qualitative agent performance in some bandits that are intuitively nearly stationary, i.e., non-stationary bandits that closely resemble stationary ones. Furthermore, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Under our definition, two bandits that provide agents with indistinguishable experiences are classified either both as stationary or both as non-stationary. Our definition also applies seamlessly to both the Bayesian and the frequentist formulations of bandits, providing a unified approach.
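For readers unfamiliar with the two regret notions named above, the following is a minimal sketch assuming their standard textbook forms: dynamic regret compares the agent against the best arm at each step, while weak regret compares it against the best single fixed arm in hindsight. Both are computed here from known per-step mean rewards. The function names and the toy two-armed bandit are hypothetical illustrations, not material from the talk, and the sketch does not implement the speaker's proposed definition.

```python
# Hypothetical sketch: standard dynamic and weak regret computed from known
# per-step mean rewards. Not the definition proposed in this talk.
from typing import Sequence


def dynamic_regret(means: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Sum over steps of (best arm's mean at step t) - (chosen arm's mean at step t)."""
    return sum(max(mu_t) - mu_t[a] for mu_t, a in zip(means, actions))


def weak_regret(means: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Best single fixed arm's cumulative mean reward minus the agent's cumulative mean reward."""
    num_arms = len(means[0])
    best_fixed = max(sum(mu_t[k] for mu_t in means) for k in range(num_arms))
    earned = sum(mu_t[a] for mu_t, a in zip(means, actions))
    return best_fixed - earned


if __name__ == "__main__":
    T, eps = 100, 0.01
    # Hypothetical toy bandit: the two arms' means swap by a small margin every step.
    means = [[0.5 + eps, 0.5 - eps] if t % 2 == 0 else [0.5 - eps, 0.5 + eps] for t in range(T)]
    actions = [0] * T  # an agent that always pulls arm 0
    print(round(dynamic_regret(means, actions), 6))  # prints 1.0
    print(round(weak_regret(means, actions), 6))     # prints 0.0
```

In this toy case the two notions disagree: dynamic regret grows with every step on which arm 0 is momentarily worse, while weak regret is zero because no fixed arm does better than arm 0 over the whole horizon.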

Bio

Yueyang Liu is a fifth-year PhD candidate at Stanford University, where she is part of the Operations Research group in the Department of Management Science and Engineering. Her research interests lie in continual learning. She is advised by Benjamin Van Roy and Kuang Xu.

© John Hopcroft Center for Computer Science, Shanghai Jiao Tong University

Address: Expert Building, Software Building, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai
Email: jhc@sjtu.edu.cn  Tel: 021-54740299
Postal code: 200240