A Definition of Non-Stationary Bandits
Yueyang Liu, Stanford University
2023-07-03 15:00:00 ~ 2023-07-03 16:30:00
Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; this ambiguity arises in the existing definition's dependence on the latent sequence of reward distributions. In addition, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. These notions are not indicative of qualitative agent performance in some bandits that are intuitively nearly-stationary, i.e., non-stationary bandits that closely resemble stationary ones. In addition, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. In addition, for two bandits that provide agents with indistinguishable experiences, our definition classifies them as both stationary or both non-stationary. Our definition also applies seamlessly to both the Bayesian and the frequentist formulations of bandits, providing a unified approach.
Yueyang Liu is a fifth-year PhD candidate at Stanford University, where she is part of the Operations Research group in the Department of Management Science and Engineering. Her interest lies in continual learning. She is advised by Benjamin Van Roy and Kuang Xu.