Reinforcement Learning Meets Bilevel Optimization: Learning Leader-Follower Games with Sample Efficiency

Date: 

Friday, April 5, 2024, 1:30pm to 2:30pm

Location: 

SEC 1.413

Speaker: Zhuoran Yang

Title: Reinforcement Learning Meets Bilevel Optimization: Learning Leader-Follower Games with Sample Efficiency

Abstract: In this talk, I will introduce methods that adapt the optimism principle to reinforcement learning in leader-follower games, especially when the follower's reward function is unknown. Such problems generally face statistical challenges because the follower's best-response function is ill-posed: small errors in the estimated follower reward can induce large changes in the best response. I will discuss two settings in which these challenges can be overcome. In the first, the follower is fully rational with a separable reward function, and we use an algorithm that combines optimism with a pessimistic binary search to identify the follower's indifference curve. In the second, the follower is boundedly rational, with behavior defined by entropy regularization; here we directly estimate the response model and construct a bonus function that quantifies the estimation uncertainty. In both settings, the resulting optimism-based online reinforcement learning algorithms achieve sublinear regret upper bounds and learn the leader's optimal policy.
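
A minimal sketch of the second setting's response model, purely for illustration: an entropy-regularized (boundedly rational) follower responds to each leader action with a softmax, i.e. a quantal response, over its own rewards. The one-step game, the temperature eta, and all variable names below are assumptions made for this sketch, not the talk's actual algorithm; the talk's setting treats the follower's reward as unknown and adds an uncertainty bonus, which this toy example omits.

import numpy as np

rng = np.random.default_rng(0)
n_leader, n_follower = 4, 5  # finite action sets (assumed)
eta = 0.5                    # entropy-regularization temperature (assumed)
r_leader = rng.uniform(size=(n_leader, n_follower))    # leader's reward matrix
r_follower = rng.uniform(size=(n_leader, n_follower))  # follower's reward (unknown in the talk; known here)

def quantal_response(r_row, eta):
    # Closed-form maximizer of <p, r> + eta * H(p): a softmax at temperature eta.
    z = r_row / eta
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# The leader's value of an action is its expected reward under the
# follower's induced quantal response; the leader then plays the argmax.
values = np.array([quantal_response(r_follower[a], eta) @ r_leader[a]
                   for a in range(n_leader)])
print("leader values:", values, "best leader action:", values.argmax())

When the response model is estimated rather than known, an optimism-based algorithm of the kind described above would replace r_follower with an estimate and add a bonus for the estimation uncertainty before taking the argmax.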

Bio: Zhuoran Yang has been an Assistant Professor of Statistics and Data Science at Yale University since July 2022. His research interests lie at the interface of machine learning, statistics, and optimization. He is particularly interested in the foundations of reinforcement learning, representation learning, and deep learning. Before joining Yale, Zhuoran was a postdoctoral researcher at the University of California, Berkeley, advised by Michael I. Jordan. Prior to that, he obtained his Ph.D. from the Department of Operations Research and Financial Engineering at Princeton University, co-advised by Jianqing Fan and Han Liu. He received his bachelor’s degree in Mathematics from Tsinghua University in 2015.