Cautious Bandits

Date: Friday, October 15, 2021, 1:00pm to 2:30pm

Presenter:  Andy Haupt
Topic:  Cautious Bandits

Fall 2021

The EconCS Group holds an Economics and Computer Science research seminar each semester.

Fall 2021 meetings are held on Fridays from 1:00 to 2:30 PM. Seminar Coordinators for Fall '21 are Srivatsa R Sai and Daniel Halpern.

Abstract: We introduce and characterize revealed risk preferences of bandit algorithms. An algorithm for the stochastic bandit problem is risk averse if, for any fixed noise levels and time, there is a reward difference such that the algorithm chooses a less risky arm over a riskier arm with higher expected reward, with high probability in time. We experimentally find that several classical adversarial and stochastic bandit algorithms (eps-Greedy, UCB, EXP3) are risk averse, and we prove that eps-Greedy is risk averse. We discuss implications for the separation of learning and deployment of reinforcement learning algorithms, and we discuss extensions of our statement to mean-based bandit algorithms (Braverman et al. 2018) and to multi-agent environments.
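
The kind of experiment the abstract describes can be illustrated with a minimal sketch. This is not the presenter's code: the function names, the Gaussian noise model, the constant exploration rate eps, the deterministic safe arm, and the random tie-breaking are all assumptions made for illustration. The sketch runs eps-Greedy on two arms, a safe arm with zero reward and a risky arm whose mean is higher by `gap`, and reports the share of pulls that go to the safe arm. How strongly the safe arm is favoured, if at all, depends on these choices.

```python
import random


def eps_greedy_safe_share(gap, sigma, horizon, eps=0.1, seed=0):
    """Share of pulls that eps-Greedy gives to the safe arm (hypothetical helper).

    Arm 0 (safe) always pays 0.0. Arm 1 (risky) pays Gaussian rewards with
    mean `gap` and standard deviation `sigma`. Both are assumed reward models.
    """
    rng = random.Random(seed)
    means = [0.0, gap]
    sds = [0.0, sigma]
    counts = [0, 0]
    estimates = [0.0, 0.0]  # running sample mean of each arm

    for t in range(horizon):
        if t < 2:
            arm = t  # pull each arm once to initialise the estimates
        elif rng.random() < eps:
            arm = rng.randrange(2)  # explore uniformly at random
        elif estimates[0] == estimates[1]:
            arm = rng.randrange(2)  # assumed tie-breaking rule: uniform
        else:
            arm = 0 if estimates[0] > estimates[1] else 1  # exploit

        reward = rng.gauss(means[arm], sds[arm])
        counts[arm] += 1
        # incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts[0] / horizon


def mean_safe_share(gap, sigma, horizon, runs=200, eps=0.1):
    """Average the safe-arm share over independent runs (one seed per run)."""
    shares = [
        eps_greedy_safe_share(gap, sigma, horizon, eps=eps, seed=seed)
        for seed in range(runs)
    ]
    return sum(shares) / runs


if __name__ == "__main__":
    # Small reward gap in favour of the risky arm, substantial noise.
    print(mean_safe_share(gap=0.05, sigma=1.0, horizon=500))
```

Sweeping `gap`, `sigma`, and `horizon` in this sketch gives a rough picture of the quantities in the definition above: a safe-arm share well above one half despite a positive `gap` would be the behaviour the abstract calls risk averse.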