Presenter: Andy Haupt
Topic: Cautious Bandits
The EconCS Group holds an Economics and Computer Science research seminar each semester.
Abstract: We introduce and characterize revealed risk preferences of bandit algorithms. An algorithm for the stochastic bandit problem is risk-averse if, for any fixed noise levels and time, there is a reward difference such that the algorithm chooses a less risky arm over a riskier arm with higher expected reward, with high probability in time. We find experimentally that several classical adversarial and stochastic bandit algorithms (eps-Greedy, UCB, EXP3) are risk-averse, and we prove that eps-Greedy is risk-averse. We discuss implications for the separation of learning and deployment of reinforcement learning algorithms, and we discuss extensions of our result to mean-based bandit algorithms (Braverman et al. 2018) and to multi-agent environments.
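
To make the setting in the abstract concrete, here is a minimal sketch of the kind of experiment it describes: a two-armed stochastic bandit with one noiseless "safe" arm and one noisy "risky" arm whose mean is higher by a fixed gap, played by eps-Greedy. Everything in it is an illustrative assumption rather than the presenter's setup: the function name eps_greedy_safe_fraction, the Gaussian noise, the constant exploration rate, and all parameter values are hypothetical.

```python
import random


def eps_greedy_safe_fraction(horizon, gap, risky_std, eps=0.1, seed=0):
    """Run eps-Greedy on a two-armed bandit; return the share of safe-arm pulls.

    Assumptions (not from the talk): arm 0 is noiseless with mean 0,
    arm 1 is Gaussian with mean `gap` and standard deviation `risky_std`,
    and the exploration rate `eps` is constant over time.
    """
    rng = random.Random(seed)
    means = [0.0, gap]
    stds = [0.0, risky_std]
    counts = [0, 0]    # number of pulls per arm
    sums = [0.0, 0.0]  # cumulative reward per arm

    for t in range(horizon):
        if t < 2:
            arm = t  # pull each arm once to initialize the estimates
        elif rng.random() < eps:
            arm = rng.randrange(2)  # explore uniformly at random
        else:
            # exploit: pick the arm with the larger empirical mean
            estimates = [sums[a] / counts[a] for a in range(2)]
            arm = 0 if estimates[0] >= estimates[1] else 1
        reward = rng.gauss(means[arm], stds[arm])
        counts[arm] += 1
        sums[arm] += reward

    return counts[0] / horizon


if __name__ == "__main__":
    # Average the safe-arm share over independent runs for a few noise levels.
    for risky_std in (0.0, 1.0, 5.0):
        runs = [
            eps_greedy_safe_fraction(2000, gap=0.1, risky_std=risky_std, seed=s)
            for s in range(50)
        ]
        print(risky_std, sum(runs) / len(runs))
```

Comparing the printed safe-arm share across noise levels for a fixed gap is one simple way to probe whether the algorithm favors the less risky arm. The sketch makes no claim about what the numbers will be, and it does not reproduce the formal definition or the proof from the talk.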