Controlling Cooperation in Strategic Environments

Date: 

Friday, December 9, 2022, 1:00pm to 2:00pm

Location: 

SEC 1.413 (also streamed via Zoom: https://harvard.zoom.us/j/95184948637?pwd=bXBIc2U5MEZ0QmRUb01WQ0o0SXRCdz09)

This Friday, 1pm-2pm in SEC 1.413, we will hold our final EconCS seminar of the semester, featuring Andreas Haupt from MIT. He will speak in person on:

Controlling Cooperation in Strategic Environments

Abstract:

In many strategic environments, a particular level of cooperation is desired: cooperation between countries on climate action is considered desirable, while cooperation between sellers in a market is often considered collusion, and hence undesirable. These two projects ask how to control the level of cooperation between selfish reinforcement learning agents in the absence of a mediator coordinating the agents' behavior. The analysis is restricted to repeated games, though many of the ideas extend to more general domains.
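To fix ideas, here is a minimal sketch of the repeated-game setting in Python. The abstract does not commit to a particular game; the prisoner's dilemma payoffs and the grim-trigger policy below are illustrative stand-ins, not the speaker's construction:

```python
import numpy as np

# Stage-game payoffs for a two-player prisoner's dilemma (illustrative numbers).
# Action 0 = cooperate, action 1 = defect. P1[a1, a2] is the row player's payoff;
# the game is symmetric, so the column player's payoff is P1[a2, a1].
P1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])

def play_repeated(policy1, policy2, horizon=100):
    """Play the stage game `horizon` times; each policy maps the history
    of joint actions to that player's next action."""
    history, totals = [], np.zeros(2)
    for _ in range(horizon):
        a1, a2 = policy1(history), policy2(history)
        totals += (P1[a1, a2], P1[a2, a1])
        history.append((a1, a2))
    return totals

def grim_trigger(player):
    # Cooperate until the opponent ever defects, then defect forever.
    def policy(history):
        return 1 if any(h[1 - player] == 1 for h in history) else 0
    return policy

print(play_repeated(grim_trigger(0), grim_trigger(1)))  # [300. 300.]: sustained cooperation
```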

A first project (joint work with Alessandro Bonatti) considers optimal cooperation. We first model selfishness, using the regret framework. Under adversarial (swap) no-regret, algorithms cannot cooperate, in the sense that play approaches the set of correlated equilibria of the stage game. Under an adaptive (swap) no-regret notion we define, there are policy profiles that cooperate optimally in a wide class of games. The notion is selective: tabular Q-learning, currently the most-studied model of a colluding agent, incurs adaptive regret.
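As an illustration of the regret framework, the sketch below measures a player's realized swap regret in a repeated matrix game. It follows the standard definition of swap regret (best per-action replacement in hindsight); it is not the project's code, and the example sequences are made up:

```python
import numpy as np

def swap_regret(payoff, own, opp):
    """Realized swap regret of one player: for each action a the player used,
    compare the payoff earned on those rounds with the best fixed replacement
    b for a, and sum the shortfalls over all a.
    payoff[a, b]: the player's stage payoff for own action a vs opponent action b."""
    total = 0.0
    for a in range(payoff.shape[0]):
        rounds = [t for t, x in enumerate(own) if x == a]  # rounds where a was played
        realized = sum(payoff[a, opp[t]] for t in rounds)
        best = max(sum(payoff[b, opp[t]] for t in rounds)
                   for b in range(payoff.shape[0]))
        total += best - realized
    return total

# In the prisoner's dilemma above, a player who cooperates accumulates swap
# regret (defecting pays pointwise more each round), which is exactly why
# driving swap regret to zero against an adversary pushes play toward the
# stage game's correlated equilibria rather than sustained cooperation.
payoff = np.array([[3.0, 0.0], [5.0, 1.0]])
own, opp = [0, 0, 1, 0], [0, 1, 0, 0]
print(swap_regret(payoff, own, opp))  # 5.0
```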

A second project (joint work with Olivia Hartzell and Dylan Hadfield-Menell) considers limited cooperation. The main assumption is that repeated play of a Nash equilibrium yields a continuation value that is constant in the current agent's action. We propose the Lipschitz parameter of each agent's continuation-value function as a measure of cooperation. This notion can be audited without communicating the algorithm itself. Moreover, many policy gradient and online algorithms can be adapted, via projected gradient descent, to learn policy profiles with Lipschitz-continuous continuation values, yielding provable limits on cooperation. We accompany this with simulations in several economic domains.
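The following sketch illustrates the audit and a projection step under stated assumptions: actions are prices on a one-dimensional grid, and cont_value[i] estimates an agent's continuation value after playing grid[i]. The slope-clipping surrogate below is a cheap stand-in for a true Euclidean projection onto Lipschitz functions, chosen for brevity; all names and numbers are hypothetical:

```python
import numpy as np

def lipschitz_parameter(cont_value, grid):
    """Audit: the largest slope of the continuation value across adjacent actions."""
    return np.max(np.abs(np.diff(cont_value)) / np.diff(grid))

def project_lipschitz(cont_value, grid, L):
    """Clip successive slopes to magnitude at most L and rebuild by cumulative
    sums. This is a simple surrogate, not the exact Euclidean projection, but
    its output is L-Lipschitz, which is what the constrained update needs."""
    slopes = np.diff(cont_value) / np.diff(grid)
    clipped = np.clip(slopes, -L, L)
    return cont_value[0] + np.concatenate(([0.0], np.cumsum(clipped * np.diff(grid))))

grid = np.linspace(0.0, 1.0, 5)           # hypothetical price grid
v = np.array([0.0, 0.9, 1.0, 0.2, 0.1])   # a steep (highly cooperative) value curve
print(lipschitz_parameter(v, grid))       # 3.6: exceeds a cap of, say, L = 1
print(project_lipschitz(v, grid, L=1.0))  # [0. 0.25 0.35 0.1 0.], obeying the cap
```

A gradient-based learner would interleave such a projection with its usual policy updates; the steeper the permitted slope L, the more the continuation value can reward or punish the current action, i.e., the more cooperation the profile can sustain.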

Here is the Zoom link again for those who are not able to make it in person:

https://harvard.zoom.us/j/95184948637?pwd=bXBIc2U5MEZ0QmRUb01WQ0o0SXRCdz09