No-swap regret in repeated games with unknown constraints ‒ UPKAMGARPOUR ‐ EPFL

Motivation:

Several real-world multi-agent systems such as electricity markets, traffic routing, and multi-robot applications can be described as repeated games with unknown constraints. In electricity markets, there are capacity constraints on the load of each grid line. In multi-robot applications in unexplored environments, the robots have to accomplish a task while avoiding collision with obstacles and other robots. For such games with unknown constraints, it is crucial to develop learning algorithms that not only aim at achieving agents’ objectives but also ensure constraint satisfaction. It is also important to understand what long-term equilibrium behavior emerges when all agents apply such learning algorithms. Previous work [3] derives a learning algorithm with an upper bound on the regret1 and the cumulative constraint violation. Furthermore, it was also shown that if all agents follow a no-regret algorithm then a coarse correlated equilibria (CCE) is approached. Coarse correlated equilibria suffer from the drawback that it may not always be a rational outcome for an agent [5]. In other words, agents may be choosing a strategy that is strictly dominated by another strategy. The notion of correlated equilibria (CE) alleviates this problem. It is further known that if all agents follow a no-swap regret algorithm then a correlated equilibria is approached. Therefore, in this master’s thesis, we propose to develop a learning algorithm with an upper bound on the swap regret which also bounds the cumulative constraint violation.

Outline:

In a first step, you will start by familiarizing yourself with the relevant literature on swap regret (see e.g. [1, 4]) and on repeated games (with constraints) (see e.g. [3]). Maddux and Kamgarpour [3] established a connection between playing a game with unknown constraints and the sleeping expert problem (see e.g. [2]) which they leveraged to develop a no-regret algorithm with guarantees on the constraint violation. To leverage this connection, your next goal is to extend an existing no-swap regret algorithm to ensure that it also achieves no-swap regret in the sleeping expert setting. Following a similar analysis as in [3], you will then develop a no-swap regret algorithm with guarantees on the constraint violations. Depending on your own interest and background you can then either focus on practical implementations such as temperature control or collision avoidance between robots to validate the algorithm. Alternatively, you can work extensions to contextual games with unknown constraints.

Requirements:

We seek for motivated students with a strong mathematical, or computer science background. We do have some concrete ideas on how to tackle the above challenges, but we are always open to other suggestions. If you are interested, please send an email containing: 1. One paragraph on your background and fit for the project and 2. Your BS and MS transcripts to anna.maddux@epfl.ch. This project will be supervised by Prof. Maryam Kamgarpour (maryam.kamgarpour@epfl.ch) and Anna Maddux (anna.maddux@epfl.ch).

Relevant literature:

The following papers will be relevant for the master’s thesis: [1–4].

References

[1] Avrim Blum and Yishay Mansour. “From external to internal regret.” In: Journal of Machine Learning Research (2007).

[2] Yoav Freund et al. “Using and combining predictors that specialize”. In: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. 1997.

[3] Anna M Maddux and Maryam Kamgarpour. “Multi-Agent Learning in Contextual Games under Unknown Constraints”. In: International Conference on Artificial Intelligence and Statistics. PMLR. 2024. [4] Tim Roughgarden. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016. [5] Yannick Viossat and Andriy Zapechelnyuk. “No-regret dynamics and fictitious play”. In: Journal of Economic Theory (2013).