Model-free risk-sensitive inverse reinforcement learning

Description:

The primary goal of this project is to develop a model-free algorithm for risk-sensitive inverse reinforcement learning (IRL), where the expert’s behavior is driven by optimizing a risk measure rather than the expected discounted reward [1]. While the approach proposed in [1] requires access to the transition law, our objective is to design an algorithm with provable convergence guarantees that does not rely on this knowledge. To this end, a promising direction could be to devise a gradient descent-ascent scheme for the min-max problem inherent to IRL [2]: the min-max structure arises because we seek a reward that minimizes the expert’s suboptimality, and this suboptimality is itself defined via a maximization over policies (see the sketch below). By leveraging recent convergence guarantees for risk-sensitive policy gradient methods [3], we then aim to analyze the sample complexity of the algorithm, i.e., the number of samples it needs to converge.
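
To give a rough idea of the min-max structure, one possible formalization (an illustrative assumption of this sketch, not the formulation fixed by the project) is

  min over theta of [ max over pi of rho(pi, r_theta) - rho(pi_E, r_theta) ],

where rho(pi, r_theta) is a risk measure of the discounted return of policy pi under the parameterized reward r_theta, and pi_E is the expert policy. The Python sketch below shows what a model-free stochastic gradient descent-ascent loop for such an objective could look like. For concreteness it uses the entropic risk measure (a convex, but not coherent, risk measure), one-hot reward features, a tabular softmax policy on a small synthetic MDP, and randomly generated stand-in expert trajectories; all of these choices, including the step sizes, are placeholder assumptions, and the coherent risk measures of [1] or the expected conditional risk measures of [3] would require different gradient estimators.

import numpy as np

rng = np.random.default_rng(0)

# Small synthetic MDP, used only as a black-box trajectory simulator (model-free setting).
S, A, H, gamma = 5, 3, 20, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel; never accessed by the learner
phi = np.eye(S * A)                           # one-hot features for a linear reward r_theta

def sample_trajs(logits, n_traj):
    """Roll out a tabular softmax policy; return discounted feature sums and score functions."""
    feats = np.zeros((n_traj, S * A))         # sum_t gamma^t phi(s_t, a_t)
    score = np.zeros((n_traj, S, A))          # sum_t grad_logits log pi(a_t | s_t)
    for i in range(n_traj):
        s = rng.integers(S)
        for t in range(H):
            p = np.exp(logits[s] - logits[s].max()); p /= p.sum()
            a = rng.choice(A, p=p)
            feats[i] += gamma**t * phi[s * A + a]
            score[i, s] -= p
            score[i, s, a] += 1.0
            s = rng.choice(S, p=P[s, a])
    return feats, score

def entropic_weights(returns, beta):
    """Normalized weights exp(beta*G_i)/sum_j exp(beta*G_j) arising from the entropic risk (1/beta) log E[exp(beta*G)]."""
    w = np.exp(beta * (returns - returns.max()))
    return w / w.sum()

# Stand-in expert demonstrations (in practice these would be real expert trajectories).
expert_feats, _ = sample_trajs(rng.normal(size=(S, A)), 200)

theta = np.zeros(S * A)       # reward parameters (outer minimization)
logits = np.zeros((S, A))     # policy parameters (inner maximization)
beta, lr_reward, lr_policy, n = 1.0, 0.05, 0.5, 64

for it in range(200):
    feats, score = sample_trajs(logits, n)
    # Ascent step: REINFORCE-style gradient of the entropic risk of the return w.r.t. the policy.
    w = entropic_weights(feats @ theta, beta)
    logits += lr_policy / beta * np.einsum('i,isa->sa', w, score)
    # Descent step: gradient of rho(pi, r_theta) - rho(pi_E, r_theta) w.r.t. the reward parameters.
    w_E = entropic_weights(expert_feats @ theta, beta)
    theta -= lr_reward * (w @ feats - w_E @ expert_feats)

print("learned reward parameters (first five entries):", np.round(theta[:5], 3))

Note that the transition kernel P is only used inside the simulator to generate trajectories; the update rules themselves never access it, which is precisely the model-free setting the project targets.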

References:

  1. Majumdar, Anirudha, et al. “Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models.” Robotics: Science and Systems, 2017.
  2. Ho, Jonathan, and Stefano Ermon. “Generative adversarial imitation learning.” Advances in Neural Information Processing Systems 29 (2016).
  3. Yu, Xian, and Lei Ying. “On the global convergence of risk-averse policy gradient methods with expected conditional risk measures.” International Conference on Machine Learning. PMLR, 2023.

Supervisors:

Tingting Ni ([email protected]), Andreas Schlaginhaufen ([email protected])