Description:
The primary goal of this project is to develop a model-free algorithm for risk-sensitive inverse reinforcement learning (IRL), where the expert’s behavior is assumed to optimize a risk measure rather than the expected discounted reward [1]. The approach proposed in [1] requires access to the transition law; our objective is an algorithm with provable convergence guarantees that does not rely on this assumption. To this end, a promising direction could be to devise a gradient descent-ascent method for the min-max problem inherent to IRL [2], which arises because we seek a reward that minimizes the suboptimality of the expert (see the sketch below). By leveraging recent guarantees for risk-sensitive policy gradients [3], we then aim to analyze the sample complexity of the resulting algorithm, i.e., the number of samples it needs to converge.
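To fix ideas, one possible way to write the min-max problem is sketched below; the notation is only illustrative, and the precise risk measure, reward parametrization, and policy class are part of the project:

\[
\min_{\theta}\; \max_{\pi}\; \big( J_{r_\theta}(\pi) - J_{r_\theta}(\pi_E) \big),
\qquad
J_{r_\theta}(\pi) := \rho\Big( \textstyle\sum_{t=0}^{\infty} \gamma^{t} r_\theta(s_t, a_t) \Big),
\]

where \(r_\theta\) is a parametrized reward, \(\pi_E\) the expert policy, and \(\rho\) a risk measure applied to the discounted return (e.g., a coherent risk measure as in [1] or an expected conditional risk measure as in [3]). With a parametrized policy \(\pi_\omega\) for the inner maximization, a model-free gradient descent-ascent scheme would alternate updates of the form

\[
\theta_{k+1} = \theta_k - \alpha\, \widehat{\nabla}_{\theta}\big( J_{r_{\theta_k}}(\pi_{\omega_k}) - J_{r_{\theta_k}}(\pi_E) \big),
\qquad
\omega_{k+1} = \omega_k + \beta\, \widehat{\nabla}_{\omega}\, J_{r_{\theta_k}}(\pi_{\omega_k}),
\]

where the hats denote gradient estimates built from sampled trajectories (expert demonstrations and on-policy rollouts) rather than from the transition law; the ascent step could, for instance, use the risk-sensitive policy gradient estimators analyzed in [3].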
References:
- [1] Majumdar, Anirudha, et al. “Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models.” Robotics: Science and Systems, 2017.
- [2] Ho, Jonathan, and Stefano Ermon. “Generative Adversarial Imitation Learning.” Advances in Neural Information Processing Systems 29 (2016).
- [3] Yu, Xian, and Lei Ying. “On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures.” International Conference on Machine Learning, PMLR, 2023.
Supervisors:
Tingting Ni ([email protected]), Andreas Schlaginhaufen ([email protected])