Reinforcement learning (RL) addresses the problem of finding a policy for a Markov decision process (MDP) that optimises a cumulative reward, based on observations of the rewards and of the evolution of the MDP.
Two major challenges when applying RL in the real world are ensuring safety of the learned policy and designing an appropriate reward function for the problem at hand. While a surge of recent work addresses
safe RL, past work has focused on the case in which the reward function is known. In many applications, such as autonomous driving, human-robot interaction, or question answering, the reward function is a priori
unknown and hard to design. Inverse reinforcement learning (IRL) addresses this by learning reward functions from expert demonstrations. This is particularly promising in areas such as autonomous driving or human-robot interaction, where human expert data is readily available. In previous work, we developed a framework for safe inverse reinforcement learning and validated our approach in a simple gridworld simulation. The
objective of this project is to extend these algorithms to higher-dimensional settings and to implement them on our own testbed, consisting of mobile robots, a motion capture system, and an integrated projector for simulating dynamic environments (see Figure above). To this end, you will first familiarize yourself with the corresponding literature and the existing deep RL code from a previous master's project in our lab.
Building on existing work (and code), you will then develop an algorithm for safe IRL in continuous state and action spaces and implement it on the robotic testbed. Finally, you will set up an experiment to
validate the performance of your method (in terms of safety, sample efficiency, and generalizability) against state-of-the-art unconstrained IRL algorithms.
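As a rough illustration of what the safety part of such a validation could look like, the sketch below scores a set of rollouts by discounted return and discounted cost. It assumes, purely for illustration, that safety is expressed as a budget on the expected discounted cost, as in a constrained MDP; the posting does not fix this formulation, and the names `discounted_sum`, `evaluate_policy`, and `cost_budget` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * values[t] for a single trajectory."""
    total = 0.0
    # Accumulate backwards so each step is one multiply and one add.
    for v in reversed(values):
        total = v + gamma * total
    return total


def evaluate_policy(
    rollouts: List[Tuple[Sequence[float], Sequence[float]]],
    gamma: float = 0.99,
    cost_budget: float = 1.0,  # assumed safety budget, not from the posting
) -> Dict[str, float]:
    """Summarise per-episode (rewards, costs) pairs collected from a policy."""
    returns = [discounted_sum(rewards, gamma) for rewards, _ in rollouts]
    costs = [discounted_sum(step_costs, gamma) for _, step_costs in rollouts]
    n = len(rollouts)
    mean_cost = sum(costs) / n
    return {
        "mean_return": sum(returns) / n,
        "mean_cost": mean_cost,
        # Fraction of episodes whose own discounted cost exceeds the budget.
        "violation_rate": sum(c > cost_budget for c in costs) / n,
        # Constraint in expectation: average discounted cost within budget.
        "is_safe": mean_cost <= cost_budget,
    }
```

Comparing a safe IRL method with an unconstrained baseline would then amount to calling `evaluate_policy` on rollouts from each learned policy and reporting return and cost side by side.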
This project requires a strong background in reinforcement learning and in optimisation theory, and it provides a valuable opportunity to enhance your expertise in both of these areas. The project poses several
challenges, including the need for function approximation due to the continuous state and action spaces, as well as sim-to-real transfer, since training needs to be carried out in simulation (due to the
high sample complexity of RL). Although the focus of this project is on the practical application, there are numerous compelling theoretical aspects to explore, e.g. related to safety, generalization, and function approximation.
This project will be supervised by Prof. Maryam Kamgarpour and Andreas Schlaginhaufen (andreas.schlaginhaufen@epfl.ch).
If you are interested, please send an email to andreas.schlaginhaufen@epfl.ch containing
- one paragraph on your background and fit for the project
- your BS and MS transcripts.
Students with a suitable track record will be contacted.