Imitation Learning / Inverse Reinforcement Learning for Dexterous Robotic Manipulation (Master's Thesis)

Outline

The goal of this project is to learn control policies for robotic hand manipulation tasks from a set of human expert demonstrations using inverse reinforcement learning and the IsaacSim simulation environment.

Motivation

Dexterous robotic manipulation tasks, such as grasping different objects, are challenging control problems due to the many degrees of freedom of a robotic hand and the variability of the materials involved [1]. Reinforcement learning with GPU-accelerated simulators like IsaacSim offers a promising solution, but sparse reward functions (e.g., rewarding only task completion) hinder efficient optimization. Hand-crafting dense rewards is difficult and often leads to suboptimal behavior. Learning from human expert demonstrations via imitation learning or inverse reinforcement learning (IRL), on the other hand, can effectively guide policy optimization. While offline imitation learning approaches such as behavioral cloning are simple to implement, they often generalize poorly due to covariate shift: the imitating policy performs poorly on states that were not encountered in the training data. In contrast, IRL can effectively incorporate new samples collected in simulation, which improves generalization. Moreover, IRL enables learning from state-only observations [2], potentially eliminating the need for teleoperation and simplifying the collection of expert data.
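
To make the covariate-shift argument concrete, below is a minimal behavioral-cloning sketch in PyTorch. The state and action dimensions, the network architecture, and the random placeholder data are illustrative assumptions, not project specifications.

    # Behavioral cloning: supervised regression from states to expert actions.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 48, 24  # assumed hand-state and joint-target sizes

    policy = nn.Sequential(
        nn.Linear(STATE_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, ACTION_DIM),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

    # Placeholder for (state, action) pairs recorded from teleoperated demos.
    demo_states = torch.randn(1024, STATE_DIM)
    demo_actions = torch.randn(1024, ACTION_DIM)

    for epoch in range(100):
        loss = ((policy(demo_states) - demo_actions) ** 2).mean()  # MSE to expert actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because the regression only covers expert-visited states, small action errors at deployment push the policy into unseen states where its predictions degrade further; this is exactly the failure mode that IRL, by incorporating fresh rollouts from simulation, is meant to avoid.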

Milestones

  • Set up a simulation of the robot hand based on IsaacSim.
  • Extract human hand poses using a VR headset and collect demonstration data.
  • Develop an imitation learning algorithm to learn a policy for the manipulation task.
  • Add domain randomization to the simulation if necessary (see the sketch after this list).
  • Deploy the whole pipeline in real-world tests using a robot arm and a multi-fingered robotic hand.
  • Combine imitation learning with RL to improve upon the experts' performance [4].
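
The domain-randomization milestone above is sketched below: physics parameters are sampled anew for each episode so that the trained policy does not overfit to a single simulator configuration. The parameter names and ranges are hypothetical placeholders, not IsaacSim API calls, and would need to be mapped onto the simulator's actual configuration interface.

    # Per-episode randomization of simulator parameters (illustrative values only).
    import random

    def sample_physics_params() -> dict:
        return {
            "object_mass_kg": random.uniform(0.05, 0.5),
            "finger_friction": random.uniform(0.5, 1.5),
            "joint_damping_scale": random.uniform(0.8, 1.2),
            "obs_noise_std": random.uniform(0.0, 0.01),
        }

    for episode in range(3):
        params = sample_physics_params()
        # In the full pipeline, these values would be applied to the simulator
        # before resetting the environment for the episode.
        print(f"episode {episode}: {params}")

Randomizing over such ranges during training typically trades some nominal simulation performance for robustness in sim-to-real transfer.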

Requirements

We are looking for motivated students with a strong background in machine learning and programming. Experience with ROS and robotic simulation is a plus.

References

[1] Pan, Cheng, Kai Junge, and Josie Hughes. “Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand.” arXiv preprint arXiv:2410.14022 (2024). https://vla-diffu-switch.github.io/
[2] Eze, Chrisantus, and Christopher Crick. “Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation.” arXiv preprint arXiv:2402.07127 (2024).
[3] Hejna, Joey, and Dorsa Sadigh. “Inverse Preference Learning: Preference-based RL without a Reward Function.” Advances in Neural Information Processing Systems 36 (2024).
[4] Goecks, Vinicius G., et al. “Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments.” arXiv preprint arXiv:1910.04281 (2019).

This project will be supervised by Prof. Josie Hughes, Prof. Maryam Kamgarpour, Cheng Pan (cheng.pan@epfl.ch), and Andreas Schlaginhaufen (andreas.schlaginhaufen@epfl.ch).

Contact: andreas.schlaginhaufen@epfl.ch, cheng.pan@epfl.ch