Background and Motivation
The rapid growth of electric vehicles (EVs) is leading to a surge in the number of end-of-life (EOL) batteries, which require safe and efficient recycling processes. Disassembling battery packs is a complex and hazardous task, where robots can offer advantages in terms of precision, safety, and repeatability. However, existing robotic systems face challenges in adapting to the variability of battery pack designs.

At the Swiss Battery Technology Centre (SBTC), a reinforcement learning (RL) methodology is currently employed to train robots for various disassembly tasks, such as unscrewing components, using Nvidia Isaac Sim for simulation. The RL pipeline involves training robots to maximize cumulative rewards for effective disassembly in a simulated environment, followed by transferring these skills to real-world robotic setups.

A key challenge in this process is reward function design: for manipulation tasks, ground-truth rewards are often sparse (e.g., only rewarding task completion), leading to inefficient optimization. Manually crafting denser reward functions is time-consuming and frequently results in suboptimal learning. Furthermore, existing automated reward learning methods, while helpful, rely heavily on costly human inputs such as expert demonstrations or preference data. Recently, foundation models such as large language models (LLMs) have shown promise in automating and enhancing reward function design, offering a cheaper and more scalable solution [1,2,3]. The goal of this project is to explore the potential of LLMs for iteratively improving reward functions in complex disassembly tasks.
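To make the reward-design challenge concrete, the minimal sketch below contrasts a sparse ground-truth reward with a hand-shaped dense reward for a simplified unscrewing task. The observation fields, weights, and thresholds are illustrative assumptions, not part of SBTC's actual setup; crafting such dense terms by hand is the effort this project aims to delegate to an LLM.

```python
import numpy as np


def sparse_reward(screw_removed: bool) -> float:
    """Ground-truth reward: only task completion is rewarded.
    Most rollouts return 0, which makes policy optimization slow."""
    return 1.0 if screw_removed else 0.0


def shaped_reward(tool_pos: np.ndarray, screw_pos: np.ndarray,
                  unscrew_angle: float, screw_removed: bool) -> float:
    """Hand-crafted dense reward (illustrative only): reward approaching the
    screw and accumulating unscrewing rotation, plus a completion bonus.
    Choosing these terms and weights manually is exactly what an LLM-based
    reward designer is meant to automate."""
    approach_term = -0.5 * float(np.linalg.norm(tool_pos - screw_pos))
    rotation_term = 0.1 * unscrew_angle
    completion_bonus = 5.0 if screw_removed else 0.0
    return approach_term + rotation_term + completion_bonus
```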
Methodology
1. Integration with Existing RL Pipeline
Integrate LLMs into SBTC’s RL pipeline to automate the generation and refinement of reward functions for tasks such as unscrewing EV battery components.
2. Iterative Reward Function Optimization
Utilize LLMs to iteratively improve reward functions based on feedback from simulation results. The LLM will analyze task outcomes and adjust the reward design to better align with the desired behaviors, enhancing the robot's performance (a minimal sketch of this generate-train-evaluate loop follows the methodology steps below).
3. Simulation-Based Training
Train robotic agents using the optimized reward functions within Nvidia Isaac Sim to evaluate improvements in learning speed and task efficiency.
4. Real-World Validation
Transfer the trained skills to real-world robotic setups. Assess the performance gains in disassembly tasks, focusing on precision, speed, and robustness.
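Steps 1-3 can be combined into an Eureka-style outer loop [1]: the LLM proposes candidate reward functions, each candidate is used to train and evaluate a policy in simulation, and the resulting metrics are summarized as textual feedback for the next refinement round. The sketch below illustrates this loop structure only; `query_llm`, `train_policy`, and `evaluate_policy` are hypothetical placeholders rather than components of SBTC's pipeline.

```python
# Minimal sketch of an LLM-in-the-loop reward-design cycle (hypothetical helpers).

TASK_DESCRIPTION = "Unscrew a bolt from an EV battery pack housing."


def query_llm(prompt: str) -> str:
    """Placeholder for a call to a coding-capable LLM. Returns Python source for
    a reward function; here a fixed example so the loop runs without an API."""
    return (
        "def reward(obs):\n"
        "    # dense shaping: approach the screw, bonus on removal\n"
        "    return -obs['tool_to_screw_dist'] + (5.0 if obs['screw_removed'] else 0.0)\n"
    )


def train_policy(reward_fn, num_steps: int):
    """Placeholder for RL training in simulation (e.g. Nvidia Isaac Sim)."""
    return {"reward_fn": reward_fn}  # stand-in for a trained policy


def evaluate_policy(policy) -> dict:
    """Placeholder for policy rollouts that collect task metrics."""
    return {"success_rate": 0.0, "mean_time_to_unscrew": float("inf")}


def reward_design_loop(iterations: int = 3, candidates_per_iter: int = 2):
    """Outer loop: propose reward candidates, train, evaluate, refine."""
    feedback = "No previous results."
    best_source, best_metrics = None, {"success_rate": -1.0}
    for _ in range(iterations):
        for _ in range(candidates_per_iter):
            prompt = (
                f"Task: {TASK_DESCRIPTION}\n"
                f"Previous results: {feedback}\n"
                "Write a Python function `reward(obs) -> float` that densely "
                "rewards progress toward completing the task."
            )
            source = query_llm(prompt)
            namespace: dict = {}
            exec(source, namespace)  # assumes LLM output is sandboxed/trusted
            reward_fn = namespace["reward"]
            policy = train_policy(reward_fn, num_steps=1_000_000)
            metrics = evaluate_policy(policy)
            if metrics["success_rate"] > best_metrics["success_rate"]:
                best_source, best_metrics = source, metrics
        # Summarize the best candidate's metrics as feedback for the next round.
        feedback = f"Best candidate so far achieved: {best_metrics}"
    return best_source, best_metrics


if __name__ == "__main__":
    source, metrics = reward_design_loop()
    print(metrics)
```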
Expected Outcomes
1. Automated RL training pipeline for developing new robotic manipulation skills with minimal human intervention.
2. Enhanced training efficiency due to optimized reward function design, leading to faster convergence and better generalization in real-world disassembly tasks.
3. Insights into using LLMs for continuous learning and adaptation in robotic systems, potentially extending beyond disassembly to other automation applications.
Requirements
We are looking for motivated students with a strong background in machine learning and coding. We have concrete ideas on how to tackle the above challenges, but we are always open to different suggestions. If you are interested, please send an email containing (1) one paragraph on your background and fit for the project and (2) your BS and MS transcripts to [email protected], [email protected], and [email protected].
References
[1] Ma, Yecheng Jason, et al. "Eureka: Human-level reward design via coding large language models." arXiv preprint arXiv:2310.12931 (2023).
[2] Ma, Yecheng Jason, et al. "DrEureka: Language Model Guided Sim-To-Real Transfer." arXiv preprint arXiv:2406.01967 (2024).
[3] Sun, Yuan, et al. "Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF." Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2024.