Reward learning from human feedback for the Construction of Spanning Structures ‒ UPKAMGARPOUR ‐ EPFL

Abstract: This semester project addresses the challenge of building a structure that connects two supports across a gap, using reinforcement learning from human preferences. The approach learns a reward predictor from human feedback on pairs of demonstrations of the construction task. After presenting the algorithm used to train the agent with human feedback, the report experimentally validates the methodology. It then compares the effectiveness of two reward models: one based on a linear combination of handcrafted features and the other on a convolutional neural network. Next, the report assesses the impact of a query selection strategy based on the disagreement among an ensemble of reward predictors. It concludes with tests comparing an agent trained with a reward derived from human preferences against a benchmark forward reinforcement learning agent, demonstrating the promise of the proposed reward shaping strategy.
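
To make the disagreement-based query selection mentioned in the abstract more concrete, here is a minimal sketch in Python. It is not the report's implementation: the Bradley-Terry preference model, the use of variance as the disagreement measure, and the names `preference_prob`, `select_queries`, and `ensemble_returns` are all assumptions chosen for illustration.

```python
import numpy as np

def preference_prob(r_a, r_b):
    """Probability that demonstration A is preferred over B.

    Assumes a Bradley-Terry model on the predicted returns; the report
    may use a different preference model.
    """
    return 1.0 / (1.0 + np.exp(-(r_a - r_b)))

def select_queries(ensemble_returns, pairs, n_queries):
    """Pick the candidate pairs on which the ensemble disagrees most.

    ensemble_returns: array of shape (n_models, n_demos) holding the
        return each reward predictor assigns to each demonstration
        (hypothetical input format).
    pairs: list of (i, j) index pairs of candidate demonstrations.
    n_queries: number of pairs to send to the human labeler.
    """
    pairs = np.asarray(pairs)
    r_a = ensemble_returns[:, pairs[:, 0]]  # shape (n_models, n_pairs)
    r_b = ensemble_returns[:, pairs[:, 1]]
    probs = preference_prob(r_a, r_b)
    # Disagreement is measured here as the variance of the predicted
    # preference probability across ensemble members (an assumption).
    disagreement = probs.var(axis=0)
    top = np.argsort(-disagreement)[:n_queries]
    chosen = [(int(pairs[k, 0]), int(pairs[k, 1])) for k in top]
    return chosen, disagreement
```

Under these assumptions, a pair that every predictor ranks the same way gets a low score and is unlikely to be shown to the human, while a pair on which the predictors split is queried first.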

Project report: Report Sabri Final.

Video caption: This research focuses on reinforcement learning from human feedback for the construction of spanning structures. In this framework, participants are shown two demonstrations of the task and asked to choose the one they prefer, as shown above. This human feedback guides the learning of a reward function, which in turn steers a reinforcement learning agent toward better construction strategies. By incorporating human preferences into the learning process, we aim to build more efficient and stable bridges.
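
The preference step described in this caption can be sketched as fitting a reward function to pairwise choices. The example below is a minimal illustration, assuming a reward that is linear in handcrafted features and a Bradley-Terry (logistic) preference model trained by plain gradient descent. The function name `fit_linear_reward`, the feature layout, and the hyperparameters are hypothetical and are not taken from the report.

```python
import numpy as np

def fit_linear_reward(feat_a, feat_b, prefs, lr=0.1, steps=500):
    """Fit reward weights w from pairwise human preferences.

    feat_a, feat_b: arrays of shape (n_pairs, n_features) with the
        summed handcrafted features of demonstrations A and B
        (hypothetical feature layout).
    prefs: array of shape (n_pairs,), 1.0 if the human preferred A
        and 0.0 if the human preferred B.
    """
    diff = feat_a - feat_b
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        # Bradley-Terry probability that A is preferred under weights w.
        p_a = 1.0 / (1.0 + np.exp(-(diff @ w)))
        # Gradient of the mean cross-entropy loss with respect to w.
        grad = diff.T @ (p_a - prefs) / len(prefs)
        w -= lr * grad
    return w
```

The learned weights define a reward `features @ w` that a reinforcement learning agent could then maximize. A convolutional reward model would replace the linear map with a network, but could keep the same preference loss.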

Credits: This is a semester project report by Sabri El Amrani, who was supervised by Maryam Kamgarpour, Andreas Schlaginhauffen, and Anna Maddux.