Abstract: This semester project addresses the challenge of building a structure that connects two supports across a gap, using reinforcement learning from human preferences. The approach learns a reward predictor from human feedback on pairs of demonstrations of the construction task. The report first presents the algorithm used to train the agent with human feedback and validates the methodology experimentally. It then compares the effectiveness of two reward models: one based on a linear combination of handcrafted features, the other on a convolutional neural network. Next, it assesses the impact of a query selection strategy based on the disagreement among an ensemble of reward predictors. The report concludes with tests comparing an agent trained on a reward derived from human preferences against a benchmark forward reinforcement learning agent, demonstrating the promise of the proposed reward shaping strategy.
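As an illustration of the two ideas summarized in the abstract (a reward predictor fitted to pairwise preferences, and query selection by ensemble disagreement), here is a minimal sketch. It assumes a Bradley-Terry preference model over summed rewards, which is common in preference-based reinforcement learning but is not confirmed as the report's exact formulation. The linear reward over feature vectors stands in for the handcrafted-feature model, and all function names and the feature layout are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def segment_return(weights, segment):
    """Sum of linear rewards w . phi(s) over the feature vectors of one segment."""
    return sum(sum(w * f for w, f in zip(weights, feats)) for feats in segment)

def preference_prob(weights, seg_a, seg_b):
    """Assumed Bradley-Terry probability that seg_a is preferred over seg_b."""
    return sigmoid(segment_return(weights, seg_a) - segment_return(weights, seg_b))

def sgd_step(weights, seg_a, seg_b, label, lr=0.1):
    """One gradient step on the cross-entropy loss; label is 1.0 if seg_a was preferred."""
    p = preference_prob(weights, seg_a, seg_b)
    n = len(weights)
    # For a linear reward, the loss gradient is (p - label) * (sum phi_a - sum phi_b).
    feat_diff = [sum(f[i] for f in seg_a) - sum(f[i] for f in seg_b) for i in range(n)]
    return [w - lr * (p - label) * d for w, d in zip(weights, feat_diff)]

def disagreement(ensemble, seg_a, seg_b):
    """Variance of the predicted preference probability across ensemble members."""
    probs = [preference_prob(w, seg_a, seg_b) for w in ensemble]
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)

def select_query(ensemble, candidate_pairs):
    """Return the candidate pair on which the ensemble disagrees most."""
    return max(candidate_pairs, key=lambda pair: disagreement(ensemble, *pair))
```

Here each segment is a list of per-step feature vectors, and an ensemble is a list of weight vectors trained independently. Sending the human only the pair with the highest disagreement is one simple way to spend a limited feedback budget on the comparisons the predictors are least sure about.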
Project report: Report Sabri Final.
Credits: This is a semester project report by Sabri El Amrani, supervised by Maryam Kamgarpour, Andreas Schlaginhauffen, and Anna Maddux.