Slides 2024

Outline

The 2024 course covers the following topics:

Introduction

MDPs; value and Q-functions; value iteration, policy iteration; operator perspectives. Model-free policy-based and value-based methods; connections to gradient methods; Monte Carlo (MC) method and temporal difference (TD) learning.

Primal and dual LP, primal-dual methods, REPS. Linear algebra reminder.

Policy parameterizations, policy gradient theorems and estimators, performance difference lemma, gradient dominance and convergence of policy gradient methods, natural policy gradient

NPG, sample-based NPG, TRPO, exploration in policy gradients

Behavioral cloning, DAgger, MCE-IRL, GAIL, P2IL, IQ-Learn

Normal-form games (NFG), equilibria, response dynamics of iterated play, Markov games, RL dynamics in Markov games

Actor-critic-based deep RL: TRPO, Soft Actor-Critic. Value-based deep RL: DQN, Double DQN, Rainbow. Robust RL and IRL.