EE-618 Theory and Methods for Reinforcement Learning ‒ LIONS ‐ EPFL

Summary

This course covers theory and methods for decision making under uncertainty with partial feedback.

Content

Lecture 1
Introduction to Reinforcement Learning
Definition of the Markov decision process formalism, value functions, policies, and performance criteria. Brief overview of the forthcoming material: dynamic programming, linear programming approach, policy gradients, deep reinforcement learning, Markov games, and robust reinforcement learning.

Lecture 2
Dynamic Programming
Policy evaluation and Bellman consistency equation. Policy optimization and Bellman optimality. Dynamic programming with known and unknown transition dynamics: Value Iteration, Policy Iteration, Monte-Carlo methods, Temporal Difference Learning.
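
As an illustration of the known-dynamics case, Value Iteration can be sketched as repeated application of the Bellman optimality operator. This is a minimal sketch and not course material: the function name `value_iteration`, the array layout (`P` of shape actions x states x states, `R` of shape states x actions), and the toy two-state MDP are all assumptions made for the example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under action a.
    R: array of shape (S, A), expected immediate reward for taking action a in state s.
    Returns the approximate optimal value function and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: state 1 is absorbing and pays reward 1 per step;
# in state 0, action 0 stays put and action 1 moves to state 1 (both with reward 0).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [1.0, 1.0]])

V, policy = value_iteration(P, R, gamma=0.5)
print(V, policy)
```

With a discount factor of 0.5, the values in this toy problem should approach 1 for state 0 and 2 for state 1, and the greedy policy moves from state 0 to state 1.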

Lecture 3
Linear Programming
Algorithms based on the primal and dual linear programming approach to Markov decision processes: linear function approximation and constraint sampling, randomized primal-dual methods, Relative Entropy Policy Search, applications to offline reinforcement learning.

Lecture 4
Policy Gradient 1
Policy parameterization and policy gradient theorem. REINFORCE algorithm. Computation of unbiased estimators for the policy gradient.
Non-concavity and gradient dominance property of the policy gradient objective. Global convergence of Projected Gradient Descent. Natural Policy Gradient.
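
For orientation, the policy gradient theorem is commonly stated in the discounted setting as below. The notation is an assumption and may differ from the one used in the lectures: $J(\theta)$ is the expected discounted return of the policy $\pi_\theta$, $d^{\pi_\theta}$ is the normalized discounted state visitation distribution, and $Q^{\pi_\theta}$ is the state-action value function.

```latex
\nabla_\theta J(\theta)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

REINFORCE-type estimators replace $Q^{\pi_\theta}(s, a)$ with a Monte-Carlo return sampled along a trajectory, which keeps the gradient estimate unbiased.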

Lecture 5
Policy Gradient 2
Model-free implementation of Natural Policy Gradient (NPG).
Trust Region Policy Optimization. Proximal Policy Optimization.
Exploration techniques in Policy Gradient.
Choice of baselines.

Lecture 6
Imitation Learning
Motivation and problem formulation. Behavior Cloning. Online Imitation Learning. Inverse Reinforcement Learning. Apprenticeship Learning formalism. Maximum Causal Entropy Inverse Reinforcement Learning. Generative Adversarial Imitation Learning. Lagrangian duality methods.

Lecture 7
Markov Games
Problem setup and different notions of equilibria: dominant strategies, Nash equilibria.
Overview of game dynamics: Iterated Best Response, Fictitious Play, Gradient Descent Ascent.
Definition of value functions and Nash equilibrium in two-player Markov games.
Policy Gradient, nonlinear programming and value iteration for zero-sum Markov games.

Lecture 8
Deep Reinforcement Learning and Robust Reinforcement Learning
Deep value-based RL algorithms (DQN and variants). Importance of robustness in reinforcement learning. Robust reinforcement learning as a zero-sum Markov game. Robustness in Imitation Learning.

Keywords

Reinforcement learning, policy search.

Learning Prerequisites

Required courses:

Optimization, probability theory, mathematics of data.

Assessment methods

Project report.