MDPs; value and Q-functions; value iteration and policy iteration; operator perspectives. Model-free policy-based and value-based methods; connections to gradient methods; Monte Carlo (MC) methods and temporal-difference (TD) learning.
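The Python below is a minimal sketch of two of the planning methods listed above. Value iteration repeatedly applies the Bellman optimality operator, (TV)(s) = max_a Q(s, a), which is the "operator perspective". Policy iteration alternates policy evaluation with greedy improvement. The two-state MDP, its rewards, the discount factor 0.9 and all function names are hypothetical and were chosen only for illustration. None of them come from the course material.

```python
from typing import List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward).
Transitions = List[List[List[Tuple[float, int, float]]]]

P: Transitions = [
    # State 0: action 0 stays put (reward 0); action 1 reaches state 1 w.p. 0.8 (reward 0).
    [[(1.0, 0, 0.0)], [(0.8, 1, 0.0), (0.2, 0, 0.0)]],
    # State 1: action 0 stays and earns reward 1; action 1 returns to state 0 (reward 0).
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],
]


def q_value(P: Transitions, V: List[float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then valuing successors by V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def bellman_optimality(P: Transitions, V: List[float], gamma: float) -> List[float]:
    """One application of the Bellman optimality operator: (TV)(s) = max_a Q(s, a)."""
    return [max(q_value(P, V, s, a, gamma) for a in range(len(P[s]))) for s in range(len(P))]


def greedy_policy(P: Transitions, V: List[float], gamma: float) -> List[int]:
    """Pick, in every state, the action with the largest one-step lookahead value."""
    return [max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma)) for s in range(len(P))]


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-10):
    """Iterate V <- TV until successive iterates differ by less than tol in max-norm."""
    V = [0.0] * len(P)
    while True:
        V_new = bellman_optimality(P, V, gamma)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, greedy_policy(P, V_new, gamma)
        V = V_new


def evaluate_policy(P: Transitions, policy: List[int], gamma: float, tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation: apply the policy's Bellman operator T_pi to a fixed point."""
    V = [0.0] * len(P)
    while True:
        V_new = [q_value(P, V, s, policy[s], gamma) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def policy_iteration(P: Transitions, gamma: float = 0.9):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, policy, gamma)
        improved = greedy_policy(P, V, gamma)
        if improved == policy:
            return V, policy
        policy = improved


V_vi, pi_vi = value_iteration(P, gamma=0.9)
V_pi, pi_pi = policy_iteration(P, gamma=0.9)
print("value iteration: ", V_vi, pi_vi)
print("policy iteration:", V_pi, pi_pi)
```

On this toy MDP both methods should settle on the same policy: move from state 0 and stay in state 1. The resulting values are V*(1) = 1/(1 − 0.9) = 10 and V*(0) = 7.2/0.82 ≈ 8.78. Because the optimality operator is a γ-contraction in the max-norm, value iteration converges from any starting V. Policy iteration here needs only two improvement steps.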