Course
Introduction to Reinforcement Learning
General course information
- Mode of delivery
- In person
- Language
- English
Course description and context
Reinforcement learning (RL) is a body of theory and techniques for optimal sequential decision making. In its basic setting, at each time step an agent selects an action; as a result, it collects a reward and the system state evolves. The agent observes the new state and decides on the next action, with the objective of maximizing the total accumulated reward. Reinforcement learning has found numerous applications, including online services (ad placement, recommendation systems), game playing (chess, Atari, Go), control, and robotics. In this course we will first introduce the underlying mathematical framework (Markov decision processes) and its solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning.
Teaching staff
Name | Institution | Category | Doctorate | Teaching profile | Area | Email |
---|---|---|---|---|---|---|
AYESTA MORATE, URTZI | Universidad del País Vasco/Euskal Herriko Unibertsitatea | Visiting (Ikerbaske) | Doctor | Non-bilingual | Computer Science and Artificial Intelligence | urtzi.ayesta@ehu.eus |
Competences
Name | Weight |
---|---|
Knowledge of the theoretical principles of reinforcement learning. | 50.0 % |
Develop reinforcement learning algorithms adapted to specific problems. | 50.0 % |
Types of teaching
Type | In-class hours | Out-of-class hours | Total hours |
---|---|---|---|
Lecture | 15 | 0 | 15 |
Computer lab | 15 | 45 | 60 |
Learning activities
Name | Hours | In-person percentage |
---|---|---|
Lectures | 15.0 | 100 % |
Group work | 45.0 | 0 % |
Work with computer equipment | 15.0 | 100 % |
Assessment systems
Name | Minimum weighting | Maximum weighting |
---|---|---|
Essay, individual and/or group work | 25.0 % | 50.0 % |
Written exam | 50.0 % | 75.0 % |
Course learning outcomes
- Understand the basics of sequential decision making.
- Formulate RL algorithms that can optimally solve a sequential decision problem.
- Gain a mathematical understanding of convergence results of RL algorithms.
- Learn how RL can be combined with parametric function approximation, including deep learning, to find good approximate solutions to problems of real-world complexity.
Syllabus
Introduction to Reinforcement Learning
Topics: Applications of RL, RL successes, RL vs. supervised learning, major components of RL, learning and planning, prediction vs. control
Recap of Markov Processes
Topics: Markov Chains, Markov Reward Processes, Markov Decision Processes
Stochastic dynamic programming
Topics: Principle of optimality, dynamic programming, Bellman optimality equation, Value function, Iterative schemes to solve Bellman (Value Iteration, Policy iteration)
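As an illustration of the value iteration scheme listed in this unit, here is a minimal sketch in plain Python. The data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected one-step reward) and the two-state example MDP are assumptions made for this sketch, not material taken from the course.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Return (V, greedy policy) for a finite discounted MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    expected one-step reward. Both layouts are assumptions of this sketch.
    """
    n_states = len(P)

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman optimality backup applied to every state.
        V_new = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Policy that is greedy with respect to the final value estimate.
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical MDP: action 0 stays, action 1 switches state;
# staying in state 1 pays reward 1, everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]

V, policy = value_iteration(P, R)
```

With discount factor 0.9, staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and the best move from state 0 is to switch, worth 0.9 × 10 = 9. The sketch should approach these values.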
Prediction: How to learn the performance?
Topics: Monte Carlo, Temporal-Difference Learning (TD(0), TD(λ))
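To make the TD(0) prediction idea concrete, the following minimal sketch applies the tabular update V(s) ← V(s) + α (r + γ V(s') − V(s)) to recorded transitions. The transition format `(state, reward, next_state, done)` and the deterministic three-state chain are hypothetical choices for illustration.

```python
def td0_prediction(episodes, n_states, alpha=0.1, gamma=1.0):
    """Estimate state values of a fixed policy with tabular TD(0).

    Each episode is a list of (state, reward, next_state, done) tuples;
    this format is an assumption of the sketch.
    """
    V = [0.0] * n_states
    for episode in episodes:
        for s, r, s_next, done in episode:
            # Bootstrapped target; terminal states have value zero.
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
    return V


# Hypothetical chain 0 -> 1 -> 2 (terminal), reward 1 on the last step.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
V = td0_prediction([episode] * 500, n_states=3)
```

Because the chain is deterministic and undiscounted, the true values of states 0 and 1 are both 1, and the estimates should approach them as the episode is replayed.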
Control: How to learn the optimal control?
Topics: State-action function, Online and Offline learning, Exact methods (Q-learning and SARSA)
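As an illustration of tabular Q-learning, the sketch below learns the state-action function on a small corridor. The environment (four states, actions left/right, reward 1 on reaching the last state) and the uniformly random behaviour policy are assumptions made here to keep the example short. Q-learning is off-policy, so it can still estimate the optimal state-action values from randomly chosen actions.

```python
import random


def q_learning(n_states=4, n_episodes=500, alpha=0.5, gamma=0.9,
               max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical corridor 0..n_states-1."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[s][a], a: 0=left, 1=right
    for _ in range(n_episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.randrange(2)  # uniformly random behaviour policy
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Off-policy target: bootstrap from the best next action.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s_next
    return Q


def greedy_policy(Q):
    """Greedy action for every non-terminal state."""
    return [max(range(len(q)), key=q.__getitem__) for q in Q[:-1]]


Q = q_learning()
policy = greedy_policy(Q)
```

SARSA would differ only in the target, which bootstraps from the action actually taken in the next state instead of the maximizing one.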
Convergence of learning algorithms
Topics: Convergence of random variables, Martingales, stochastic approximation, Convergence of TD(0), TD(λ) and Q-learning
Exploration vs. exploitation
Topics: Multi-armed bandits, optimality of index policies (Gittins), Regret, Optimality of logarithmic regret, UCB algorithm
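The UCB idea can be sketched with the classical UCB1 index, which adds an exploration bonus sqrt(2 ln t / n_i) to each arm's empirical mean. The Bernoulli arms, their means, and the horizon below are hypothetical values chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return the pull count of each arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            # Empirical mean plus confidence bonus (optimism under uncertainty).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.8], horizon=2000)
```

The number of pulls of the suboptimal arm grows only logarithmically with the horizon, so most of the 2000 pulls should go to the arm with mean 0.8.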
Approximate Solution Methods
Topics: Value function approximation, Stochastic gradient descent, approximation by feature representation, linear value function approximation (convergence), control with value function approximation, Action-value function approximation, deep reinforcement learning, batch reinforcement learning, experience replay, Algorithms with superhuman performance (AlphaGo, Atari games)
Bibliography
Basic bibliography
Sean Meyn, Feedback systems and reinforcement learning, 2020
Dimitri P. Bertsekas, Reinforcement learning and optimal control, 2019
M. L. Puterman, Markov Decision Processes. Wiley, 1994.
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.
V. Borkar, Stochastic approximation: a dynamical systems viewpoint, Tata Institute of Fundamental Research, 2008