Error bounds for mean-payoff Markov decision processes

Roberto Cominetti (U Adolfo Ibanez, Santiago de Chile)

Jun 03. 2024, 14:00 — 14:30

We discuss the use of optimal transport techniques to derive finite-time error bounds
for reinforcement learning in mean-payoff Markov decision processes. The results are
obtained as a special case of stochastic Krasnoselski—Mann fixed point iterations for 
nonexpansive maps. We present sufficient conditions on the stochastic noise and 
stepsizes that guarantee almost sure convergence of the iterates towards a fixed 
point, as well as non-asymptotic error bounds and convergence rates. Our main 
results concern the case of a martingale difference noise with variances that can 
possibly grow unbounded. We also analyze the case of uniformly bounded variances, 
and how they apply for Stochastic Gradient Descent in convex optimization.

Further Information
Venue:
ESI Boltzmann Lecture Hall
Recordings:
Recording
Associated Event:
One World Optimization Seminar in Vienna (Workshop)
Organizer(s):
Radu Ioan Bot (U of Vienna)
Yurii Malitskyi (U of Vienna)