We discuss the use of optimal transport techniques to derive finite-time error bounds
for reinforcement learning in mean-payoff Markov decision processes. The results are
obtained as a special case of stochastic Krasnoselskii–Mann fixed point iterations for
nonexpansive maps. We present sufficient conditions on the stochastic noise and
stepsizes that guarantee almost sure convergence of the iterates towards a fixed
point, as well as non-asymptotic error bounds and convergence rates. Our main
results concern the case of a martingale difference noise with variances that can
possibly grow unbounded. We also analyze the case of uniformly bounded variances,
and how they apply for Stochastic Gradient Descent in convex optimization.
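To make the object of study concrete, the following is a minimal numerical sketch of a stochastic Krasnoselskii–Mann iteration, x_{n+1} = (1 - α_n) x_n + α_n (T(x_n) + U_n), with a martingale difference noise U_n. The specific map T, the stepsize choice α_n = 1/(n+1), and the noise level are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonexpansive map T(x) = A x + b with spectral norm ||A|| < 1,
# so T has a unique fixed point x* solving (I - A) x* = b.
A = np.array([[0.5, 0.3], [-0.3, 0.5]])
b = np.array([1.0, -1.0])

def T(x):
    return A @ x + b

x = np.zeros(2)
for n in range(1, 5001):
    alpha = 1.0 / (n + 1)                 # vanishing stepsize
    noise = rng.normal(0.0, 0.1, size=2)  # zero-mean martingale difference noise
    # stochastic Krasnoselskii-Mann update
    x = (1 - alpha) * x + alpha * (T(x) + noise)

# reference fixed point for comparison
x_star = np.linalg.solve(np.eye(2) - A, b)
```

Since the stepsizes vanish while summing to infinity, the averaged noise washes out and the iterates drift towards the fixed point, in line with the almost sure convergence discussed in the abstract.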