In this talk we analyze distributional Markov decision processes, a class of control problems in which the objective is to learn policies that steer the distribution of the cumulative reward toward a prescribed target law, rather than to optimize an expected value or a risk functional. To solve the resulting distributional control problem in a model-free setting, we propose a policy-gradient algorithm that combines neural-network parameterizations of randomized Markov policies, defined on an augmented state space, with a sample-based evaluation of the characteristic-function loss. Under mild regularity and growth assumptions, we prove convergence of the algorithm to stationary points using stochastic approximation techniques. Several numerical experiments illustrate the ability of the method to match complex target distributions.
Based on joint work with Tamara Göll, Anna Jaskiewicz, and Athanasios Vasileiadis.
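As a rough illustration of the sample-based characteristic-function loss mentioned above (not the exact construction used in the talk), the sketch below compares the empirical characteristic function of simulated cumulative returns with a target characteristic function on a grid of frequencies; all names (characteristic_function_loss, freqs, weights) are illustrative.

```python
import numpy as np

def characteristic_function_loss(returns, target_cf, freqs, weights=None):
    """Weighted squared distance between the empirical characteristic
    function of sampled cumulative returns and a target characteristic
    function, evaluated on a grid of frequencies."""
    returns = np.asarray(returns, dtype=float)
    # Empirical characteristic function: (1/N) * sum_j exp(i * u * G_j)
    emp_cf = np.exp(1j * np.outer(freqs, returns)).mean(axis=1)
    diff = emp_cf - target_cf(freqs)
    if weights is None:
        weights = np.full_like(freqs, 1.0 / len(freqs))
    return float(np.sum(weights * np.abs(diff) ** 2))

# Example: measure how close sampled returns are to a standard normal target.
freqs = np.linspace(-5.0, 5.0, 101)
target_cf = lambda u: np.exp(-0.5 * u ** 2)  # characteristic function of N(0, 1)
sampled_returns = np.random.default_rng(0).normal(size=10_000)
print(characteristic_function_loss(sampled_returns, target_cf, freqs))
```

In a policy-gradient setting, a loss of this form would be evaluated on returns generated by the current policy and differentiated (e.g. via a score-function estimator) with respect to the policy parameters; the snippet only shows the loss evaluation itself.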