The Long-Run Distribution of Stochastic Gradient Descent: A Large Deviations Analysis

Panayotis Mertikopoulos (CNRS, Grenoble)

Jun 03, 2024, 11:30 — 12:00

This talk investigates the long-run state distribution of stochastic gradient descent (SGD) in general, non-convex problems: namely, which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics, with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (i) the problem's critical region is visited exponentially more often than any non-critical region; (ii) the iterates of SGD are exponentially concentrated around the problem's minimum-energy state (which does not always coincide with the global minimum of the objective); (iii) all other components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally, (iv) every non-minimizing component is "dominated" by a minimizing component that is visited exponentially more often.
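The qualitative picture above can be illustrated numerically. The sketch below (my own toy example, not the talk's analysis) runs constant-step SGD on a tilted double-well objective with additive Gaussian gradient noise; under this noise model the long-run histogram of the iterates approximates a Boltzmann-Gibbs law with effective temperature proportional to the step-size times the noise variance, so the deeper well should be occupied exponentially more often than the shallower one. All function names and parameter values here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tilted double-well: minima near x = -1 (deeper, f ≈ -0.25) and x = +1
# (shallower, f ≈ +0.25), separated by a barrier at x ≈ 0.
def f(x):
    return (x**2 - 1.0) ** 2 + 0.25 * x

def grad_f(x):
    return 4.0 * x * (x**2 - 1.0) + 0.25

eta, sigma = 0.1, 2.0            # step-size and gradient-noise scale
T = eta * sigma**2 / 2.0         # effective "temperature" of the Gibbs law
n_steps, burn_in = 400_000, 50_000

x, samples = 0.0, []
for t in range(n_steps):
    noisy_grad = grad_f(x) + sigma * rng.standard_normal()
    x -= eta * noisy_grad        # plain SGD step with unbiased noisy gradient
    if t >= burn_in:
        samples.append(x)
samples = np.asarray(samples)

# Fraction of time spent near the deeper well (x < 0). The Gibbs picture
# predicts roughly exp(Δf / T) ≈ exp(0.5 / 0.2) ≈ 12:1 in its favor.
frac_left = np.mean(samples < 0.0)
print(f"fraction of time near the deeper well: {frac_left:.3f}")
```

At this temperature the chain crosses the barrier many times over the run, so the occupation frequencies reflect the energy gap between the two wells rather than the initialization; shrinking the step-size sharpens the concentration around the deeper minimum, mirroring statement (ii) above.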

Further Information
Venue:
ESI Boltzmann Lecture Hall
Recordings:
Recording
Associated Event:
One World Optimization Seminar in Vienna (Workshop)
Organizer(s):
Radu Ioan Bot (U of Vienna)
Yurii Malitskyi (U of Vienna)