A Hessian-Aware Stochastic Differential Equation Modelling of SGD

Niao He (ETH Zurich)

Jun 05. 2024, 10:00 — 10:30

Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points. However, existing stochastic differential equation (SDE) models fail to fully capture these escaping behaviors, even for simple quadratic objectives. Built on a novel stochastic backward error analysis framework, we derive the Hessian-Aware Stochastic Modified Equation (HA-SME), an SDE that incorporates Hessian information of the objective function into both its drift and diffusion terms. Our analysis shows that HA-SME matches the order-best approximation error guarantee among existing SDE models in literature, while achieving a significantly reduced dependence on the smoothness parameter. Further, for quadratics objectives, under mild conditions, HA-SME is proved to be the first SDE model that recovers exactly the SGD dynamics in the distributional sense. 

Further Information
Venue:
ESI Boltzmann Lecture Hall
Recordings:
Recording
Associated Event:
One World Optimization Seminar in Vienna (Workshop)
Organizer(s):
Radu Ioan Bot (U of Vienna)
Yurii Malitskyi (U of Vienna)