Some Better Theory for Gradient Descent

Benjamin Grimmer (Johns Hopkins U, Baltimore)

Jun 05, 2024, 14:30 – 15:00

This talk will discuss recent surprising advances in the best-known theory for gradient descent applied to an L-smooth convex function. These advances all rely on computer-assisted performance estimation. By using stepsizes larger than the classic regime where descent can be ensured, strictly faster guarantees than the classic O(1/N) convergence rates can be achieved. We will present the best-known stepsize schedules, which are similar to but differ slightly from the concurrently developed fractal silver stepsizes of Altschuler and Parrilo. Using the recursive gluing analysis technique of Altschuler and Parrilo, we will show an O(1/N^1.2716…) convergence rate in terms of objective gap decrease and in terms of squared-gradient-norm decrease. The first result improves on Altschuler and Parrilo by a constant factor, while the second result improves on the exponent of the prior best squared-gradient-norm convergence guarantee of O(1/N). These rates are tight, supported by simple matching problem instances. Additional results on the effectiveness of averaging and extrapolation will be presented.
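For readers unfamiliar with long-stepsize schedules, the following is a minimal sketch, not the speaker's exact schedule: gradient descent on an L-smooth convex quadratic driven by the Altschuler–Parrilo silver stepsizes h_t = 1 + ρ^(ν₂(t)−1) with ρ = 1 + √2, as recalled from their published construction. The schedules presented in the talk are stated to differ slightly from these, and the toy problem and step count below are illustrative assumptions.

```python
# Sketch: gradient descent with the silver stepsize schedule (assumed form).
import numpy as np

RHO = 1 + np.sqrt(2)  # silver ratio; the rate exponent is log2(RHO) ~ 1.2716

def nu2(t: int) -> int:
    """2-adic valuation: largest k with 2**k dividing t (t >= 1)."""
    return (t & -t).bit_length() - 1

def silver_schedule(n: int):
    """Silver stepsizes h_1, ..., h_n (each to be scaled by 1/L)."""
    return [1 + RHO ** (nu2(t) - 1) for t in range(1, n + 1)]

def gradient_descent(grad, x0, L, steps):
    """Plain gradient descent: x_{t+1} = x_t - (h_t / L) * grad(x_t)."""
    x = x0.copy()
    for h in silver_schedule(steps):
        x = x - (h / L) * grad(x)
    return x

# Toy instance (assumption): f(x) = 0.5 * x^T A x with minimizer 0,
# so the objective gap is simply f(x_N).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T
L = np.linalg.eigvalsh(A)[-1]        # smoothness constant = largest eigenvalue
x0 = rng.standard_normal(20)
xN = gradient_descent(lambda x: A @ x, x0, L, steps=127)  # 2**7 - 1 steps
print("objective gap:", 0.5 * xN @ A @ xN)
```

Note that several stepsizes in this schedule exceed 2/L, i.e. they leave the classic regime where per-step descent is guaranteed, which is exactly the mechanism the abstract refers to.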

Further Information
Venue:
ESI Boltzmann Lecture Hall
Associated Event:
One World Optimization Seminar in Vienna (Workshop)
Organizer(s):
Radu Ioan Bot (U of Vienna)
Yurii Malitskyi (U of Vienna)