In this talk, we explain why training Residual Networks (ResNets) is relatively easy in two limiting cases. The first is infinite depth with a linear parametrization of the residual blocks; the second is joint infinite depth and infinite width.
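As a point of reference, here is a minimal sketch of the infinite-depth limit (the notation is ours, not taken from the talk). A depth-$L$ ResNet with residuals rescaled by $1/L$ iterates
\[
x_{k+1} \;=\; x_k + \frac{1}{L}\, f(x_k, \theta_k), \qquad k = 0, \dots, L-1,
\]
which is the explicit Euler discretization of, and as $L \to \infty$ formally converges to, the ODE
\[
\dot{x}(t) \;=\; f\big(x(t), \theta(t)\big), \qquad t \in [0,1].
\]
In the first limiting case, the residual map is linear in its parameters, e.g. $f(x, \theta) = \theta\, \sigma(x)$ for a fixed feature map $\sigma$ (an illustrative choice, not necessarily the one used in the talk).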
For the second case, we introduce the conditional Wasserstein distance, which naturally appears as the metric structure under which this limiting model is trained, and which encompasses the infinite-depth, finite-width setting. The main technical results are a local Polyak-Łojasiewicz inequality in the first case and the existence of the training flow in the second.
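For orientation, a local Polyak-Łojasiewicz inequality takes the following standard form (our notation; the precise statement in the talk may differ): the loss $\mathcal{L}$ satisfies, for some $\mu > 0$ and a neighborhood $\mathcal{U}$ of the initialization,
\[
\frac{1}{2}\,\big\| \nabla \mathcal{L}(\theta) \big\|^2 \;\ge\; \mu \,\big( \mathcal{L}(\theta) - \inf \mathcal{L} \big) \qquad \text{for all } \theta \in \mathcal{U}.
\]
Along the gradient flow $\dot{\theta}(t) = -\nabla \mathcal{L}(\theta(t))$, this gives $\frac{d}{dt}\big(\mathcal{L}(\theta(t)) - \inf \mathcal{L}\big) \le -2\mu\,\big(\mathcal{L}(\theta(t)) - \inf \mathcal{L}\big)$, hence exponential convergence of the loss for as long as the trajectory remains in $\mathcal{U}$.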