Logistic regression explicitly maximizes margins; should we stop training early?
Matus Telgarsky, University of Illinois
This talk will present two perspectives on the behavior of gradient descent with the logistic loss: on the one hand, it seems we should run as long as possible to achieve good margins; on the other, early stopping seems necessary for noisy problems. The first part, focused on the linear case, develops a new perspective of explicit bias (rather than implicit bias), yielding new analyses and algorithms with a margin-maximization rate as fast as 1/t^2 (whereas prior work achieved at best 1/sqrt(t)). The second part, focused on shallow ReLU networks, argues that the margin bias might fail to be ideal, and that early stopping can achieve consistency and calibration for arbitrary classification problems. Moreover, this early phase is still adaptive to data simplicity, but with a bias different from the margin bias.
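As a minimal illustrative sketch (not the talk's algorithm or analysis), the following Python snippet runs plain gradient descent on the logistic loss for a linear classifier over a synthetic, linearly separable dataset and prints the normalized margin min_i y_i <w, x_i> / ||w||, which slowly increases as training continues; the data, step size, and iteration counts are arbitrary choices for illustration.

```python
# Sketch: gradient descent on the logistic loss for a linear separator,
# tracking the normalized margin over training (synthetic data, assumed setup).
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Linearly separable toy data: labels given by a ground-truth direction.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = np.sign(X @ w_star)

def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss log(1 + exp(-y <w, x>))."""
    margins = y * (X @ w)
    coeffs = -y * expit(-margins)  # -y * sigmoid(-margin)
    return (coeffs[:, None] * X).mean(axis=0)

w = np.zeros(d)
eta = 1.0
for t in range(1, 20001):
    w -= eta * logistic_loss_grad(w, X, y)
    if t % 5000 == 0:
        norm_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"step {t:6d}  normalized margin {norm_margin:.4f}")
```

On separable data the loss can be driven to zero only by letting ||w|| grow, so the interesting quantity is the normalized margin; the rates mentioned in the abstract (1/sqrt(t) versus 1/t^2) describe how quickly such a quantity approaches the maximum margin.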
Joint work with Ziwei Ji, Justin D. Li, and Nati Srebro.