Warning
Please note that the publication lists from Infoscience integrated into EPFL websites, lab pages, and people pages are frozen following the launch of the new version of the platform. The owners of these pages are invited to recreate their publication lists from Infoscience. For assistance, please consult the Infoscience help or contact support.
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
2024-03-10. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain, May 2-4, 2024.

Deep Learning Theory Through the Lens of Diagonal Linear Networks
Lausanne, EPFL, 2024.

Understanding generalization and robustness in modern deep learning
Lausanne, EPFL, 2024.

Scalable constrained optimization
Lausanne, EPFL, 2024.

Saddle-to-Saddle Dynamics in Diagonal Linear Networks
2023-04-02. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, United States, December 10-16, 2023.

Model agnostic methods meta-learn despite misspecifications
2023-03-03

Penalising the biases in norm regularisation enforces sparsity
2023-03-03

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
2023-02-17. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, United States, December 10-16, 2023.

Accelerated SGD for Non-Strongly-Convex Least Squares
2022-03-03

An Efficient Sampling Algorithm for Non-smooth Composite Potentials
Journal of Machine Learning Research. 2022-01-01. Vol. 23.

Towards Understanding Sharpness-Aware Minimization
2022-01-01. 39th International Conference on Machine Learning (ICML), Baltimore, MD, Jul 17-23, 2022. p. 639-668.

Sparse-RS: A Versatile Framework for Query-Efficient Sparse Black-Box Adversarial Attacks
2022-01-01. 36th AAAI Conference on Artificial Intelligence / 34th Conference on Innovative Applications of Artificial Intelligence / 12th Symposium on Educational Advances in Artificial Intelligence, Virtual Conference, Feb 22-Mar 01, 2022. p. 6437-6445. DOI: 10.1609/aaai.v36i6.20595.

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
2022

Utility/privacy trade-off as regularized optimal transport
Mathematical Programming. 2022-04-22. DOI: 10.1007/s10107-022-01811-w.

Trace norm regularization for multi-task learning with scarce data
2022

Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity
Bernoulli. 2022-08-01. Vol. 28, num. 3, p. 1577-1601. DOI: 10.3150/21-BEJ1343.

Last iterate convergence of SGD for Least-Squares in the Interpolation regime
2021-09-28

Is there an analog of Nesterov acceleration for gradient-based MCMC?
Bernoulli. 2021-08-01. Vol. 27, num. 3, p. 1942-1992. DOI: 10.3150/20-BEJ1297.

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
2021-06-16. 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual Conference, December 6-14, 2021.

On the effectiveness of adversarial training against common corruptions
2021-03-03

A Continuized View on Nesterov Acceleration
2021

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
2021. 9th International Conference on Learning Representations, Virtual, May 4-8, 2021.

RobustBench: a standardized adversarial robustness benchmark
2020-10-19

Understanding and Improving Fast Adversarial Training
2020-07-06. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), [Online], December 2020.

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
2020-06-23