Machine Learning and Optimization Laboratory ‐ EPFL

Welcome to the Machine Learning and Optimization Laboratory at EPFL! Here you will find information about us, our research and teaching, as well as available student projects and open positions.

Links: our github

NEWS

Papers at ICLR and AISTATS
2025/01/23: Some papers of our group at the two upcoming conferences:
- CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
- The AdEMAMix Optimizer: Better, Faster, Older
- Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
- Effective Interplay between Sparsity and Quantization: From Theory to Practice (spotlight)
- Intrinsic User-Centric Interpretability through Global Mixture of Experts
- Attention with Markov: A Curious Case of Single-layer Transformers (spotlight)
- Improving Stochastic Cubic Newton with Momentum
- Cubic regularized subspace Newton for non-convex optimization
European PhD award for Anastasiia
2024/12/12: Anastasiia Koloskova won the European AI PhD Award by ELLIS for her thesis titled Optimization Algorithms for Decentralized, Distributed and Collaborative Machine Learning.
Fineweb-2 multilingual
2024/12/08: In a great collaboration with the HuggingFace research team, we released Fineweb-2, the currently leading multilingual pre-training dataset for LLMs, supporting over 2000 languages.
Papers at NeurIPS
2024/11/05: Several papers of our group were accepted at this year’s NeurIPS conference:
- DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
- Scaling Laws and Compute-Optimal Training without Fixed Training Duration
- Analyzing & Reducing the Need for Learning Rate Warmup in Neural Network Optimization
- CoBo: Collaborative Learning via Bilevel Optimization
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
Praneeth joining USC
2024/06/09: Sai Praneeth Karimireddy, the first PhD graduate of our lab, will be joining the University of Southern California as a new professor and is hiring a new team. Congratulations!!!
Papers at ICML
2024/05/04: Our group is presenting the following papers at the upcoming ICML conference:
- Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
- DOGE: Domain Reweighting with Generalization Estimation
- Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
- On Convergence of Incremental Gradient for Non-convex Smooth Functions
- LASER: Linear Compression in Wireless Distributed Optimization
- The Privacy Power of Correlated Noise in Decentralized Learning
Hackathon on LLM pretraining
2024/04/20: Lauzhack and our lab hosted an exciting hackathon on LLM training, aiming to improve current models in speed and learning performance. Big congrats to the winners Harold Benoit, Maël Imhof, Adrien Gunther, Stefan Peters, Ahmad Rahimi and Yasamin Borhani.
Meditron
2023/11/28: We released Meditron, the currently best performing open source LLM for medical applications.
PhD defenses
2023/10/03: Anastasia Koloskova and Lie He have successfully defended their PhD theses.
Papers at NeurIPS
2023/10/29: Several papers of our group were accepted at this year’s NeurIPS conference, on long-context LLMs, efficient attention, multimodal learning, multiplication-free training, and collaborative learning. In addition, we present workshop papers on curriculum learning on domain and datapoint level, LLM depth vs sequence length, and wireless decentralized learning.
LLM distributed trainer
2023/07/30: We released the Megatron-LLM distributed trainer software, with support for Llama 1&2 and Falcon larger scale models. (related Tweet).
Papers at ICML
2023/04/25: At the upcoming ICML 2023 conference, we’ll present two papers on Second-order optimization with lazy Hessians (oral presentation) and Special Properties of Gradient Descent with Large Learning Rates, as well as several workshop papers on LLMs 1,2 and mode connectivity.
PhD defense
2023/04/11: Thijs Vogels has successfully defended his PhD thesis, on Communication-efficient distributed training of machine learning models. He will be joining Microsoft Research AI4Science. Congratulations!!!
PhD defense
2023/02/27: Jean-Baptiste Cordonnier has successfully defended his PhD thesis, on Transformer Models for Vision. He is joining Inceptive. Congratulations!!!
Paper at ICLR
2023/01/20: Our work on Agree to Disagree: Diversity through Disagreement for Better Transferability was accepted as a notable-top-5% paper at ICLR 2023.
Disco software release
2022/11/04: We’re releasing v2.0 of our open source software for distributed collaborative learning. It is fully capable of running in web browsers and apps, and supports all use cases for decentralized and federated privacy-preserving ML.
Papers at NeurIPS
2022/10/15: Some papers of our group were accepted at this year’s NeurIPS conference, on Asynchronous SGD (oral presentation), and Beyond spectral gap in decentralized learning, the dataset paper FLamby for Cross-Silo Federated Learning, as well as several workshop contributions.
Machine learning course starting
2022/09/20: The Machine Learning course CS-433 starts again with 550 students enrolled. All materials are publicly available on github. Please join our call for interdisciplinary project ideas.
PhD defense
2022/07/18: Tao Lin has successfully defended his PhD thesis, on Algorithms for Efficient and Robust Distributed Deep Learning. He is joining Westlake University as a new professor and is hiring. Congratulations!!!
Papers at ICLR and AISTATS
2022/01/29: Our work on Byzantine robust training was accepted as a spotlight at ICLR 2022, in addition to another paper on knowledge distillation in federated learning at ICLR, and on partial training of neural networks at AISTATS 2022.
PhD defense on NLP
2021/11/26: Prakhar Gupta has successfully defended his PhD, on Learning computationally efficient static word and sentence representations. Congratulations!!!
NeurIPS best paper award for Hadrien & Nicolas’ team
2021/12/02: Hadrien Hendrikx, postdoc at MLO, has won a NeurIPS outstanding paper award for their work on a continuized version of Nesterov acceleration (link to TML lab).
Thesis award for Praneeth
2021/11/09: Praneeth’s PhD thesis on Collaborative Learning has won the 2021 Chorafas Award highlighting innovative research with real-world impact.
Google PhD Fellowship
2021/10/01: Anastasia Koloskova from our lab has received this year’s Google PhD Fellowship in machine learning.
Papers at NeurIPS
2021/09/28: Some papers of our group were accepted at this year’s NeurIPS conference, on RelaySGD for decentralized training, and MIME for federated learning, and gradient tracking, as well as several workshop contributions. Hope to see you at NeurIPS!
Machine learning course starting
2021/09/21: The Machine Learning course CS-433 starts again with 560 students enrolled. All materials are publicly available on github. Please join our call for interdisciplinary project ideas.
First PhD defense in our lab
2021/08/30: Sai Praneeth Karimireddy has successfully defended his PhD, on the topic of Optimization Algorithms for Collaborative Learning. Congratulations!!!
Successful master students
2021/07/25: Felix Grimberg, master student in our lab, won the best paper award at the federated learning workshop at ICML. More master students got top papers accepted: Oğuz Yüksel at ICCV, Lingjing Kong at ICML, and Zhuoyuan Mao at ACL – in addition to 4 successful workshop papers by Mariko Makhmutova, David Roschewitz, Gilberto Manunza and Yatin Dandi.
NLP papers
2021/05/31: Two papers of our group on Static Embeddings as well as Cross-Lingual Sentence embeddings were accepted at the upcoming ACL 2021 conference.
Papers at ICML
2021/05/15: Papers of our group were accepted at this year’s ICML conference: on attention is not all you need, Byzantine robust training, consensus control, decentralized momentum, and conformal prediction.
PyTorch 1.8 distributed training
2021/03/14: Our PowerSGD algorithm was selected as the default communication compression algorithm for distributed training in PyTorch 1.8.
Papers at ICLR, AISTATS and CVPR
2021/03/01: Some papers of our lab were accepted at the upcoming ICLR, AISTATS and CVPR conferences, including on training GANs with lookahead, and data parallelism in sparse nets, equivariant self-attention for vision, gradient compression 1 and 2, as well as critical batch-sizes, and differentiable patch selection.
ML4Science course projects
2020/12/31: As part of the ML course, 92 interdisciplinary ML4Science projects were submitted by student groups this year. You can find them here.
Ultrasound Covid
2020/10/05: The Swiss radio showcased our project on ultrasound imaging, a joint project of iGH and the university hospital (CHUV)
Papers at NeurIPS
2020/10/01: Several papers of our group were accepted at this year’s NeurIPS conference, on Decentralized Training, Model Fusion, Distillation for Federated Learning, Loss Landscape for Adversarial Training, and Adam vs SGD
XPrize winners
2020/09/30: Our team are winners in the Collective Pandemic Intelligence XPrize for our WHAT IF…? Interactive Global COVID Policy Simulator
Machine learning course starting
2020/09/15: The Machine Learning course CS-433 starts again with 470 students enrolled. All materials are publicly available on github.
XPrize finalists
2020/09/15: Our team has been selected as finalists in the Collective Pandemic Intelligence XPrize
Join us at the AI for Good Global Summit on 22-24 September.
Papers at ICML
2020/07/02: Some papers of our group were accepted at this year’s ICML conference: on dynamic decentralized learning, faster federated learning, large-batch DL training, local SGD, and hyperparameter tunability.
Hack on Covid-19
2020/03/23: We’re co-organizing a COVID-19 1yearhack
AISTATS results on Frank-Wolfe, and Optimal Transport
2020/02/18: New results on Frank-Wolfe optimization and Optimal Transport for Text Representation Learning appearing at AISTATS.
Papers at ICLR
2019/12/20: Five papers of our lab on the topics of decentralized deep learning, model compression, attention >> CNN, local SGD, and neural architecture search were accepted at the upcoming ICLR conference in Addis Ababa, Ethiopia
New survey paper on federated and decentralized learning
2019/12/19: We contributed to a new review paper, fresh on arXiv.
Applied Machine Learning Days
2019/10/25: End of January 2020, we will again co-organize one of the largest European ML events, with 30 workshops and 25 tracks. MLO sessions include the NLP track, Industry Track, Theory meets Practice Workshop, and PyTorch Workshop.
Swiss ML day
2019/10/25: Join us on November 14 for the free one-day workshop meeting of Swiss ML researchers and students.
PowerSGD video
2019/10/15: New youtube video about the PowerSGD paper
Machine learning course starting
2019/09/16: The Machine Learning course CS-433 starts again with 540 students enrolled. All materials are available on github.
Papers at NeurIPS
2019/09/11: Two new papers on PowerSGD and unsupervised time series representation learning were accepted at this year’s NeurIPS conference.
SDSC fellowship
2019/08/05: Congrats to Jean-Baptiste Cordonnier for receiving the SDSC PhD Fellowship.
Summer school
2019/06/27: We’re teaching a one-day course on Optimization for Machine Learning and Deep Learning at the Paris DS3 summer school.
Papers at ICML
2019/05/08: Four papers of our group were accepted at this year’s ICML conference: on decentralized SGD & gossip (video), signSGD and error compensation, multi-model training, and interpretable LSTMs.
Optimization for ML course
2019/02/22: Optimization for Machine Learning – CS-439, is starting. All lecture materials are publicly available on our github.
New papers by our group
2019/02/20: on local SGD (ICLR), greedy optimization (AISTATS), multilingual embeddings, code (WSDM), and keyphrase extraction, code (CoNLL) and word embeddings (NAACL)
Applied Machine Learning Days
2019/01/10: We are co-organizing this year’s event with over 2000 attendees. MLO sessions include the Industry Track, Theory meets Practice Workshop, and PyTorch Workshop.
Machine learning course starting
2018/09/18: The Machine Learning course CS-433 has started with over 530 students enrolled. Need a larger room!
Papers at NeurIPS
2018/09/11: Four papers of our group were accepted at this year’s popular NeurIPS conference: CoLa, block-floating-point, Sparsified SGD (video), and BFGS.
Application Impact
2018/09/09: We’re excited that some of our work found impact in diverse fun areas including Koala Activity Detection, Electric Vehicle Charging, Toxicogenomics and Victorian Era Authorship Attribution.
MLBench – Distributed ML Benchmarking
2018/09/07: We’re launching MLBench, a public and reproducible collection of reference implementations and benchmark suite for distributed machine learning algorithms, frameworks and systems.
COLA: Decentralized Linear Learning
2018/08/13: New paper and code for decentralized learning.
Keyphrase Extraction
2018/07/27: The paper Simple Unsupervised Keyphrase Extraction using Sentence Embeddings was accepted at CoNLL 2018.
Faculty position for Aymeric at École Polytechnique in Paris
2018/06/01: Aymeric Dieuleveut, postdoc at MLO, has accepted a faculty position at École Polytechnique in Paris.
Two new papers appearing at ICML
2018/05/11: On Matching Pursuit and Coordinate Descent by Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi and A Distributed Second-Order Algorithm You Can Trust by Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi.
Google Focused Research Award
2018/03/23: We have been awarded a Google Focused Research Award 2018 in the area of Machine Learning, jointly with Alexandre d’Aspremont and Francis Bach.
Algorithms in the wild
2018/03/20: IBM and NVIDIA announced their new partnership, citing 46x faster training of logistic regression, resulting from a combination of our DuHL and CoCoA algorithms, in front of 40’000 attendees at the IBM Think conference.
New Optimization for ML course
2018/02/23: A brand new course – Optimization for Machine Learning – CS-439, has started with 110 students enrolled. All lecture materials are publicly available on our github.
sent2vec paper accepted at NAACL
2018/02/15: General purpose document embeddings, trained unsupervised. Get them while they are fresh! Congrats Matteo Pagliardini and Prakhar Gupta!
AISTATS Paper on Adaptive First-Order Optimization
2018/02/01: Congrats to Praneeth Karimireddy and Sebastian Stich, for the paper Adaptive Balancing of Gradient and Update Computation Times using Global Geometry and Approximate Subproblems.
Faster Linear Learning using GPUs
2017/12/08: Celestine’s NIPS paper was covered on HackerNews, as well as by IBM and Nvidia. Here is the 3min video.
Video: Safe Adaptive Importance Sampling
2017/11/25: 3min video for Sebastian’s upcoming NIPS spotlight presentation
sent2vec – features for text
2017/10/01: Our general purpose features for short texts have found many applications and already reached 100 (update: >1.2k) stars on github. The representations are trained unsupervised, very efficient to compute, and can be used for any machine learning task later on.
Project Funding from SNSF
2017/09/30: Our project “Reliable Open-Source Large-Scale Machine Learning” has received 3 years of funding from the Swiss National Science Foundation
Machine Learning Course Started
2017/09/19: The Machine Learning course (CS-433) has started this week with over 440 students enrolled.
NIPS conference: Importance Sampling, Heterogeneous Systems, and Cone Optimization
2017/09/04: Three papers were accepted at the upcoming NIPS conference: Safe Adaptive Importance Sampling (spotlight) by Sebastian Stich, Anant Raj and Martin Jaggi, Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems by Celestine Dünner, Thomas Parnell and Martin Jaggi, Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees by Francesco Locatello, Michael Tschannen, Gunnar Raetsch and Martin Jaggi
Short Course on Optimization for Machine Learning
2017/07/05: Brief lecture notes and practical labs with solutions for the Pre-doc Summer School on Learning Systems in Zurich.
Approximate Steepest Coordinate Descent accepted at ICML
2017/05/14: The paper by Sebastian Stich, Anant Raj and Martin Jaggi was accepted for publication at ICML 2017
Google Research Award
2017/02/23: Our project “A Computational View on Sentence Embeddings” received a Google Faculty Research Award 2016 in the area of Machine Learning and Data Mining.
Greedy and Coordinate Algorithms at AISTATS
2017/02/05: Two papers were accepted at the AISTATS conference: A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe by Francesco Locatello, Rajiv Khanna, Michael Tschannen and Martin Jaggi – and Faster Coordinate Descent via Adaptive Importance Sampling by Dmytro Perekrestenko, Volkan Cevher and Martin Jaggi.
Applied Machine Learning Days
2017/01/31: More than 450 participants from across industry and academia attended the first edition of the Applied ML Days, organized by the labs of Marcel Salathé, Robert West and Martin Jaggi. Videos of the presentations are available here on youtube.
Text Sentiment Analysis Paper Accepted at the WWW Conference
2017/01/03: Our paper on multi-lingual text sentiment analysis using convolutional neural networks was accepted at WWW 2017. This is joint work of Jan Deriu, Aurelien Lucchi, Valeria De Luca, Mark Cieliebak, Simon Müller, Aliaksei Severyn, Thomas Hofmann and Martin Jaggi.
Welcome Mikhail and Sebastian
2016/12/01: Two amazing senior researchers have freshly started in our lab in December. Welcome Mikhail Langovoy and Sebastian U. Stich!
Our Distributed Machine Learning Algorithm in TensorFlow
2016/09/20: Google has implemented our CoCoA+ algorithm as the default distributed solver in the popular TensorFlow framework, for linear machine learning models. The code is open source and our papers describing the method are available here, here, and here.
Machine Learning Course Started
2016/09/20: The Machine Learning course (CS-433) has started this week with 298 students enrolled.
Winner of the BirdCLEF 2016 Competition on Audio-Based Bird Species Classification
2016/09/06: The system developed by master student Elias Sprengel won this year’s international competition on bird species identification, using a deep learning approach with promising applications in ecology.
Funding from Microsoft Research
2016/08/19: Our project “Coltrain: Co-located Deep Learning Training and Inference”, jointly with the lab of Babak Falsafi, has received two years of funding from Microsoft Research. Ten projects have received funding under this new Swiss Joint Research Centre initiative.
Master thesis of Dmytro Perekrestenko on Importance Sampling
2016/08/17: Dmytro Perekrestenko has finished and defended his master thesis “Faster optimization through adaptive importance sampling”.
Start of Machine Learning and Optimization Laboratory
2016/08/01: The Machine Learning and Optimization Laboratory officially started at EPFL.
Paper “Primal-Dual Rates and Certificates” at ICML
2016/06/19: New paper appearing at this year’s ICML conference “Primal-Dual Rates and Certificates”. Here is a poster of it. Our approach allows more optimization problems to be equipped with practical accuracy certificates, such as L1, elastic-net, TV and others. By Celestine Dünner, Simone Forte, Martin Takac, Martin Jaggi.
Winner of the SemEval 2016 Competition on Text Sentiment Analysis
2016/06/17: Our system developed by master students Jan Deriu and Maurice Gonzenbach won this year’s text sentiment classification competition, placing first out of 34 teams from all over the world. Here is a news article, and our systems description paper.