Student Projects

  • We offer a wide variety of projects in the areas of Machine Learning, Optimization and applications. The list below is not complete but serves as an overview. 
  • Students who are interested in doing a project at the MLO lab are encouraged to have a look at our

    where we describe what you can expect from us (and what we expect from you). 

  • If you have not (yet) taken our courses, we might ask for your grade sheet to get to know your background in the topic area better.
  • We are only able to supervise a limited number of projects each semester. Priority will be given to Master Thesis Projects (full time).
  • Please apply to student projects via the following link. Please send only one application and indicate the projects you are interested in on the application form.

Available MSc, BSc and PhD Semester Projects 2025

  • Several projects around large language models (LLMs)
    including:
    • Curriculum learning for pre-training (data ordering and filtering), theory and practice
    • Scaling to longer context windows through modifications of attention
    • Efficient LLM pretraining engineering
    • Finetuning and aligning models on top of pretrained LLMs
    • Multilingual capabilities and tool usage during pretraining and finetuning
    • Alternative architectures, and modularity for LLMs
    • Optimization landscapes for LLMs

    Contact: Martin Jaggi

  • Computation-Adaptive Transformers
    The difficulty of predicting the next token varies across tokens. Chain-of-Thought methods allocate additional computation to solve a task through intermediary tokens. However, these methods can only be leveraged with large language models and at the task level. The objective of this project is to enable similar computational adaptivity through intermediate tokens in any language model. Contact: Amirkeivan Mohtashami and Matteo Pagliardini 
  • Mixture-of-Experts in Transformer Architectures
    Sparsely activated neural network architectures have emerged as a viable approach to scaling models by decoupling model size from computational cost. In particular, many recently released models opt for mixture-of-experts (MoE) layers as a replacement for the standard feedforward layers. However, their training behavior and capabilities remain poorly understood, while new algorithms are constantly being proposed. The goal of this project is to implement and compare different architectural choices, such as the routing algorithm and expert size, in order to understand their behavior and scaling properties. Contact: Alex Hägele
  • Multi-Context Language Models
    Language models are typically trained on continuous text segments. However, longer texts may not always follow a linear progression. This project aims to enhance language models by enabling them to effectively handle multiple contexts. Contact: Amirkeivan Mohtashami
  • Learning to optimize
    Learning to optimize learns an optimizer on the fly by teaching a neural network (for example an RNN) to take the current raw gradient as input, update its state, and output an improved “treated” gradient which can be used to perform a step. In collaborative learning, multiple agents try to collaborate in learning (their own or a shared learning task). How can we use the former idea in this context? Contact: El Mahdi Chayti
  • Learning to sample on the fly for SGD / curriculum learning
    One can use an auxiliary network that learns to assign scores to the different data points in the training set. Using these scores, we can then select an improved set of training points. We will explore several ways to train the auxiliary parameters. Contact: El Mahdi Chayti
  • Landscape analysis and second-order methods in deep learning
    We are interested in studying the geometry of loss landscapes in deep neural networks and the generalization properties of their stationary points. We want to use visualization tools to plot 2D and 3D projections of the surfaces to distinguish sharp and wide minima and the regions with saddle points. Then, we want to compare the trajectories of first-order optimization algorithms with those of the cubically regularized Newton method, which provably converges to a second-order local minimum.
    Contact: Nikita Doikov
  • Build decentralized ML (and LLMs) in the browser
    (Practical project)
    Join our larger team project to build decentralized (and federated) training software, where many clients (e.g. mobile phones or hospitals) can collaboratively train a joint ML model, such as smaller LLMs, while respecting data privacy and locality. The project combines algorithmic and practical challenges. Code is mostly in JavaScript. Contact: Martin Jaggi (and you can directly join the Disco slack channel)
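As a rough illustration of the routing idea mentioned in the Mixture-of-Experts project above, here is a minimal sketch of top-k routing for a single token in plain Python. It is not the lab's implementation. The toy experts, the router weights, and the function names (softmax, top_k_route, moe_layer) are all hypothetical and chosen only for illustration. A linear router scores each expert, only the k best experts are evaluated, and their outputs are combined with renormalized weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the chosen experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, router_weights, experts, k=2):
    """Apply a sparsely activated layer to one token (a list of floats)."""
    # Router: one linear score per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # only the k selected experts run
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, routes


def make_expert(scale):
    """Hypothetical toy expert: scales the token by a constant."""
    return lambda token: [scale * x for x in token]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
# Assumed router weights, one row per expert (illustrative values only).
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
output, routes = moe_layer([0.5, 2.0], router_weights, experts, k=2)
```

In this toy setting the compute per token depends on k and the expert size, not on the total number of experts, which is the decoupling of model size from computational cost described above.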

Contact us for more details!

Interdisciplinary Projects

See here for the full list of interdisciplinary health-related projects where we collaborate with Yale and EPFL via iGH (intelligent Global Health).


 

Completed Thesis Projects

Master Theses (internal at EPFL):

2023 todo…
Kuan Tung, 2022, Optimizing Modular Clinical Decision Support Networks with Decision Flippability Score
Sylvain Lugeon, 2022, Towards Online Predictions of the Dreaming Activity: a Deep Learning approach
Robin Zbinden, 2022, Implementing and Experimenting with Diffusion Models for Text-to-Image Generation
Omar Gallal Younis, 2022, Performance optimization of PowerSGD
Daniel Hinjos, 2022, DeeperBreath: Model Optimization for Automated Diagnosis of Respiratory Pathologies from Lung Sounds
Maximilian Plattner, 2022, On SGD with Momentum
Walid Bennaceur, 2022, Privacy-preserving robust training with zero-knowledge proofs
Hugo El Guedj, 2021, Disco – Decentralized Collaborative Machine Learning
Liu Futong, 2021, The Phenomenon of Memorization in Over-Parameterized Neural Networks
Cecile Trottet, 2021, Modular Clinical Decision Support Networks
Amandine Evard, 2021, Deep learning on lung ultrasound images for the diagnosis of COVID-19 – target population, clinical data and automated image extraction
Pablo Canas, 2021, Contrastive learning on lung ultrasound images for COVID-19 diagnosis and prognosis
Harshvardhan, 2021, Escaping Local Minima with Stochastic Noise
Lingjing Kong, 2021, Slimmable Training for Heterogeneous Federated Learning Systems
Aiyu Liu, 2021, Diagnosis and Prognosis Prediction of the Ebola Virus Disease Using Machine Learning Methods
David Roschewitz, 2021, IFedAvg: Interpretable Data-Interoperability for Federated Learning
Julien Heitmann, 2021, DeepBreath: Automated Detection of Lung Pathology in Digital Lung Auscultation
Gilberto Manunza, 2021, On the Impact of Adversarial Training on Uncertainty Estimation and Uncertainty Targeted Attacks
Sebastian Bischoff, 2020, Importance Sampling to Accelerate DL Training from Data
Anne Sophie Van de Velde, 2020, On federated linear regression
Ezgi Yücetürk, 2020, Unsupervised Representation Learning and Visualization with Deep Neural Networks for Dream Detection from EEG
Theophile Bian, 2020, MedAL-ai: A Platform for Interpretable Machine Learning Clinical Decision Support Algorithms
Felix Grimberg, 2020, Optimal Weighted Model Averaging for Personalized Collaborative Learning
Jiafan Liu, 2020, DeAI: A Decentralized Machine Learning Prototype
William Cappelletti, 2020, Byzantine-robust decentralized optimization for Machine Learning
Hugo Schmutz, 2020, Automating point-of-care COVID prognostication using deep learning on lung ultrasound images
Thierry Bossy, 2020, Adaptive Mitigation: Identification of the Dynamic Drivers of Effective Policy during the COVID-19 Pandemic
Ignacio Alemán, 2020, Causality from Noise Residuals
Edoardo Hölzl, 2020, Benchmarking Distributed Machine Learning Algorithms
Liamarcia Bifano, 2020, A machine learning platform to generate data-responsive and medically approved clinical decision algorithms
Pei Wang, 2020, Transformer-Based Multi-lingual Sentence Embeddings
Ahmad Ajalloeian, 2019, Stochastic Zeroth-Order Optimisation Algorithms with Variance Reduction
Akhilesh Gotmare, 2019, Layerwise Model Parallel Training of Deep Neural Networks
Andreas Hug, 2018, Unsupervised Learning of Embeddings for Detecting Lexical Entailment
Jean-Baptiste Cordonnier, 2018, Convex Optimization using Sparsified Stochastic Gradient Descent with Memory
Lie He, 2018, COLA: Communication-Efficient Decentralized Linear Learning
Wang Zhengchao, 2017, Network Optimization for Smart Grids
Marina Mannari, 2017, Faster Coordinate Descent for Machine Learning through Limited Precision Operations
Matilde Gargiani, 2017, Hessian-CoCoA: a general parallel and distributed framework for non-strongly convex regularizers

Internships:
Raphaël Selz, 2022
Rustem Islamov, 2022
Alyssa Unell, 2022
Abdellah El Mrini, 2022
Amina Rufai, 2022
Amna Elmustafa, 2022
Hugo El Guedj, 2022
Jonathan Dönz, 2022
Omar Gallal Younis, 2022
Maram Awad, 2022
Elizaveta Kostenok, 2022
Alaa Anani, 2021
Kimia Mohsenian, 2021
Devyani Maladkar, 2021
Shikhar Mohan, 2021
Julien Heitmann, 2021
Sadegh Khorasani, 2021
Jonathan Dönz, 2021
Jonathan Dönz, 2020
Mahmoud Hegazy, 2020
Deeksha M S, 2020
Edoardo Hölzl, 2020
Lucas Massemin, 2020
Emmanuil Angelis, 2020
Namhoon Lee, 2020, Training compressed/sparse deep learning models
Srijan Bansal, 2019, Adaptive Stepsize Strategies for SGD
Shreshta Alevooru, 2019, Fairness Objectives for Training Word Embeddings
Xu Lu, 2019, Cross-lingual transfer learning and distillation with transformers
Scott Pesne, 2019, Convergence diagnostics for SGD
Ali Sabet, 2019, Robust Cross-lingual Embeddings from Parallel Sentences
Éloïse Berthier, 2019, Differential Privacy
Myriam Bégel, 2019, Medical Machine Learning
Lie He, 2018, CoLA and MLbench
Jean-Yves Franceschi, 2018, Unsupervised Scalable Representation Learning for Multivariate Time Series
Riccardo de Lutio, 2018, Medical Machine Learning
Evann Courdier, 2018, Learning World Models (jointly with Francois Fleuret)
Polina Kirichenko, 2018, Zero-order Optimization for Deep Learning
R S Nikhil Krishna, 2018, Importance Sampling and LSH
Prashant Rangarajan, 2018, Multilingual matrix factorizations
Jeenu Grover, 2018, Learning 2 Learn
Anastasia Koloskova, 2017, Coordinate Descent using LSH
Vasu Sharma, 2017, CNNs for Unsupervised Text Representation Learning
Pooja Kulkarni, 2017, Variable metric Coordinate Descent
Tao Lin, 2017, Adversarial Training for Text
Tina Fang, 2017, Generating Steganographic Text with LSTMs
Valentin Thomas, 2017, Model-parallel back-propagation
Anahita Talebi Amiri, 2017, Lasso – Distributed and Pair-Wise Features

Semester Projects:


Aryan Agal, 2019, Effects of Varying Integration Time and Beam Power on Machine Learning for Raman Spectroscopy
Lingjing Kong, 2019, Extrapolation for Large-batch Training in Deep Learning
Mingbo Cui, 2019, babyBERT: a distilled faster and smaller BERT
Sadra Boreiri, 2019, Decentralized SGD with Changing Topology and Local Updates
Fedor Moiseev, 2019, Efficient Federated Deep Learning with Non-IID Data
Harshvardhan, 2019, Convergence Analysis of SGD in the Large-Batch Regime
Mohamed Ndoye, 2019, Collaborative privacy: Incentivizing sharing of medical data among competing parties in rural settings
Damian Dudzicz, 2019, Decentralized Optimization with Local Push-Sum SGD
Oriol Barbany Mayor, 2019, New Efficient Training Methods for Robust Models
Rayan Elalamy, 2019, Convergence Analysis for Hogwild! under less Restrictive Assumptions
Lionel Gerard, 2019, Stochastic Extragradient for GAN Training
Jan Benzing, 2019, A Machine-Learning approach for imputation of missing values in a biomedical dataset of febrile patients in Tanzania
Hongyu Luo, 2019, Dream Detection from EEG Time Series
Nicola Ischia, 2019, Dream Detection from EEG Time Series
Sidakpal Singh, 2019, Structure-aware model averaging with Optimal Transport
Brock Grassy, 2019, Nano Manufacturing with ML and Raman Spectroscopy
Atul Kumar Sinha, 2019, Unsupervised Sentence Embeddings Using Transformers
Hajili Mammad, 2019, Unsupervised Sentence Embeddings Using Transformers
Claire Capelo, 2019, Adaptive schemes for communication compression: Trajectory Normalized Gradients for Distributed Optimization
Aakash Sunil Lahoti, 2019, Theoretical Analysis of Minimum of Sum of Functions
Jelena Banjac, 2019, Software Tools for Handling Magnetically Collected Ultra-thin Sections for Microscopy
Nikita Filippov, 2019, Differentially Private Decentralized Batch SGD Under Varied Conditions
Devavrat Tomar, 2019, Neural Voice Conversion
Lingjing Kong, 2019, Adaptive Methods for Large Batch Training
Peilin Kang, 2019, A comparison of model-parallel training methods for deep learning
Nicolas Lesimple, 2019, Automated Machine Learning
Ezgi Yücetürk, 2018, Dream Detection from EEG Time Series
Delisle Maxime, 2018, Twitter Demographics
Bojana Rankovic, 2018, Handwritten Text Recognition on Student Essays
Cem Musluoglu, 2018, Quantization and Compression for Distributed Optimization
Quentin Rebjock, 2018, Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Jimi Vaubien, 2018, Derivative-Free Empirical Risk Minimization
Ali Hosseiny, 2018, Human Query – From natural language to SQL
Servan Grüninger, 2018, Location prediction from tweets
Marie-Jeanne Lagarde, 2018, Steganographic LSTM
Sidakpal Singh, 2018, Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations
Arthur Deschamps, 2018, Simulating Asynchronous SGD + numerical results
Chia-An Yu, 2018, Feedback & quantization in SGD
Kshitij Kumar Patel, 2018, Communication trade-offs for synchronized distributed SGD with large step size
Martin Josifoski, 2018, Cross-lingual word embeddings
William Borgeaud, 2017, Adaptive Sampling in Stochastic Coordinate Descent
Castellón Arevalo Joel, 2017, Complexity analysis for AdaRBFGS: a primitive for methods between first and second order
Alberto Chiappa, 2017, Asynchronous updates for Stochastic Gradient Descent
Ahmed Kulovic, 2017, Mortality Prediction from Twitter
Arno Schneuwly, 2017, Correlating Twitter Language with Community-Level Health Outcomes
Sina Fakheri, 2017, A Machine-learning Mobile App to support prognosis of Ebola Virus Diseases in an evolving environment
Remy Sun, 2017, A Convolutional Dictionary Method to Acquire Sentence Embeddings
Hakan Gökcesu, 2016, Distributed SGD with Fault Tolerance
He Lie, 2017, Distributed TensorFlow implementation of sparse CoCoA
Oberle Jeremia, 2016, A Machine-Learning Prediction Tool for the Triage and Clinical Management of Ebola Virus disease
Akhilesh Gotmare, 2016, ADMM for Model-Parallel Training of Neural Networks

Francesco Locatello: Greedy Optimization and Applications to Structured Tensor Factorizations,
Master thesis, ETH, September 2016

Dmytro Perekrestenko: Faster Optimization through Adaptive Importance Sampling,
Master thesis (jointly supervised with Volkan Cevher), EPFL, August 2016

Elias Sprengel: Audio Based Bird Species Identification using Deep Learning Techniques,
Master thesis (jointly supervised with Yannic Kilcher), ETH, August 2016

Jonathan Rosenthal: Deep Learning for Go,
Bachelor thesis (jointly supervised with Yannic Kilcher and Thomas Hofmann), ETH, June 2016

Maurice Gonzenbach: Sentiment Classification and Medical Health Record Analysis using Convolutional Neural Networks,
Master thesis (jointly supervised with Valeria De Luca), ETH, May 2016

Jan Deriu: Sentiment Analysis using Deep Convolutional Neural Networks with Distant Supervision,
Master thesis (jointly supervised with Aurelien Lucchi), ETH, April 2016

Pascal Kaiser: Learning city structures from online maps,
Master thesis (jointly supervised with Aurelien Lucchi and Jan Dirk Wegner), ETH, March 2016

Adrian Kündig: Prediction of Cerebral Autoregulation in Intensive Care Patients,
Master thesis (jointly supervised with Valeria De Luca), ETH, January 2016

Bettina Messmer: Automatic Analysis of Large Text Corpora,
Master thesis (jointly supervised with Aurelien Lucchi), ETH, January 2016

Tribhuvanesh Orekondy: HADES: Hierarchical Approximate Decoding for Structured Prediction,
Master thesis (jointly supervised with Aurelien Lucchi), ETH, September 2015

Jakob Olbrich: Screening Rules for Convex Problems,
Master thesis (jointly supervised with Bernd Gärtner), ETH, September 2015

Sandro Felicioni: Latent Multi-Cause Model for User Profile Inference,
Master thesis (jointly supervised with Thomas Hofmann, and 1plusX), ETH, September 2015

Ruben Wolff: Distributed Structured Prediction for 3D Image Segmentation,
Master thesis (jointly supervised with Aurelien Lucchi), ETH, September 2015

Simone Forte: Distributed Optimization for Non-Strongly Convex Regularizers,
Master thesis (jointly supervised with Matthias Seeger, Amazon Berlin, and Virginia Smith, UC Berkeley), ETH, September 2015

Xiaoran Chen: Classification of stroke types with SNP and phenotype datasets,
Semester project (jointly supervised with Roqueiro Damian and Xiao He), ETH, June 2015

Yannic Kilcher: Towards efficient second-order optimization for big data,
Master thesis (jointly supervised with Aurelien Lucchi and Brian McWilliams), ETH, May 2015

Matthias Hüser: Forecasting intracranial hypertension using time series and waveform features,
Master thesis (jointly supervised with Valeria De Luca), ETH, April 2015

Lei Zhong: Adaptive Probabilities in Stochastic Optimization Algorithms,
Master thesis, ETH, April 2015

Maurice Gonzenbach: Prediction of Epileptic Seizures using EEG Data,
Semester project (jointly supervised with Valeria De Luca), ETH, Feb 2015

Julia Wysling: Screening Rules for the Support Vector Machine and the Minimum Enclosing Ball,
Bachelor’s thesis (jointly supervised with Bernd Gärtner), ETH, Feb 2015

Tribhuvanesh Orekondy: dissolvestruct – A distributed implementation of Structured SVMs using Spark,
Semester project, ETH, August 2014

Michel Verlinden: Sublinear time algorithms for Support Vector Machines,
Semester project, ETH, July 2011

Clément Maria: An Exponential Lower Bound on the Complexity of Regularization Paths,
Internship project (jointly supervised with Bernd Gärtner), ETH, August 2010

Dave Meyer: Implementierung von geometrischen Algorithmen für Support-Vektor-Maschinen,
Diploma thesis, ETH, August 2009

Gabriel Katz: Tropical Convexity, Halfspace Arrangements and Optimization,
Master’s thesis (jointly supervised with Uli Wagner), ETH, September 2008