Student Projects

On this page you can find our offerings for Master's projects and Master's and Bachelor's research projects in data mining and machine learning for (vocational) education ecosystems for the spring semester 2025. Please note that this list of projects is not exhaustive and will be updated over the coming days with further exciting projects.

Last update: 17.12.2024

How to apply

Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, where you will gain an understanding of what you can expect from us and what we expect from you.

The student project application form will remain open for submissions until the late deadline of 24 January 2025. Applications will be reviewed on an ongoing basis, so we strongly encourage you to apply as soon as possible. Early applicants will be considered starting the first week of December, with the early deadline set for 29 November 2024:

  • Early deadline: 29.11.2024
  • First contact with supervisor: between 02.12.2024 and 06.12.2024
  • Late deadline: 24.01.2025

External students: Non-EPFL students are kindly requested to get in touch with the project supervisor(s) by e-mail.

Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss possibilities for a tailor-made topic.

Project 1: Evaluating Path Language Modeling Explainability (XPLM) Methods for Course Recommendation Systems

At ML4ED, we are developing state-of-the-art Path Language Modeling (PLM) recommendation models specifically for course recommendation, designed to help students make informed decisions about their educational paths. With the adoption of PLM models that leverage sequential data and knowledge graph paths, transparency is essential so that students can understand the reasoning behind course recommendations. While graph reasoning approaches—such as those using Reinforcement Learning (RL) to explicitly reason over knowledge graphs—have been evaluated for explainability, there is a notable gap in explainability methods tailored to PLM-based recommendation models, particularly in sequential recommendation contexts.

This research project will focus on evaluating the explainability of PLM-based methods specifically developed at ML4ED for course recommendations, bridging the fields of explainable AI, sequential recommendation, and educational technology. We will assess and compare several PLM methods to provide transparent insights into PLM models in the educational context. The project will evaluate different aspects of PLM explainability, such as the influence of sequential course paths and knowledge graph pathways, using both qualitative and quantitative metrics to measure the quality and impact of explanations.

Requirements

  • An interest in: Explainable AI, Recommendation Systems, Educational Technology
  • Proficiency in: Python; Jupyter Notebooks
  • Optional: Experience in conducting user studies via Prolific

Level: Master

Supervision: Jibril Frej (Postdoc)

Project 2 (already assigned): Automated Knowledge Graph Extraction with Human-in-the-Loop Revision for Educational Materials

Knowledge graphs are increasingly valuable for structuring educational resources, enabling personalized learning pathways, and fostering better educational insights. This ML4ED semester project will focus on creating a pipeline that automates the creation of knowledge graphs from diverse input materials (e.g., texts, videos, quizzes), where the graph’s nodes represent learning modules (skills) and the edges capture prerequisite relationships.

A key innovation will be the design and integration of a human-in-the-loop feature for human experts to edit the generated knowledge graphs. This process will not only allow experts to revise incorrect relationships but will also provide insights into common pitfalls in the automated extraction process. We will use educational content from open-source courses such as EPFL’s MLBD and UC Berkeley’s Data 8 as a testbed for development and evaluation in the use case of data science education.

This project will address the following core questions:

  1. How can we automate knowledge graph creation from multi-modal input data?
  2. Can we design an effective human-in-the-loop mechanism for revising these knowledge graphs?
  3. What are the common errors in the automated graph extraction pipeline, and how can human feedback improve its performance?
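To make question 2 concrete, here is a minimal sketch of one way a human-in-the-loop revision step might work: the prerequisite graph is held as an adjacency map (nodes are modules, edges are prerequisite relationships, as described above), and an expert's proposed edit is accepted only if it keeps the graph acyclic. All module names, function names, and the validation rule itself are illustrative assumptions, not part of the planned pipeline.

```python
# Illustrative sketch: a prerequisite graph as an adjacency dict, with a
# validation step a human-in-the-loop editor might run before accepting
# an expert's proposed edge. Names and the toy graph are hypothetical.

def has_cycle(edges):
    """Return True if the directed graph (adjacency dict) contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GRAY
        for succ in edges.get(node, []):
            if color.get(succ, WHITE) == GRAY:
                return True              # back edge found -> cycle
            if color.get(succ, WHITE) == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in list(edges) if color[n] == WHITE)

def propose_edge(edges, prerequisite, module):
    """Add 'prerequisite -> module' only if the graph stays acyclic."""
    trial = {k: list(v) for k, v in edges.items()}
    trial.setdefault(prerequisite, []).append(module)
    trial.setdefault(module, [])
    if has_cycle(trial):
        return edges, False              # reject the edit, ask for revision
    return trial, True

# Toy graph: arrays are a prerequisite for tables, tables for groupby.
graph = {"arrays": ["tables"], "tables": ["groupby"], "groupby": []}
graph, ok = propose_edge(graph, "groupby", "arrays")  # would create a cycle
```

Rejected edits like the one above could also be logged, giving exactly the kind of insight into common extraction pitfalls that question 3 asks about.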

Requirements

  • An interest in: Knowledge Graphs, NLP, Education
  • Proficiency in: Python; NLP frameworks (e.g., spaCy, Hugging Face); Graph libraries (e.g., NetworkX, Neo4j); Data Visualization

Further details

This semester project is aimed at one MSc student with strong technical skills, as part of a larger ML4ED initiative called Scholé AI on adult learning. The student will be supervised by Vinitra Swamy (PhD) and Paola Mejia (PhD).

Project 3 (already assigned): Developing a RAG-based Tutor for Course Guidance and Advice

Intelligent tutoring systems play a critical role in enhancing student learning by providing personalized feedback, answering questions, and offering guidance on course selection and learning strategies. Recent advances in Retrieval-Augmented Generation (RAG) systems, which combine retrieval capabilities with the generative power of language models, have shown promise in various domains. However, their potential for delivering accurate and context-aware tutoring in education remains underexplored. To maximize their effectiveness, RAG systems must incorporate insights from pedagogy and address challenges such as retrieval accuracy, generation quality, hallucination minimization, and adaptability to educational contexts.

This project focuses on developing a RAG-based intelligent tutor that answers student questions and provides tailored advice and guidance about specific courses. The main goals are:

  1. To design and implement a RAG system that integrates pedagogical findings for improved tutoring capabilities.
  2. To build a comprehensive evaluation framework assessing retrieval accuracy, generation quality, hallucination rates, and overall tutoring effectiveness.
  3. To conduct ablation studies to identify the critical components contributing to the system’s performance, such as retrieval strategies, fine-tuning methods, and integration of pedagogical principles.

The tutor will be tested in an educational context, aiming to help students make informed decisions about their learning paths and enhance their understanding of course content.
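The retrieve-then-generate loop at the heart of such a system can be sketched in a few lines. This is a deliberately minimal sketch: the corpus, the bag-of-words cosine retriever, and the prompt template are all illustrative placeholders (a real system would use dense retrieval and an LLM for the generation step, which is omitted here).

```python
# Minimal retrieve-then-generate sketch. The toy corpus, scoring function,
# and prompt template are hypothetical placeholders for illustration only.
import math
import re
from collections import Counter

def tokens(text):
    """Lowercased word counts, stripping punctuation."""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, passages, k=2):
    """Return the top-k passages with nonzero similarity to the query."""
    q = tokens(query)
    scored = [(cosine(q, tokens(p)), p) for p in passages]
    return [p for s, p in sorted(scored, key=lambda x: -x[0])[:k] if s > 0]

def build_prompt(query, passages):
    """Assemble the grounding prompt an LLM would receive for generation."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

course_notes = [
    "CS-401 covers supervised learning and requires linear algebra.",
    "CS-401 is graded with a semester project and a final exam.",
    "Cafeteria opens at 11:30 on weekdays.",
]
prompt = build_prompt("What are the prerequisites for CS-401?", course_notes)
```

Even this toy version surfaces the project's evaluation questions: retrieval accuracy determines which passages enter the context, and grounding the generation in that context is what limits hallucination.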

Requirements

  • An interest in: Educational Technology, Natural Language Processing, Explainable AI
  • Proficiency in: Python; Hugging Face Transformers; Machine Learning
  • Optional: Experience with LLMs, RAG

Level: Master

Supervision: Jibril Frej (Postdoc)

Project 4 (already assigned): Explainable Recommender System for Short-Term Personalized Learning Paths

Personalized learning is key to improving learner engagement and outcomes. This ML4ED semester project will focus on designing and implementing an explainable recommender system that generates adaptive learning paths based on a user’s learning goals. 

The recommender system will integrate with an open-source educational LLM (ideally from the Swiss AI Education initiative) to dynamically extract student learning goals through a natural language conversation. Using these goals, the recommender will traverse a knowledge graph to create tailored learning paths for one learning session, represented as directed subgraphs where nodes are modules and edges are skill prerequisites.

The system will incorporate a simple form of knowledge graph explainability, linking each recommendation to a specific user goal and structuring learning paths as an ordered sequence of concepts. This project will not rely on reinforcement learning, instead leveraging graph traversal and LLM-based extraction pipelines.

This project will address the following core questions:

  1. How can conversational AI extract clear and actionable learning goals from users?
  2. How can we construct a curriculum for a specified learning goal using an LLM-based knowledge graph recommender system?
  3. How can we quantitatively evaluate the success of this explainable path recommendation system?
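The graph-traversal step described above can be sketched without any learning component: given a goal module, collect its prerequisite closure and order it topologically, so that the resulting path is both a valid curriculum and trivially explainable (every module is included because it is a prerequisite of the stated goal). The module names and graph below are illustrative assumptions.

```python
# Sketch: turn a learning goal into an ordered learning path by collecting
# all prerequisites of the goal module and sorting them topologically.
# The toy graph and module names are hypothetical.
from graphlib import TopologicalSorter

# prereqs[module] = set of modules that must be completed first
prereqs = {
    "pandas-groupby": {"dataframes"},
    "dataframes": {"python-basics", "numpy-arrays"},
    "numpy-arrays": {"python-basics"},
    "python-basics": set(),
}

def learning_path(goal, prereqs):
    """Modules needed for `goal`, ordered so prerequisites come first."""
    needed, stack = set(), [goal]
    while stack:                          # prerequisite closure of the goal
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(prereqs.get(module, ()))
    # restrict the graph to the closure, then order it
    sub = {m: prereqs.get(m, set()) & needed for m in needed}
    return list(TopologicalSorter(sub).static_order())

path = learning_path("pandas-groupby", prereqs)
```

In the full system, the goal node would come from the LLM-based conversational extraction rather than being hard-coded, and edge weights or session-length constraints could prune the closure to fit one learning session.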

Requirements

  • An interest in: Conversational AI, Graph-based Recommender Systems, Personalized Education, and Explainable AI
  • Proficiency in: Python; NLP frameworks (e.g., Hugging Face, spaCy); Graph libraries (e.g., NetworkX, Neo4j); Recommender Systems

Further details

This semester project is aimed at one MSc student with strong technical skills, as part of a larger ML4ED initiative called Scholé AI on adult learning. There will be a team of project students working on aligned projects. The student will be supervised by Vinitra Swamy (PhD) and Paola Mejia (PhD).

Project 5 (already assigned): Providing Intelligent Support to Improve Peer Review Writing Skills of Learners

Providing peer reviews (also called peer feedback or peer evaluation) is a fundamental pillar not only of the scientific publishing process, but also of feedback practices in the workplace and in educational settings. Writing professional reviews is an important way of giving feedback in agile workplaces and a necessary skill for employees. Moreover, prior work on peer review has shown that it can be a useful way to develop the metacognitive and critical thinking skills of learners. However, previous work also shows that humans may struggle to learn how to provide high-quality reviews across multiple domains, particularly in terms of the structure, persuasiveness, empathy, perspective-taking, and emotional appropriateness of the reviews. Researchers have therefore explored methods to model review quality, with the goal of supporting humans in writing better reviews. Nevertheless, there remains an opportunity to support novice reviewers in their peer review writing process using AI-enabled intelligent and interactive writing assistants, including modern approaches based on generative large language models.

In this project, we aim to design, implement, and evaluate a tool that helps novice learners write high-quality peer reviews, building on prior work in this area done both in the ML4ED lab and in other research groups. We will mainly focus on

  1. conducting the design process of the tool (e.g., user interviews)
  2. coming up with metrics to evaluate peer review quality across domains
  3. implementing the tool based on large language models, and
  4. evaluating it in an online or in-classroom study, with the aim of publishing the results at a top-ranked conference or journal.

Requirements:

  • General ML and NLP knowledge (e.g., Transformers, LLMs such as BERT and GPT)
  • Bonus: experience in interaction design, HCI, user studies, or evaluations
  • Bonus: knowledge of front-end and back-end development

Level: Master

Supervision: Seyed Parsa Neshaei (PhD student)

Project 6 (already assigned): Natural Language Knowledge Representations for Intelligent Tutoring Systems

Knowledge tracing (KT) is the problem of predicting students' future task performance based on their past responses. Over the last decades, the field of KT has evolved from simple probabilistic models to deep learning approaches. Recently, language models have been employed to leverage semantic task information and create richer student representations.

In this ML4ED semester project, you will work on log data from an intelligent tutoring system and explore language models to represent students’ knowledge based on their past learning behavior. The goal is to quantify the trade-off between sparse numerical knowledge representations and rich verbal descriptions with respect to training efficiency, performance, and other pedagogical goals such as informative feedback.

This project will address the following core questions:

  1. How to represent student knowledge in natural language?
  2. To what extent do verbal knowledge descriptions make KT algorithms more efficient?
  3. What is the optimal verbosity of such knowledge representations with respect to scale, informativeness, and performance?
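As a point of reference for the "sparse numerical" side of the trade-off mentioned above, classical Bayesian Knowledge Tracing (BKT) compresses a student's state into a single mastery probability per skill, updated after each response. The sketch below shows the standard BKT update; the parameter values are illustrative, not fitted to any data.

```python
# Classical Bayesian Knowledge Tracing update: one mastery probability
# per skill, revised after each observed response. Parameter values
# (guess, slip, learn) are illustrative placeholders.

def bkt_update(p_know, correct, guess=0.2, slip=0.1, learn=0.15):
    """Posterior probability of mastery after observing one response."""
    if correct:
        obs = (p_know * (1 - slip)
               / (p_know * (1 - slip) + (1 - p_know) * guess))
    else:
        obs = (p_know * slip
               / (p_know * slip + (1 - p_know) * (1 - guess)))
    return obs + (1 - obs) * learn   # chance of learning between steps

p = 0.3                                   # prior mastery estimate
for response in [True, True, False, True]:  # a toy response log
    p = bkt_update(p, response)
```

The project's question is then whether a rich verbal description of the same history ("the student solves basic exercises reliably but slipped on the harder item...") can match or beat this compact numeric state in training efficiency, predictive performance, and usefulness for feedback.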

Requirements:

  • Proficiency in: Python, Natural Language Processing, and Machine Learning (e.g., Transformers, LLMs)
  • Interest in: Learning Sciences and Language Models

Level: Master

Supervision: Dominik Glandorf (PhD) & Paola Mejia (PhD)