You will find below a list of available bachelor semester projects, master’s semester projects, and a master’s thesis (PDM). If you are interested in a specific project, please get in touch with the person listed under “Contact”, mentioning in the subject of the email the title of the project.
For all these projects, you will receive a number of ECTS credits that depends on the type of project and on your program. Working on these projects is not remunerated. The projects can be done in either the Fall or the Spring semester.
You can also work on a non-credited, part-time, remunerated project as an Assistant. Working on these projects is subject to the EPFL rules on the maximum allowed weekly hours.
(If you are not an EPFL student, you can apply to the open internship at Idiap.)
- Breathing life in robots through animation
- Biological networks for language processing
- Automated segmentation of high-content fluorescent microscopy data
- Cultural bias in cross-lingual transfer
- Using large pretrained language models in speech recognition
- Understanding generalization in deep learning
- Grading of inflammatory diseases affecting blood vessels in the eye
- Automatic identification of flight information from speech
- An open-source framework for the quantification of Urban Heat Islands in Switzerland
- Swiss Alpine Lakes & Citizen Science
- Tensor trains for human-guided optimization in robotics applications
- Multi-spectral image unmixing for rapid and automated image annotation
- Audiovisual person recognition
- Understanding the robustness of machine learning models on underspecified tasks
- Social media and crowdsourcing for social good
- Punctuation restoration on automatic speech recognition output
- Ergodic control for robot exploration
- Clinically interpretable computer aided diagnostic tool using multi-source medical data
- Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
- Automatic named entity recognition from speech
- A human-centered approach to understand local news consumption
- Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
- Development of epigenetic biomarkers for chronic pain stratification
- Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
- Speaker identification enhanced by the social network analyser
- Speech/Music Classification
- Error correction in speech recognition using large pre-trained language models
- Automatic speech recognition of air-traffic communication using grammar
- Pathological speech detection in adverse environments
- Pathological speech enhancement
- Parametric Gaze Following
- From Gaze Following to Joint Attention
- Person Head Detection
- Research Spotlight: Crafting An Interactive Webpage Template for Showcasing Science
- Spatio-temporal Modeling of Human Behavior in Videos
- Scaling Pre-training for Gaze Following
- Cognitive architecture for assistive robots
- Text flourishing for a robot writer: a learning and optimization approach
- Ergodic drawing for a robot manipulator
- Deep learning for a portraitist robot application
- Human body tracking with signed distance fields
Breathing life in robots through animation
Description
Robot motions are known to have a strong effect on human perception. In this project, you will work on the Lio robot and design breathing animations for the platform. You will use the Lively framework to add Perlin noise to the robot's motions and create lifelike behavior. If time permits, you will also create a number of animations inspired by animal behaviors to help the robot communicate emotions, and test them in online and real-world studies.
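To make the core idea concrete, here is a minimal, self-contained sketch of Perlin-style noise layered on a slow breathing oscillation. The joint names, amplitudes and the way offsets would be sent to the robot are hypothetical placeholders for the actual Lio/ROS interface:

```python
import math, random

def perlin1d(x, seed=0):
    """Minimal 1D Perlin-style gradient noise (smooth, pseudo-random)."""
    rng = random.Random(seed)
    grads = [rng.uniform(-1, 1) for _ in range(256)]
    x0 = math.floor(x)
    t = x - x0
    g0, g1 = grads[x0 % 256], grads[(x0 + 1) % 256]
    fade = t * t * t * (t * (t * 6 - 15) + 10)  # quintic smoothstep
    return (1 - fade) * g0 * t + fade * g1 * (t - 1)

# Hypothetical joints and amplitudes; replace with the real Lio joint command API.
BREATHING_JOINTS = {"torso_lift": 0.005, "shoulder_pitch": 0.02}

def breathing_offsets(t, period=4.0):
    """Joint offsets at time t [s]: sinusoidal breathing plus Perlin jitter."""
    offsets = {}
    for i, (joint, amp) in enumerate(BREATHING_JOINTS.items()):
        base = amp * math.sin(2 * math.pi * t / period)
        jitter = 0.3 * amp * perlin1d(0.5 * t + 100 * i)  # decorrelated per joint
        offsets[joint] = base + jitter
    return offsets

if __name__ == "__main__":
    for k in range(5):
        print(breathing_offsets(0.5 * k))
```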
Goals
- Adapt the robot model (urdf and meshes) to the requirements for Lively
- Test behaviors in simulation
- Apply approach on the real robot
- Conduct an online study to evaluate the impact of the motions (Master project only)
Research Program: Human-AI Teaming
Prerequisites: Good command of Python, experience with ROS/URDF, basics of Linux. Experience in robotics (inverse kinematics, control, or system architecture) would be a plus.
Level: Bachelor/ Master
Contact: Emmanuel Senft, [email protected], Jean-Marc Odobez, [email protected]
Biological networks for language processing
Description
Biological spiking neural networks are interesting from a scientific (evolution) point of view, as well as from a technical one. In the latter sense, spiking networks offer advantages over artificial ones in terms of recurrence, coding and power consumption. Recently we have shown that spiking neurons can be combined freely with artificial ones in the same architecture, showing promise in speech processing tasks. The objective of the project is to extend this work towards the processing of discrete entities such as words, at the level of language rather than audio. Some progress has already been made in the literature, e.g., "Spikeformer", a spiking equivalent of the transformer.
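For concreteness, a minimal sketch of the kind of building block involved: a leaky integrate-and-fire layer trained with a surrogate gradient (in the spirit of the reference below), which can be mixed freely with standard artificial layers. All hyperparameters are illustrative:

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a fast-sigmoid surrogate gradient."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()
    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out / (1.0 + 10.0 * v.abs()) ** 2

class LIFLayer(torch.nn.Module):
    """Leaky integrate-and-fire layer over a (batch, time, features) input."""
    def __init__(self, n_in, n_out, beta=0.9):
        super().__init__()
        self.fc = torch.nn.Linear(n_in, n_out)
        self.beta = beta
    def forward(self, x):
        batch, T, _ = x.shape
        v = torch.zeros(batch, self.fc.out_features, device=x.device)
        spikes = []
        for t in range(T):
            v = self.beta * v + self.fc(x[:, t])  # leaky membrane integration
            s = SpikeFn.apply(v - 1.0)            # threshold at 1
            v = v - s                             # soft reset by subtraction
            spikes.append(s)
        return torch.stack(spikes, dim=1)

# Toy usage: a spiking layer feeding a standard artificial readout layer.
x = torch.randn(8, 50, 40)  # e.g. 40 filterbank features over 50 frames
out = torch.nn.Linear(64, 10)(LIFLayer(40, 64)(x).mean(dim=1))
```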
Goals
- Investigate the issues around using spiking for discrete outputs
- Show that mainly-spiking networks can replace, e.g., recurrent artificial components
Research Program: AI for Life
Prerequisites: The project will involve programming in python using the Pytorch library. Some knowledge of deep learning will be required, ideally from previous courses.
Reference
- Alexandre Bittar and Philip N. Garner. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience, 16, August 2022. http://dx.doi.org/10.3389/fnins.2022.865897
Level: Master
Contact: Phil Garner, [email protected]
Automated segmentation of high-content fluorescent microscopy data
Description
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive and incurable neurodegenerative disease. The early events underlying the disease remain poorly understood; as a dramatic consequence, no effective treatment has been developed. We previously found that the molecular events leading to ALS start during early development. It remains however unknown how and when these affect individual cell behaviour. This project aims to study how molecular biology shapes cellular morphology at an early stage of ALS by integrating longitudinal cellular imaging with genomic data. It involves a close collaboration with the experimental laboratory of Professor Rickie Patani, Francis Crick Institute/UCL.
Goals
The goal of the project is to develop an image analysis pipeline to extract and analyse single-cell phenotypic measurements from large-scale time-lapse fluorescence imaging data of astrocytes and motor neurons in culture. Specifically, it will involve
- expanding existing image analysis modules to obtain robust single-cell readouts from longitudinal images (a minimal sketch of this readout step follows the list); and
- developing statistical models to identify cellular trajectories associated with early-stage ALS, using the phenotypic features obtained in the first step.
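As a rough illustration only, a single-frame, single-channel readout step could look like the following scikit-image sketch; the threshold, size filter and measured properties are illustrative, not the project's actual pipeline:

```python
import numpy as np
from skimage import filters, measure, morphology

def cell_readouts(frame):
    """Segment fluorescent cells in one 2D frame and return per-cell measurements."""
    thresh = filters.threshold_otsu(frame)
    mask = morphology.remove_small_objects(frame > thresh, min_size=50)
    labels = measure.label(mask)
    props = measure.regionprops_table(
        labels, intensity_image=frame,
        properties=("label", "area", "eccentricity", "mean_intensity"))
    return props  # dict of arrays, one entry per segmented cell

# Per-cell trajectories would then be obtained by linking labels across time
# points, e.g. by nearest-centroid matching between consecutive frames.
```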
Research Program: AI for Life
Prerequisites: Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in image processing and analysis, and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Cultural bias in cross-lingual transfer
Description
Multilingual transformers have proven successful at cross-lingual transfer for multiple tasks in Natural Language Processing (NLP). In such a setup, a multilingual transformer model is fine-tuned on a given source language, for which annotations exist, and the resulting model is tested on a task in a target language by providing a few annotated examples or none. The standard approach to multilingual dataset creation for semantic tasks is translation from an existing (English) dataset, which raises concerns regarding suitability: translations, especially into culturally distant languages, may break certain relations, such as perceived causal relations in social behaviour, or moral values. Further information.
Goals
- Estimate the level of cross-lingual transfer of cultural biases on existing datasets
- Create datasets for the purpose of estimating cross-lingual transfer of cultural biases
- If relevant and if time permits, develop ways to mitigate this effect
Research Program: AI for Everyone
Level: Bachelor/ Master
Contact: Lonneke van der Plas, [email protected]
Using large pretrained language models in speech recognition
Description
The aim of this project is to measure how large language models perform in their native cradle: automatic speech recognition. The student will run a standard speech dataset through one of the speech recognition models (e.g., publicly available or internal at Idiap), score the outputs with the language models, and combine the scores to refine the transcriptions. The result should be a verdict on the influence of the model size (are the big ones really needed?), a comparison of different models (is GPT better than a same-size LLaMA?) and an evaluation of the usefulness of retraining the language model, which is easy today even on a single GPU.
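As a concrete starting point, here is a small sketch of N-best rescoring with an off-the-shelf causal language model via Hugging Face transformers; GPT-2 stands in for the larger models to be studied, and the ASR scores and interpolation weight are made up:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def lm_logprob(text):
    """Total log-probability of `text` under the language model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss       # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)  # total log-probability

def rescore(nbest, weight=0.5):
    """Combine ASR and LM scores; `nbest` is [(hypothesis, asr_logprob), ...]."""
    return max(nbest, key=lambda h: (1 - weight) * h[1] + weight * lm_logprob(h[0]))

nbest = [("flight two six alpha cleared to land", -12.3),
         ("flight to six alpha cleared to land", -11.9)]
print(rescore(nbest)[0])
```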
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring of N-best hypotheses)
- Explore large language models and their deployment in speech recognition
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding generalization in deep learning
Description
State-of-the-art approaches in machine learning are based on deep learning. The reasons for its success are however still poorly understood. Most existing work on the topic has focused on the effects of gradient-based optimization. Interestingly though, even randomly-initialized networks encode inductive biases that mirror some properties of real-world data. This project will contribute to the efforts made in our research group to understand the success of deep learning. This project will emphasize theoretical or practical contributions depending on the student’s interests. One of the objectives is to contribute to a high-quality publication co-authored with other members of our research group, and provide the student with training in rigorous research practices.
Goals
- Select datasets of interest and train various architectures on these
- Implement methods or use existing code from recent publications to understand the interplay of various properties of data vs. architectures
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication
Prerequisites: Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
- Loss Landscapes are All You Need (https://openreview.net/forum?id=QC10RmRbZy9)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning (https://arxiv.org/abs/2207.02598)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Grading of inflammatory diseases affecting blood vessels in the eye
Description
Fluorescein angiography is the only clinical method to evaluate the function and integrity of the blood-retinal barrier. Using real hospital data, we aim to detect and grade inflammatory diseases affecting blood vessels in the eye. Through computer vision and machine learning approaches the student will identify novel biomarkers that could improve patient management and care. This project is a collaboration with a multi-centric medical team to identify promising new leads in the research of this field. Challenges include the segmentation and registration of (retinal) fundus angiography data, and the grading of diseased patients.
Goals
- To develop a system for the detection and grading of inflammatory eye diseases in medical images and video
- Validate the proposed approach and compare results to the state-of-the-art techniques
- Work together with clinical experts on improving the current understanding of the disease
Research Program: AI for Life
Prerequisites: Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Tugal-Tutkun, I., Herbort, C. P., Khairallah, M., & Angiography Scoring for Uveitis Working Group (ASUWOG). (2010). Scoring of dual fluorescein and ICG inflammatory angiographic signs for the grading of posterior segment inflammation (dual fluorescein and ICG angiographic scoring system for uveitis). International ophthalmology, 30, 539-552.
Level: Master
Contact: Andre Anjos, [email protected]
Automatic identification of flight information from speech
Description
Current approaches to automatic recognition of call-signs from speech combine conventional automatic speech recognition (i.e. speech-to-text) with entity recognition (i.e. text-to-call-sign) technologies. This project will develop a unified module (e.g. an adaptation of the well-known BERT models) allowing a direct mapping from speech to the call-sign.
Goals
- Get familiar with a baseline speech recognition module for Air Traffic Control (ATC)
- Get familiar with a baseline concept-extractor module for ATC
- Apply an end-to-end framework to train both modules together and compare its performance with that of independently trained modules
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References
- Martin Kocour, et al, Boosting of contextual information in ASR for air-traffic call-sign recognition
- Zuluaga, et al: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
- ATCO2 project
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
An open-source framework for the quantification of Urban Heat Islands in Switzerland
Description
Cities throughout the world are overheating in summer, with adverse effects on the health of their citizens. Due to the highly mineral nature of the built environment, the scarcity of nature in cities, and the anthropogenic heat released in the streets, temperatures will continue to increase with climate change. With physically-based simulation tools, we can predict hot spots and evaluate scenarios for the mitigation of urban heat islands. While such tools exist, a framework based on open data and easily accessible to researchers, practitioners and citizens is a must-have to raise awareness and move towards efficient heat-island mitigation measures.
Goals
- Build an open-source framework in Python (or any other language except Matlab) to go from Swiss open-datasets to indicators related to the Urban Heat Island effect
- Introduce the Physiological Equivalent Temperature (PET) and the Universal Thermal Climate Index (UTCI) as indicators of Urban Comfort in our simulation tool
- Demonstrate the application of scenarios on three case studies representative of the Swiss landscape, quantifying improvement measures
Research Program: Sustainable and Resilient Societies
Prerequisites: Basic energy balance and thermodynamics knowledge; Basic scripting or programming skills (no Matlab).
References
- Coccolo, Silvia, Jérôme Kämpf, Jean-Louis Scartezzini, and David Pearlmutter. ‘Outdoor Human Comfort and Thermal Stress: A Comprehensive Review on Models and Standards’. Urban Climate 18 (December 2016): 33–57. https://doi.org/10.1016/j.uclim.2016.08.004
- Coccolo, Silvia, David Pearlmutter, Jerome Kaempf, and Jean-Louis Scartezzini. ‘Thermal Comfort Maps to Estimate the Impact of Urban Greening on the Outdoor Human Comfort’. Urban Forestry & Urban Greening 35 (October 2018): 91–105. https://doi.org/10.1016/j.ufug.2018.08.007
Level: Master
Contact: Jérôme Kämpf, [email protected]
Swiss Alpine Lakes & Citizen Science
Description
The 2000Lakes initiative aims to catalog the microbial diversity in Swiss alpine lakes while developing a network of citizen science and stakeholders. We are looking for several motivated students interested in alpine science, data science, and human-centered research to develop a master thesis project or semester project on this topic. This project offers the possibility of contributing to an innovative approach to scientific research.
Goals
- Develop creative actions to inform and consolidate a network of stakeholders engaged with biodiversity in alpine lakes
- Develop computational tools (using data visualization, social media, media archives) to support interaction with citizens and stakeholders
- Participate in fieldwork and publications in the field of citizen science
Research Program: AI for Everyone
Prerequisites: Interest and/or experience in one or more of these areas: social media, citizen science, community organizing, data visualization, data analysis, machine learning
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Tensor trains for human-guided optimization in robotics applications
Description
This project extends Tensor Train for Global Optimization (TTGO) to a human-guided learning strategy. Learning and optimization problems in robotics are characterized by two types of variables: task parameters representing the situation that the robot encounters (typically related to environment variables such as locations of objects, users or obstacles); and decision variables related to actions that the robot takes (typically related to a controller acting within a given time window, or the use of basis functions to describe trajectories in control or state spaces). In TTGO, the density function is modeled offline using a tensor train (TT) that learns the structure between the task parameters and the decision variables, and then allows sampling of the decision variables conditioned on the task parameters, with priority for higher-density regions. Further information.
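To make the sampling idea concrete, here is a toy with a fully tabulated two-variable density. In TTGO the table is never stored explicitly: it is approximated in tensor-train format by TT-Cross, which is what makes the approach scale to many variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint density over a discretized task parameter and decision variable.
n_task, n_dec = 20, 50
xt = np.linspace(0, 1, n_task)[:, None]
xd = np.linspace(0, 1, n_dec)[None, :]
P = np.exp(-80 * (xd - xt) ** 2) + 0.5 * np.exp(-80 * (xd - (1 - xt)) ** 2)

def sample_decision(task_idx, alpha=4.0, n=5):
    """Condition on a task parameter, then sample decision variables with
    priority for higher-density regions (alpha > 1 sharpens the conditional)."""
    cond = P[task_idx] ** alpha
    cond /= cond.sum()
    return rng.choice(n_dec, size=n, p=cond)

print(sample_decision(task_idx=3))  # candidate decision variables for task 3
```

In the human-guided extension studied here, the user would sporadically fix the task parameter (or the decision variable) during the iterative TT-Cross process rather than only at query time.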
Goals
- The goal is to test whether the original autonomous learning strategy of TT-Cross can be extended to a human-guided learning strategy, by letting the user sporadically specify task parameters or decision variables within the iterative process. The first case can be used to provide a scaffolding mechanism for robot skill acquisition. The second case can be used for the robot to ask for help in specific situations.
Research Program: Human-AI Teaming
Prerequisites: Linear algebra, optimization, programming in Python
Reference
- Shetty, S., Lembono, T., Löw, T. and Calinon, S. (2023). Tensor Train for Global Optimization Problems in Robotics. arXiv:2206.05077.
https://sites.google.com/view/ttgo
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Multi-spectral image unmixing for rapid and automated image annotation
Description
Object segmentation and identification methods often rely on the availability of large annotated image libraries. While such libraries are widely available for everyday image scenes, many applications in industry, science and medicine lack similar data because of their unique and specialized nature. The student will implement and characterize the potential of imaging scenes using multi-spectral (colored) illumination patterns to facilitate object annotation in complex scenes. The project will involve the use of a custom hardware imaging setup consisting of a digital camera with triggered multi-color light sources, collecting images of objects, and implementing computational imaging algorithms for spectral unmixing and image segmentation.
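The computational core can be illustrated by per-pixel linear unmixing: with a known (or calibrated) spectral signature matrix S, solving S·A ≈ Y in the least-squares sense recovers per-material abundance maps. A self-contained toy sketch:

```python
import numpy as np

def unmix(stack, S):
    """Per-pixel linear spectral unmixing.

    stack: (n_channels, H, W) images acquired under the triggered LEDs
    S:     (n_channels, n_materials) spectral signature matrix
    Returns (n_materials, H, W) abundance maps (least-squares estimate)."""
    n_ch, H, W = stack.shape
    Y = stack.reshape(n_ch, -1)                   # one column per pixel
    A, *_ = np.linalg.lstsq(S, Y, rcond=None)     # solve S @ A ~= Y
    return np.clip(A, 0, None).reshape(-1, H, W)  # clip negative abundances

# Toy check: two materials observed under three illumination wavelengths.
S = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
truth = np.random.rand(2, 4, 4)
stack = np.einsum("cm,mhw->chw", S, truth)
assert np.allclose(unmix(stack, S), truth, atol=1e-6)
```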
Goals
- Implement a multi-spectral image acquisition protocol using triggered LEDs of various wavelengths to acquire images of objects in a lab setting
- Implement a spectral unmixing algorithm to segment objects in images
- Depending on progress, deploy the method in a light microscope for imaging biological samples
Research Program: AI for Life
Prerequisites: Signal processing/image processing, Introduction to machine learning, Python programming.
References
- C. Jaques, E. Pignat, S. Calinon and M. Liebling, "Temporal Super-Resolution Microscopy Using a Hue-Encoded Shutter," Biomedical Optics Express, 10(09):4727-4741, 2019
- C. Jaques, L. Bapst-Wicht, D.F. Schorderet and M. Liebling, "Multi-Spectral Widefield Microscopy of the Beating Heart through Post-Acquisition Synchronization and Unmixing," IEEE International Symposium on Biomedical Imaging (ISBI 2019), pp. 1382-1385, 2019
Level: Master
Contact: Michael Liebling, [email protected]
Audiovisual person recognition
Description
Audiovisual person identification systems combine two biometric modalities that lead to very good results, as shown in Idiap's submission to NIST SRE2019. The student will be able to use most of Idiap's scripts, mainly the audio-related part. Fusion scripts for combining audio and visual systems can also be shared. One of two approaches can be considered: either develop the two systems separately and then experiment with fusion, or attempt to build a single person identification system taking both audio and visual embedding representations as input.
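For the fusion route, a minimal sketch of score-level fusion: z-normalization per modality followed by a weighted sum. The normalization scheme and weight are simplifications of what a real system would calibrate on held-out data:

```python
import numpy as np

def znorm(scores):
    """Normalize scores to zero mean / unit variance (per modality)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

def fuse(audio_scores, face_scores, w=0.5):
    """Weighted sum of z-normalized per-candidate scores from both modalities."""
    return w * znorm(audio_scores) + (1 - w) * znorm(face_scores)

# Toy: per-candidate scores from independent audio and face systems.
audio = [2.1, -0.3, 0.8]
face = [15.0, 4.0, 9.5]
print(fuse(audio, face).argmax())  # index of the best-matching identity
```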
Research Program: Sustainable and Resilient Societies
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References
- NIST SRE 2019
- The 2019 NIST Audio-Visual Speaker Recognition Evaluation
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding the robustness of machine learning models on underspecified tasks
Description
The performance of deep learning models can quickly degrade when they are used on test data beyond their training distribution. In recent work [1], we have observed intriguing patterns in the “in-distribution” vs. “out-of-distribution” performance of various models. In particular, there sometimes exists a tradeoff between the two, which evolves during training and fine-tuning. It is not clear, however, what impact the pre-training and fine-tuning stages have. This project will contribute to our efforts to understand this topic. One of the objectives is to contribute concretely to a high-quality publication co-authored with other members of our research group.
Goals
- Select datasets of interest and train models with existing code
- Examine the performance of various models under various hyper-parameters, numbers of epochs, pre-training/fine-tuning options, etc. Develop model selection strategies to identify robust models
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication
Prerequisites: Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
- ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets (https://arxiv.org/abs/2209.00613)
- The Evolution of OOD Robustness Throughout Fine-Tuning (https://arxiv.org/abs/2106.15831)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Social media and crowdsourcing for social good
Description
The student will contribute to a multidisciplinary initiative for the use of social media and mobile crowdsourcing for social good. Several projects are available. Students will work with social computing researchers who collaborate with academics in other countries, both in Europe and the Majority World.
Goals
- Social media analytics
- Visualization of social and crowdsourced data
- Smartphone apps for mobile crowdsourcing
Research Program: AI for Everyone
Prerequisites: Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Punctuation restoration on automatic speech recognition output
Description
The goal of the project is to train a model to post-process automatic speech recognition (ASR) output and add punctuation marks (and capitalization, for the next level of difficulty). This will improve the readability of the ASR output and potentially make it more useful for downstream tasks, such as dialogue systems and language analysis.
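A common framing is token classification: for each word, predict the punctuation mark that follows it. A minimal, untrained sketch with a multilingual BERT encoder; the label set and model choice are illustrative:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # mark following each word

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS))

def punctuate(words):
    """Predict a punctuation mark after each word of raw ASR output."""
    enc = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**enc).logits.argmax(-1)[0]
    out = []
    for i, w in enumerate(words):
        first_sub = enc.word_ids().index(i)  # first subword of word i
        label = LABELS[int(pred[first_sub])]
        out.append(w + {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}[label])
    return " ".join(out)

# Untrained head, so labels are random; training would minimize cross-entropy
# against punctuation labels derived from written text corpora.
print(punctuate("how are you i am fine".split()))
```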
Goals
- Get acquainted with the problem, available data, success metrics, machine learning frameworks
- Program a simple system predicting just sentence ends/full stops; improve it to predict other punctuation marks; for extra difficulty, learn to predict capital letters
- Test and evaluate on a couple of languages, real scenarios
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References
- Yi, et al. Adversarial Transfer Learning for Punctuation Restoration
- Pais, et al., Capitalization and punctuation restoration: a survey
- Nanchen, et al., Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Ergodic control for robot exploration
Description
Ergodic control can be exploited in a range of robotics problems requiring the exploration of regions of interest, e.g. when the available sensing information is not accurate enough for a standard controller, but can guide the robot towards promising areas. In a collaborative task, it can also be used when the operator’s input is not accurate enough to fully reproduce the task, which then requires the robot to explore around the requested input (e.g., a point of interest selected by the operator). For picking and insertion, it can be applied to move around the picking/insertion point, thereby facilitating the prehension/insertion. It can also be employed for active sensing and localization (either performed autonomously, or with help from the operator). Further information.
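To give a flavor of one of the candidate methods, here is a minimal numpy sketch of the Spectral Multiscale Coverage feedback law on the unit square (unnormalized cosine basis, illustrative gains; see the Mathew and Mezić reference below for the full formulation):

```python
import numpy as np

K = 8  # Fourier basis functions per axis
ks = np.array([(i, j) for i in range(K) for j in range(K)])
Lam = 1.0 / (1.0 + np.sum(ks ** 2, axis=1)) ** 1.5  # per-coefficient weights

def basis(x):
    """Cosine basis evaluated at x in [0,1]^2 (normalization omitted)."""
    return np.cos(np.pi * ks[:, 0] * x[0]) * np.cos(np.pi * ks[:, 1] * x[1])

def basis_grad(x):
    """Gradient of each basis function at x, shape (len(ks), 2)."""
    gx = -np.pi * ks[:, 0] * np.sin(np.pi * ks[:, 0] * x[0]) * np.cos(np.pi * ks[:, 1] * x[1])
    gy = -np.pi * ks[:, 1] * np.cos(np.pi * ks[:, 0] * x[0]) * np.sin(np.pi * ks[:, 1] * x[1])
    return np.stack([gx, gy], axis=1)

# Target distribution: a Gaussian blob, with Fourier coefficients w estimated
# by sampling (a closed form exists for Gaussians).
samples = np.clip(np.random.randn(2000, 2) * 0.1 + [0.3, 0.7], 0, 1)
w = np.mean([basis(s) for s in samples], axis=0)

x, c, dt, umax = np.array([0.5, 0.5]), np.zeros(len(ks)), 0.01, 1.0
for t in range(1, 2000):
    c += basis(x) * dt  # running trajectory statistics
    b = basis_grad(x).T @ (Lam * (c / (t * dt) - w))  # coverage mismatch direction
    x = np.clip(x - umax * dt * b / (np.linalg.norm(b) + 1e-8), 0, 1)
```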
Goals
- To study the pros and cons of Spectral Multiscale Coverage and Heat Equation Driven Area Coverage to solve robot manipulation problems
Research Program: Human-AI Teaming
Prerequisites: Control theory, signal processing, programming in Python, C++ or Matlab/Octave
References
- G. Mathew and I. Mezić (2009). Spectral multiscale coverage: A uniform coverage algorithm for mobile sensor networks. In Proc. IEEE Conf. on Decision and Control.
- S. Ivić, B. Crnković, and I. Mezić (2017). Ergodicity-based cooperative multiagent area coverage via a potential field. IEEE Trans. on Cybernetics.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Clinically interpretable computer aided diagnostic tool using multi-source medical data
Description
The fight against many rare diseases would benefit from automated image analysis tools that improve the available understanding of them. One such rare disease is fibromuscular dysplasia (FMD), an under-recognized disease of the blood vessels. Challenges include the segmentation of the renal artery from larger 3D volumes, and the classification of FMD versus healthy patients. The main tasks of the project include: literature review; medical image analysis, i.e., segmentation of 3D tubular structures in real computed tomography images; deep-learning disease detection; and proposing novel approaches to improve the understanding of this disease.
Goals
- Improve characterization of the renal artery in computed tomography scans
- Build an interpretable machine learning system using clinical imaging data for FMD
- Develop – together with clinical experts – a computer aided diagnostic tool for this disease
Research Program: AI for Life
Prerequisites: Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Bruno, R. M., Mischak, H., & Persu, A. (2020). Multi-omics applied to fibromuscular dysplasia: first steps on a new research avenue. Cardiovascular research, 116(1), 4-5.
Level: Bachelor/ Master
Contact: Andre Anjos, [email protected]
Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
Description
Radiomic features obtained from medical images and videos can objectively quantify relevant information present in clinical studies. However, recent studies have shown that some of these features can be unstable and redundant, as features can be sensitive to variations in acquisition details. Therefore, reproducibility and discriminative power cannot be treated in isolation, and the identification of the features that show a higher tolerance to these influences must be promoted. Challenges include determining the stability of radiomics features against parameter variations during acquisition, as well as across different time points between patient studies.
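As a simple starting point for the stability analysis, features can be ranked by their within-subject coefficient of variation across repeated acquisitions; more refined criteria such as the intraclass correlation coefficient would follow the same pattern. A toy sketch with invented data:

```python
import numpy as np
import pandas as pd

def stability_report(df):
    """Rank radiomic features by stability across repeated acquisitions.

    `df` has one row per (subject, acquisition setting) and one column per
    feature, plus a 'subject' column. Stability is measured by the
    coefficient of variation (CV) of each feature within each subject."""
    feats = df.columns.drop("subject")
    cv = (df.groupby("subject")[feats].std()
          / df.groupby("subject")[feats].mean().abs()).mean()
    return cv.sort_values()  # low CV = more stable feature

# Toy data: 3 subjects scanned under 4 settings, 2 features.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": np.repeat(["a", "b", "c"], 4),
    "glcm_contrast": rng.normal(10, 0.1, 12),    # stable
    "firstorder_energy": rng.normal(10, 3, 12),  # unstable
})
print(stability_report(df))
```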
Goals
- Implementation and analysis of radiomic features extracted from 3D and 4D medical data
- Identifying the most relevant features according to their variability and stability in different radiological tasks
- Proposing novel approaches to mitigate biases and limitations of these features in a real clinical scenario
Research Program: AI for Life
Prerequisites: Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Jimenez-del-Toro, O., Aberle, C., Bach, M., Schaer, R., Obmann, M. M., Flouris, K., … & Depeursinge, A. (2021). The discriminative power and stability of radiomics features with computed tomography variations: task-based analysis in an anthropomorphic 3D-printed CT phantom. Investigative radiology, 56(12), 820-825.
Level: Bachelor/ Master
Contact: Andre Anjos, [email protected]
Automatic named entity recognition from speech
Description
The project will improve the detection and recognition of named entities (e.g. names of persons, places) automatically from speech. Currently, two independent technologies are used: automatic speech recognition (usually evaluated to minimise the word error rate) and a named entity recogniser. The goal of this project is to efficiently combine these two modules, while leveraging state-of-the-art open source tools such as SpeechBrain or BERT.
Goals
- Get familiar with the baseline speech recognition module developed in ROXANNE
- Get familiar with the baseline entity extractor module
- Apply an end-to-end framework to train both modules together and compare its performance with that of independently trained modules
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations
- Mael Fabien, et al., BertAA: BERT fine-tuning for Authorship Attribution
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
A human-centered approach to understand local news consumption
Description
The project aims to design and implement a framework to study the consumption of local news in the European multicultural context. The project will include a combination of research methods for experimental design and data analysis, and will be done in the context of the AI4Media European project, a European Excellence Center for Media, Society, and Democracy.
Goals
The specific goals of the project include:
- literature review
- identification of news sources
- mixed-method experimental design
- experiments and data analysis
- writing
Research Program: AI for Everyone
Prerequisites: Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
Description
This project is part of a larger one aiming to integrate single-cell sequencing data with imaging data in order to develop accurate machine learning methods to identify tumor subpopulations. It involves a close collaboration with the Department of Oncology UNIL CHUV, headed by Prof. Olivier Michielin, and the Novartis Institute for Biomedical Research. Accumulating evidence shows aberrant mRNA metabolism in cancer; however, relatively little is known about the impact of genetic mutations on mRNA metabolism in cancers and how this confers resistance to therapy.
Goals
- In this project, the student will develop and implement bioinformatics pipelines to study alternative splicing and polyadenylation from single-cell transcriptomes of BRAF-inhibitor-resistant melanoma. This will then serve to test whether combining measurements of gene and alternative 3′ UTR expression enables the identification of subtle subpopulations that confer drug resistance.
Research Program: AI for Life
Prerequisites: Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Development of epigenetic biomarkers for chronic pain stratification
Description
Chronic pain is a major health care problem that affects millions of people worldwide. It has been demonstrated that complex interactions between biological, psychological, environmental, and social factors may influence pain chronicization. Epigenetic factors may therefore be the trigger that explains the transition from acute to chronic pain and the maintenance of chronic pain. However, little is known about the influence of these biopsychosocial factors on epigenetic modifications in a population of patients with chronic musculoskeletal pain following an orthopedic trauma. This project will analyze whole-genome methylation levels in a population of chronic pain patients and healthy controls through the prism of specific biological (age, medication) and psychological (anxiety/depression) factors.
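A minimal sketch of the basic statistical machinery involved: a per-CpG group comparison with false-discovery-rate correction. A real analysis would additionally model covariates such as age and medication, e.g. with linear models:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def differential_methylation(beta_cases, beta_controls, alpha=0.05):
    """Per-CpG two-sample t-test with FDR correction.

    beta_cases / beta_controls: (n_samples, n_cpgs) methylation beta-values."""
    _, p = stats.ttest_ind(beta_cases, beta_controls, axis=0)
    reject, q, *_ = multipletests(p, alpha=alpha, method="fdr_bh")
    return np.where(reject)[0], q  # indices of significant CpGs, q-values

# Toy data: 5000 CpGs, the first 50 truly differentially methylated.
rng = np.random.default_rng(0)
cases = rng.uniform(0, 1, (30, 5000))
controls = rng.uniform(0, 1, (30, 5000))
cases[:, :50] += 0.3
sig, _ = differential_methylation(cases, controls)
print(len(sig), "significant CpG sites")
```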
Goals
- This biological project will undertake bioinformatic analyses of methylation sites across the whole genome to identify specific genes that may be involved in the transition from acute to chronic pain. The project will be carried out in collaboration with the medical research group at the Clinique romande de réadaptation (CRR, Bertrand Leger), where the student is expected to spend 20% of their time
Research Program: AI for Life
Prerequisites: Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
Description
This project is part of a larger one aiming to study how molecular biology shapes cellular morphology at an early stage of Amyotrophic Lateral Sclerosis (ALS) by integrating longitudinal cellular imaging with genomic data. It involves a close collaboration with the experimental laboratory of Prof. Rickie Patani, Francis Crick Institute/UCL. We recently uncovered the cytoplasmic accumulation of aberrant intron-retaining transcripts (IRTs) as the earliest detectable molecular phenotype in ALS [1-4]. The mechanisms that control RNA binding protein mislocalization, the molecular hallmark of ALS, have yet to be elucidated, and it remains unknown whether this early aberrant cytoplasmic intron retention relates to protein mislocalization, ER stress, mitochondrial depolarisation, oxidative stress, synaptic loss and cell death.
Goals
- to study the temporal and spatial dynamics of intronic and 3′ UTR sequences in developing MNs and ACs derived from ALS-mutant and control iPSC cell lines using time-resolved RNA-sequencing data from nuclear and cytoplasmic fractions
- to characterise the sequence features of cytoplasmic and nuclear IRTs and 3′ UTRs
- to develop an mRNA subcellular localisation model using machine learning methods
Research Program: AI for Life
Prerequisites: Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
References
- Luisier, R. et al. Intron retention and nuclear loss of SFPQ are molecular hallmarks of ALS. Nat. Commun. 9, 2010 (2018).
- Tyzack, G. E. et al. Widespread FUS mislocalization is a molecular hallmark of amyotrophic lateral sclerosis. Brain 142, 2572–2580 (2019).
- Hall, C. E. et al. Progressive Motor Neuron Pathology and the Role of Astrocytes in a Human Stem Cell Model of VCP-Related ALS. Cell Rep. 19, 1739–1749 (2017).
- Tyzack, G. E. et al. Aberrant cytoplasmic intron retention is a blueprint for RNA binding protein mislocalization in VCP-related amyotrophic lateral sclerosis. Brain 144, 1985–1993 (2021). https://doi.org/10.1093/brain/awab078
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Speaker identification enhanced by the social network analyser
Description
The project will build, test and combine technologies associated with the ROXANNE platform by leveraging open source tools (e.g. SpeechBrain and SocNetV) to demonstrate their strength in improved identification of persons. The project definition can be adapted towards the application of other modalities (e.g. estimating authorship attribution from text, or detecting persons using face identification technology).
Goals
- Build a baseline automatic speaker identification engine, using an open source tool (such as SpeechBrain) or the one available at Idiap, and test it on target (simulated) data related to lawful investigation
- Build a baseline graph/network analysis tool with basic functionalities such as centrality or community detection (many open source tools can be exploited) and test it on the simulated data
- Study the combination of information extracted by speech and network analysis technologies to eventually improve person identification (a toy sketch of such a combination follows this list)
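A toy sketch of the third step, combining hypothetical speaker-ID posteriors with a network prior computed with networkx; the graph, scores and fusion rule are all invented for illustration:

```python
import networkx as nx

# Toy communication graph: nodes are persons, edges are observed calls.
G = nx.Graph([("anna", "boris"), ("boris", "carl"), ("carl", "anna"),
              ("boris", "dana"), ("dana", "erik")])
centrality = nx.degree_centrality(G)
communities = nx.algorithms.community.greedy_modularity_communities(G)
print("communities:", [set(c) for c in communities])

# Hypothetical speaker-ID posteriors for one test call, from the audio system.
speaker_posterior = {"anna": 0.40, "boris": 0.35, "dana": 0.25}

# Fusion: re-weight acoustic posteriors by a network prior, here how central
# each candidate is; other priors are possible (community membership, past
# contacts of the known interlocutor, etc.).
fused = {p: speaker_posterior[p] * (0.5 + centrality[p]) for p in speaker_posterior}
total = sum(fused.values())
fused = {p: v / total for p, v in fused.items()}
print(max(fused, key=fused.get), fused)
```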
Research Program: Sustainable and Resilient Societies
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations,
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Speech/Music Classification
Description
Classifying sound into speech, music and possibly noise is important for systems based on statistical modeling. Statistical models are usually trained on a large database of input signals containing various sounds. In both the training and the testing process, it is advantageous to exclude segments containing non-speech sounds to improve the accuracy of the model. This project will develop a classifier discriminating speech from music, and potentially also from noise. You will first analyze existing approaches to speech/music classification and evaluate their efficiency and accuracy using conventional metrics for binary classification. You will then propose your own classifier or improve an existing one.
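A classical baseline to start from: segment-level MFCC statistics fed to an SVM. The file lists below are placeholders; a real system would work on short windows and could add noise as a third class:

```python
import librosa
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(path):
    """Mean and std of MFCCs over a file: a fixed-size segment descriptor."""
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# Placeholder path lists; point these at an actual labeled corpus.
speech_files = ["speech_001.wav", "speech_002.wav"]
music_files = ["music_001.wav", "music_002.wav"]

X = np.array([mfcc_features(f) for f in speech_files + music_files])
y = np.array([0] * len(speech_files) + [1] * len(music_files))

clf = make_pipeline(StandardScaler(), SVC()).fit(X, y)
print(clf.predict([mfcc_features("test.wav")]))  # 0 = speech, 1 = music
```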
Goals
- Familiarize yourself with voice activity detectors or existing speech/music detectors, available publicly or at Idiap
- Develop a new speech/music classifier
- Evaluate the new technology against a baseline on well-established data
Research Program: AI for Life
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References
- Banriskhem K. Khonglah: Speech/music classification using speech-specific features, Digital Signal Processing, Volume 48, January 2016, Pages 71-83
- Mrinmoy Bhattacharjee: Time-Frequency Audio Features for Speech-Music Classification
- Toni Hirvonen: Speech/Music Classification of Short Audio Segments, 2014 IEEE International Symposium on Multimedia
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Error correction in speech recognition using large pre-trained language models
Description
The aim of this work is to find out whether large pre-trained language models can be used to correct errors in speech transcriptions. The student will run a standard speech set through one or more publicly available speech transcription models and then investigate how well the language models are able to correct errors: Does the overall error rate matter? Are there classes of errors that are fixed better? Is it better to use a traditional language model (e.g. LLaMA) or a conversational one (e.g. Alpaca)?
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring of N-best hypotheses)
- Explore large language models and their deployment to post-process speech recognition output
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Automatic speech recognition of air-traffic communication using grammar
Description
Current state-of-the-art speech-to-text systems (i.e. automatic speech recognition (ASR) engines) applied to air-traffic control exploit statistical language models, which require large amounts of textual data for training. Nevertheless, Air Traffic Control Officers (ATCOs) are required to strictly follow a standardised phraseology (defined by the International Civil Aviation Organization, ICAO), and thus a context-free grammar (CFG) can be used to model the sequences of words generated by ATCOs. The goal of this project is to explore how traditional concepts of statistical language modeling can be enriched by the standardised phraseology (i.e. modeled by CFG-based language modeling).
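To illustrate how phraseology can be encoded, here is a toy fragment of an ICAO-like grammar with NLTK; the rules are invented for the example and are far simpler than the real phraseology:

```python
import nltk

# A tiny fragment of ICAO-style phraseology as a context-free grammar.
grammar = nltk.CFG.fromstring("""
  S        -> CALLSIGN COMMAND
  CALLSIGN -> AIRLINE DIGIT DIGIT
  AIRLINE  -> 'swiss' | 'lufthansa'
  DIGIT    -> 'one' | 'two' | 'three'
  COMMAND  -> 'descend' 'flight' 'level' DIGIT DIGIT | 'contact' 'tower'
""")
parser = nltk.ChartParser(grammar)

hyp = "swiss two one descend flight level three one".split()
trees = list(parser.parse(hyp))
print("accepted by phraseology grammar:", bool(trees))

# In ASR, such a grammar could re-rank or constrain hypotheses: hypotheses
# that parse under the phraseology receive a bonus over those that do not.
```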
Goals
- Develop a baseline automatic speech recognition engine in the Kaldi framework, suited for air-traffic controllers
- Explore the use of a CFG-based language model in ASR to model sequences of words (i.e. replacing or enriching the statistical language model)
- Compare the performance of the new language model on ASR tasks
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References:
- Oualil, et al, A Context-Aware Speech Recognition And Understanding System For Air Traffic Control Domain
- Oualil, et al, Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Pathological speech detection in adverse environments
Description
Various conditions of brain damage may disrupt the speech production mechanism, resulting in motor speech disorders that alter speech production along different dimensions. To diagnose motor speech disorders, we have developed automatic speech processing approaches. Such approaches, however, can fail to cope with realistic clinical constraints, i.e., the presence of noise and reverberation when recording speech in clinical settings. This project will contribute to the efforts made in our group to understand the performance of state-of-the-art approaches in adverse environments and to develop appropriate approaches targeting such scenarios.
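Data simulating the clinical constraints can be constructed by corrupting clean recordings. A minimal sketch of mixing noise at a prescribed SNR; reverberation would analogously be simulated by convolving with a room impulse response:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at a given SNR (both 1-D float arrays)."""
    noise = np.resize(noise, speech.shape)  # loop/trim noise to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Toy check: the realized SNR matches the request.
rng = np.random.default_rng(0)
s, n = rng.standard_normal(16000), rng.standard_normal(16000)
noisy = mix_at_snr(s, n, snr_db=5)
realized = 10 * np.log10(np.mean(s**2) / np.mean((noisy - s)**2))
print(round(realized, 2))  # ~5.0 dB
```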
Goals
- Set up datasets of interest
- Implement existing approaches and/or get familiar with existing implementations
- Examine the performance of various approaches in adverse environments
- If relevant and time permits, develop novel approaches targeting adverse scenarios
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Pathological speech enhancement
Description
Speech signals recorded in an enclosed space by microphones placed at a distance from the source are often corrupted by reverberation and background noise, which degrade speech quality, impair speech intelligibility, and decrease the performance of automatic speech recognition systems. Speech enhancement approaches to mitigate these effects have been devised for neurotypical speakers, i.e., speakers without any speech impairments. However, pathological conditions such as hearing loss, head and neck cancers, or neurological disorders, disrupt the speech production mechanism, resulting in speech impairments across different dimensions. This project will contribute to our efforts to understand the performance of state-of-the-art approaches for pathological signals and develop appropriate approaches targeting pathological speech.
Goals
- Set up datasets of interest
- Implement existing approaches and/or get familiar with existing implementations
- Examine the performance of various approaches for pathological speech signals
- If relevant and time permits, develop novel approaches targeting pathological speech
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Parametric Gaze Following
Description
The gaze following task in computer vision is defined as the prediction of the 2D coordinates where a person in an image is looking. Previous research efforts cast the problem as a heatmap prediction and consider the point of maximum intensity to be the predicted gaze point. This formulation has the benefit of enabling the model to highlight different potential gaze targets when the scene does not contain enough information to be conclusive. However, aside from the argmax, it is relatively difficult to leverage such heatmaps to automatically retrieve more information about the distribution they represent (e.g. the different modes, the weight and variance of each mode, etc.). The goal of this project is to explore a different formulation of the gaze following task where we predict a parametric probability distribution instead of heatmap pixels. Preliminary experiments in this direction have shown promising results.
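A minimal sketch of such a parametric head: a mixture density network over 2D gaze points trained by maximum likelihood with torch.distributions. The feature dimension and number of modes are illustrative:

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

class GazeMDNHead(torch.nn.Module):
    """Predict a 2-D Mixture of Gaussians over gaze points instead of a heatmap."""
    def __init__(self, feat_dim, n_modes=3):
        super().__init__()
        self.n = n_modes
        self.fc = torch.nn.Linear(feat_dim, n_modes * 5)  # weight, 2 means, 2 scales

    def forward(self, feats):
        p = self.fc(feats).view(-1, self.n, 5)
        mix = Categorical(logits=p[..., 0])
        comp = Independent(
            Normal(torch.sigmoid(p[..., 1:3]),                          # means in [0,1]^2
                   torch.nn.functional.softplus(p[..., 3:5]) + 1e-3), 1)
        return MixtureSameFamily(mix, comp)

# Training: minimize the negative log-likelihood of annotated gaze points.
head = GazeMDNHead(feat_dim=256)
feats = torch.randn(8, 256)   # image/person features from a backbone
gaze_gt = torch.rand(8, 2)    # normalized ground-truth gaze points
loss = -head(feats).log_prob(gaze_gt).mean()
loss.backward()
```

The modes, weights and variances mentioned above are then directly available from the predicted distribution, rather than recovered post hoc from heatmap pixels.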
Goals
- Investigate ideas to cast gaze-following as the prediction of a parametric probability distribution (e.g. Mixture of Gaussians) instead of a heatmap
- Propose new performance metrics capturing more information about the distribution compared to point-based metrics
Research Program: AI for Life
Prerequisites
The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses and projects
References
- Tafasca, Samy, Anshul Gupta, and Jean-Marc Odobez. “ChildPlay: A New Benchmark for Understanding Children’s Gaze Behaviour.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- Chong, Eunji, et al. “Detecting attended visual targets in video.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
Type: Master’s Thesis (Master’s Project)
Level: Master
Contact: Dr. Jean-Marc Odobez, [email protected]
From Gaze Following to Joint Attention
Description
Gaze is an important marker for non-verbal communication that is indicative of a person’s visual attention. It is also a proxy measure of cognition and can be used to evaluate a subject’s state of mind, intentions, and preferences among other things. As such, it has attracted high interest over the years from different research communities ranging from psychology to neuroscience.
Here, we are specifically interested in the gaze following task, defined as the prediction of the 2D pixel location where a person in the image is looking. It may also involve predicting a binary flag that indicates whether the subject is looking inside or outside the image frame. The first step of this project is to train and evaluate a state-of-the-art transformer-based gaze following model on a new and challenging dataset. The idea is to evaluate not only gaze following performance using standard metrics but also joint attention between people using a post-processing approach based on the predicted gaze points. In the second stage, we will look to extend the network architecture to predict joint attention in an end-to-end manner. The resulting model will serve to pseudo-annotate a large-scale video dataset to highlight potential segments of joint attention for sampling and further manual annotation.
This work is part of a collaboration with the Language Acquisition and Diversity Lab of the University of Zurich, headed by Prof. Suzanne Stoll.
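The post-processing idea of the first stage can be illustrated in a few lines: two people are flagged as sharing attention when their predicted gaze targets fall close together. The threshold and inputs below are made up:

```python
import numpy as np

def joint_attention_pairs(gaze_points, thresh=0.05):
    """Post-process predicted gaze points into joint-attention pairs.

    gaze_points: dict person_id -> (x, y) normalized predicted gaze target.
    Two people share attention if their predicted targets are close."""
    ids = list(gaze_points)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            d = np.linalg.norm(np.subtract(gaze_points[ids[i]], gaze_points[ids[j]]))
            if d < thresh:
                pairs.append((ids[i], ids[j]))
    return pairs

print(joint_attention_pairs({"child": (0.42, 0.61), "adult": (0.44, 0.60),
                             "other": (0.90, 0.10)}))  # [('child', 'adult')]
```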
Goals
- Train a gaze-following model on a new dataset and evaluate gaze-following and joint attention performance
- Extend the architecture to predict both the gaze point and joint attention simultaneously and compare with the baseline
Research Program: AI for Life
Prerequisites
The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses and projects.
References
- Tafasca, Samy, Anshul Gupta, and Jean-Marc Odobez. “Sharingan: A Transformer-based Architecture for Gaze Following.” arXiv preprint arXiv:2310.00816 (2023)
- Fan, Lifeng, et al. “Inferring shared attention in social scene videos.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Person Head Detection
Description
Numerous models are readily available for face detection, but there’s a noticeable gap when it comes to detecting the entire head of a person, which is crucial for certain applications. While many applications focus solely on identifying faces, some rely on information from the entire head. Examples of such applications include head pose estimation and gaze tracking, which is the specific focus of this project.
The primary objective is to curate a collection of publicly accessible datasets containing annotations for heads and use them to train a cutting-edge object detection model. This model will be designed to accurately locate people’s heads within images. Additionally, we will explore various augmentation techniques to enhance its performance. It’s important to highlight that the resulting model is expected to cater to a broad spectrum of users across diverse applications.
Goals
- Compile multiple available datasets containing head annotations and convert their labels to a unified format
- Train and evaluate an object detector (e.g. Yolov8, DETR), and experiment with data augmentation strategies to maximize performance (especially for extreme head poses)
- Package the model and inference pipeline in a clean and modular codebase that makes it easy for the end-user to run
Research Program: AI for Life
Prerequisites
Proficiency in Python programming (including the standard scientific libraries, e.g. numpy, pandas) is expected. Knowledge of deep learning and the PyTorch framework is desired, but not required.
References
- Shao, Shuai, et al. “Crowdhuman: A benchmark for detecting human in a crowd.” arXiv preprint arXiv:1805.00123 (2018).
- Terven, Juan, and Diana Cordova-Esparza. “A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond.” arXiv preprint arXiv:2304.00501 (2023).
- Carion, Nicolas, et al. “End-to-end object detection with transformers.” European conference on computer vision. Cham: Springer International Publishing, 2020.
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Research Spotlight: Crafting An Interactive Webpage Template for Showcasing Science
Description
In today’s world, simply getting your research papers published in journals or conferences is no longer sufficient for a scientist. The emphasis has shifted towards promoting and sharing one’s work via social media, websites, and interactive demonstrations. With this in mind, our project aims to create a versatile webpage template tailored for showcasing research papers effectively. This template will prioritize aesthetics, organized sections, and the ability to incorporate various types of content, such as text, images, videos, and interactive elements like sliders for visualizing how results change with specific parameters. We will investigate the adaptation of existing templates and the development of novel components designed for interactive model demonstrations.
Goals
- Develop a generic feature-rich webpage template for showcasing research papers
- Experiment with tools to demonstrate machine learning models interactively (e.g. Streamlit, Gradio) and evaluate their potential integration in the template
Research Program: AI for Life
Prerequisites
Proficiency in web development is required.
References
- MultiMAE | Multi-modal Multi-task Masked Autoencoders – https://multimae.epfl.ch/
- Gradio – https://www.gradio.app/
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Spatio-temporal Modeling of Human Behavior in Videos
Description
In this project, the primary objective is to explore various spatio-temporal models to derive effective video representations for tasks related to facial behavior, such as head gesture and facial expression recognition. These tasks necessitate rich spatio-temporal representations, yet current methods mostly rely on hand-crafted features. The goal is therefore to use, in an end-to-end manner, video encoders that extract effective spatio-temporal features as input to the facial task heads.
Furthermore, various facial behavior tasks can be jointly learned through weakly-supervised learning; there is thus potential to develop a method that trains these tasks jointly using pseudo-annotations extracted for the videos.
Goals
- Use a facial tracking tool to extract facial clips
- Implement and fine-tune spatio-temporal models
- Evaluate the models on human behavior benchmarks, such as CelebV-HQ, CMU-MOSEI, MEAD, CCDb-HG, etc.
Research Program: AI for Life
Prerequisites
- Proficiency in the Python programming language
- Familiarity with deep learning and the PyTorch library
- Knowledge of computer vision would be advantageous
- A passion for modeling real-world problems and understanding human behavior!
References
- Head gesture recognition demo
- Video representation for human behavior: Marlin: https://arxiv.org/abs/2211.06627
Level: Master (semester research project or PDM)
Contact: Dr. Jean-Marc Odobez, [email protected]
Scaling Pre-training for Gaze Following
Description
Understanding where a person is looking, or gaze following, is vital for a variety of applications including autonomous driving, human-computer interaction and medical diagnosis. Existing models for gaze following are typically trained in a supervised manner on small, manually annotated datasets. The goal of the project is to perform pre-training on large video datasets by leveraging pseudo annotations from strong gaze following models. We also aim to investigate incorporating weak supervision from auxiliary labels to enhance the learned representations.
Goals
- After curating diverse video datasets, generate pseudo-annotations for the curated dataset by leveraging strong gaze following models such as [Tafasca et al., 2023]
- Use this dataset to pre-train gaze following models on the pseudo annotations
- Fine-tune and evaluate the pre-trained gaze following models on annotated video datasets
Research Program: AI for Life
Prerequisites
The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses.
References
- [Tafasca et al, 2023] Samy Tafasca, Anshul Gupta and Jean-Marc Odobez. (2023). ChildPlay: A New Benchmark for Understanding Children’s Gaze Behaviour. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Level: Master (semester project or PDM). Can be done by multiple students.
Contact: Dr. Jean-Marc Odobez, [email protected]
Cognitive architecture for assistive robots
Description:
Controlling a robot for human interactions is a challenge. For decades, multiple research groups have developed cognitive architectures to allow robots to achieve higher-level interactions with people. In this project, you will work on the Lio robot and integrate it with the DIARC architecture from the HRILab at Tufts University.
Goals:
- Getting familiar with DIARC
- Creating a Lio agent profile
- Connecting DIARC to ROS
- Developing human-robot interaction demos
Research Program: Human-AI Teaming
Prerequisites:
Good command of Java/Python and basics of Linux. Experience in robotics/ROS is a plus.
Level: Bachelor/ Master
Contact: Emmanuel Senft, [email protected], Jean-Marc Odobez, [email protected]
Text flourishing for a robot writer: a learning and optimization approach
Description
This project aims to generate trajectories for a robot to embellish text in an automatic manner.
Goals
An optimal control approach based on iterative linear quadratic regulator (iLQR) will be investigated for trajectory optimization. First, the problem will be approached by designing an algorithm for automatically placing a set of ellipses above and below the words to be flourished, by considering the empty spaces available based on the surrounding texts. A path optimization algorithm will then be created to generate movements by using the ellipses as guides (possibly formulated as virtual mass and gravitational forces). The objectives will be designed by transforming aesthetic guidelines for artists into a set of cost functions that can be used in optimal control. The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites:
Linear algebra, programming in Python or C++
References:
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes
- Robotics codes from scratch (RCFS)
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Ergodic drawing for a robot manipulator
Description
This project aims to generate trajectories for a drawing robot by using the principle of ergodicity.
Goals
An optimal control approach combining an iterative linear quadratic regulator (iLQR) and a cost on ergodicity will be investigated for trajectory optimization. The project will also investigate the use of electrostatic halftoning or repulsive curves as an initialization process.
The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites:
Linear algebra, programming in Python or C++
References:
- Löw, T., Maceiras, J. and Calinon, S. (2022). drozBot: Using Ergodic Control to Draw Portraits. IEEE Robotics and Automation Letters (RA-L), 7:4, 11728-34
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes (Section 9 on ergodic control)
- Robotics codes from scratch (RCFS)
- drozBot, the portraitist robot
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Deep learning for a portraitist robot application
Description
Text-driven generation of caricatures in the form of drawing strokes. This project aims to explore the use of generative deep learning techniques based on image diffusion for a robot portrait drawing application.
Goals
- Most generative deep learning techniques use images as their output format, but a few have explored the use of vector graphics as output, guided by text prompts for the rendering. This project will investigate the use and comparison of these techniques in the context of a robot portrait drawing application. The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites:
Deep learning, programming in Python or C++
References:
- SVGDreamer: Text Guided SVG Generation with Diffusion Model
- DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
- SVG Differentiable Rendering: Generating vector graphics using neural networks
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Human body tracking with signed distance fields
Description
Signed distance fields (SDFs) are popular implicit shape representations in robotics. Most often, SDFs are used to represent rigid objects. However, they can also be used to represent general kinematic chains, such as articulated objects, robots, or humans. SDFs provide a continuous and differentiable representation that can easily be combined with learning, control, and optimization techniques. This project aims to explore the SDF representation of the human body based on state-of-the-art detection, tracking, and skeleton extraction techniques. The developed SDF representation can be used for human-robot interaction or transferring manipulation skills from humans to robots.
Goals
The human skeleton can be detected and tracked through images or videos using pre-trained vision models, and SDFs can be reconstructed by leveraging the SMPL-X model, a realistic 3D model for the human body based on skinning and blend shapes. This project proposes to utilize these techniques to build the SDF for the human body and then apply it to robot manipulation tasks.
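As a minimal illustration of the representation, here is a differentiable SDF for a single "bone" modeled as a capsule; a full body would take the minimum over the capsules (or richer primitives) attached to the tracked skeleton:

```python
import torch

def capsule_sdf(p, a, b, r):
    """Signed distance from points `p` (N,3) to a capsule: the segment a-b
    inflated by radius r. Negative inside, positive outside; differentiable."""
    ab, ap = b - a, p - a
    t = torch.clamp((ap @ ab) / ab.dot(ab), 0.0, 1.0)  # projection onto segment
    closest = a + t[:, None] * ab
    return torch.linalg.norm(p - closest, dim=1) - r

# Toy body part: a forearm as a capsule between two tracked joints (e.g. from
# an SMPL-X skeleton).
elbow = torch.tensor([0.0, 0.0, 0.0])
wrist = torch.tensor([0.3, 0.0, 0.0])
pts = torch.tensor([[0.15, 0.02, 0.0], [0.15, 0.3, 0.0]], requires_grad=True)
d = capsule_sdf(pts, elbow, wrist, r=0.04)
d.sum().backward()   # gradients usable in control/optimization costs
print(d, pts.grad)   # first point inside (<0), second outside (>0)
```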
Research Program: Human-AI Teaming
Prerequisites:
Machine learning, computer vision, programming in Python or C++
References
- Li, Y., Zhang, Y., Razmjoo, A. and Calinon, S. (2024). Representing Robot Geometry as Distance Fields: Applications to Whole-body Manipulation. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pp. 15351-15357
- Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black (2019). Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. In CVPR
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]