Ongoing Projects Fall 2024
Automatically Cropping out Photographs from Stereo Cards
Past Projects
Interactive interface to explore the knowledge of martial arts
Master projects (DH)
Student: Xinyi Ding
Supervisor: Yumeng Hou and Sarah Kenderdine
Project description:
The overarching research scheme is positioned in the field of Computational Archival Science and investigates the integration of computational methods (semantic representation [1] and movement similarity [2]) with interface design to augment “knowledge access” for the Hong Kong Martial Arts Living Archive (HKMALA).
This master project will contribute to the final stage: an interactive knowledge interface. A chief focus is on user experience assessment and improvement. In practice, the goal is to craft an interactive interface for users to explore the (multimodal) knowledge elements in an HKMALA dataset.
Main activities:
- The student will start with an initial web app framework and a list of major functionalities / computational modules to assemble.
- The student is welcome to refine, or even completely rebuild and redefine, the knowledge interface, for which basic web-app development skills (HTML/Python) are necessary (a minimal sketch is given at the end of this project description).
- The student is expected to conduct user studies (we will advise on tools and frameworks) to evaluate the performance and usability of the interface.
Keywords: interactive interface, human-computer interaction, user experience, knowledge/recommendation system
The project involves interactions with martial artists and performing artists and, for those interested, training in Chinese/European martial arts!
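For illustration only, a minimal sketch of how the initial web-app framework might serve multimodal knowledge elements to the interface; the records, routes, and data layout below are hypothetical stand-ins for the HKMALA dataset:

```python
# Minimal sketch (hypothetical names and data layout) of serving multimodal
# knowledge elements from a small Flask app for the interactive interface.
from flask import Flask, jsonify, render_template_string

app = Flask(__name__)

# Placeholder records standing in for entries of an HKMALA dataset.
KNOWLEDGE_ELEMENTS = [
    {"id": 1, "title": "Hung Kuen form", "modality": "video", "uri": "videos/hung_kuen.mp4"},
    {"id": 2, "title": "Master interview", "modality": "audio", "uri": "audio/interview_01.wav"},
]

@app.route("/api/elements")
def list_elements():
    # The real interface would query the archive's knowledge base instead.
    return jsonify(KNOWLEDGE_ELEMENTS)

@app.route("/")
def index():
    # Bare-bones HTML listing; the actual interface would be far richer.
    return render_template_string(
        "<ul>{% for e in elements %}<li>{{ e.title }} ({{ e.modality }})</li>{% endfor %}</ul>",
        elements=KNOWLEDGE_ELEMENTS,
    )

if __name__ == "__main__":
    app.run(debug=True)
```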
Leveraging Generative AI to Enhance Text-Video Dataset Quality
Master projects (DH/IC)
Student: Yixuan Duan
Supervisor: Yuchen Yang and Sarah Kenderdine
Project description:
Text-video representation, focusing on building a joint embedding for text descriptions and video content, has become increasingly important in machine learning and powers many tasks such as text-video retrieval, video captioning, and text-video generation. At the core of every machine learning breakthrough lies the dataset. The limitations of the current datasets (such as MSR-VTT) have been widely acknowledged. For example, these datasets provide only plain visual descriptions of the videos, lacking more in-depth and complex captions (“two men sitting on a bench” vs “a father talking to his son about his mom who’s just passed away in great sadness, in the park nearby their house”). The need for a more comprehensive and diverse dataset is apparent, as it would support better text-video tasks. Although new tools, such as new video captioning models (using multimodal cues and alternative knowledge sources) and GPT-inspired ones, have been developed, their application in creating a better dataset has not been explored.
The core research question for this project is how to utilize generative AI to create more comprehensive and detailed descriptions of videos. The objective is to incorporate not only visual but also multimodal cues of the video, such as conversations, ambient sounds, on-screen text, and overall emotions, into the descriptions. The project will utilize state-of-the-art generative tools like MV-GPT and ChatGPT to generate detailed, factual, and holistic descriptions of the videos based on the existing MSR-VTT dataset. To ensure accuracy and reliability, the project also seeks methods and manual steps for illustrating, understanding, and controlling the hallucination problem that may arise from the use of generative models (e.g. using rule sets or equivalent mechanisms to keep the generated content factual).
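As a sketch of these two steps, the snippet below assembles multimodal cues into a prompt and applies a very crude rule-based check on a generated caption; `generate_caption` is a hypothetical stand-in for a model such as ChatGPT or MV-GPT, and a real pipeline would use stronger checks (e.g. entity or entailment verification):

```python
# Sketch only: combining multimodal cues into one prompt and flagging
# potentially hallucinated words with a simple rule set.

def build_prompt(visual_caption, speech_transcript, ambient_sounds, on_screen_text):
    """Combine multimodal cues into one prompt asking for a richer description."""
    return (
        "Write a detailed, factual description of a video given these cues.\n"
        f"Visual caption: {visual_caption}\n"
        f"Speech transcript: {speech_transcript}\n"
        f"Ambient sounds: {', '.join(ambient_sounds)}\n"
        f"On-screen text: {on_screen_text}\n"
        "Only mention things supported by the cues above."
    )

def unsupported_words(generated, allowed_terms):
    """Crude rule check: words in the output that never appear in any cue."""
    return [w for w in generated.lower().split() if w.isalpha() and w not in allowed_terms]

cues = {
    "visual_caption": "two men sitting on a bench",
    "speech_transcript": "I miss her every day.",
    "ambient_sounds": ["birdsong", "distant traffic"],
    "on_screen_text": "",
}
prompt = build_prompt(**cues)
# generated = generate_caption(prompt)  # hypothetical call to a generative model
# print(unsupported_words(generated, allowed_terms=set(prompt.lower().split())))
```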
The project provides an ideal venue for students eager to step into the area of generative models and the frontier of representation learning, with hands-on tasks and real-world contributions.
Dynamic Audio Remixing for Montreux Jazz Festival Archive
Master projects (DH/IC)
Student: Toussain Cardot
Supervisor: Sarah Kenderdine, Kirell Benzi and Giacomo Alliata
Project description:
The Montreux Jazz Festival, a globally renowned jazz event held annually in Switzerland, boasts a treasure trove of music performances spanning several decades within its archive. This student project endeavours to craft an interactive installation affording festival visitors the chance to become their own DJs, remixing archival performances in a dynamic and engaging manner. The primary goal of this project is to design and implement a machine learning-driven system adept at segmenting, categorizing, and enabling users to interactively remix various audio samples extracted from the Montreux Jazz Festival archive.
The project will be executed in several stages, beginning with the acquisition and preprocessing of the festival’s archives. Subsequently, signal processing and machine learning techniques will be employed to extract and segment distinct audio samples from larger performances. These extracted samples will then be categorized based on various attributes such as instrument type, genre, rhythm, and mood. The heart of the project lies in the development of an intuitive, user-friendly interface that promotes accessibility for individuals of diverse musical and technological backgrounds. Furthermore, an AI component will be integrated to suggest harmonious combinations of samples, drawing on musical theory, popularity, and machine learning models.
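A minimal sketch of the extraction and categorization stages, assuming the archive recordings are available as audio files; the file path, segment-length threshold, and number of clusters are placeholders:

```python
# Sketch: onset-based segmentation, MFCC timbre features, and rough clustering
# of the resulting samples into groups usable for remixing.
import librosa
import numpy as np
from sklearn.cluster import KMeans

y, sr = librosa.load("montreux_performance.wav", sr=None)  # placeholder file

# Detect onsets and cut the recording into candidate samples between them.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
segments = [y[s:e] for s, e in zip(onsets[:-1], onsets[1:]) if e - s > sr]  # keep > 1 s

# Describe each segment with mean MFCCs, a common timbre feature.
features = np.array([
    librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13).mean(axis=1) for seg in segments
])

# Cluster segments into rough categories (e.g. instrument/timbre groups).
labels = KMeans(n_clusters=4, n_init=10).fit_predict(features)
print(dict(zip(range(len(segments)), labels)))
```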
The project timeline spans 18 weeks, starting with data acquisition and preprocessing, moving on to sample extraction and segmentation, categorization of samples, and interface design and development.
Upon the successful completion of this endeavour, an innovative audio remixing system will emerge, allowing users to create unique remixes from the extensive Montreux Jazz Festival archive. This system aims to foster public engagement with this invaluable musical resource in a fresh, interactive format.
Stereoscopic cartography of the 1867 Exposition Universelle of Paris
Master project
Supervisor: Sarah Kenderdine
In 1867, France hosted the seventh Exposition Universelle. Napoleon III had a grand plan: on the Champ-de-Mars, he had a vast elliptical palace built and a park laid out among a hundred or so ephemeral pavilions. To immortalise this event, which attracted 52,000 exhibitors and 11 million international visitors, the organisers called on official photographers who documented the exhibition from every angle. Several thousand shots were taken, including 2,000 stereoscopic views – pairs of flat images which, when viewed through stereoscopic glasses, form a 3D image.
Within the research collaboration between eM+ and LARCA (Paris), this master’s project aims to produce a prototype of a virtual visit to the Exposition Universelle in the form of a navigable map made from a series of digitised stereoscopic views situated on the exhibition plan with the correct orientation. The results will enable the visualisation of these stereoscopic images in a fully interactive, large-scale 3D virtual projection system built by eM+.
The visualisation will firstly initiate deeper research into these images and the history of the 1867 Exposition Universelle, especially to broaden the different cultural and postcolonial perspectives in the contemporary context. The prototype will also provide the groundwork for developing a final installation to be presented in more exhibitions in Europe and the United States.
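As an illustrative sketch (file names and coordinates are hypothetical), each digitised stereo card could be split into its left and right halves and attached to a placement record giving its position and orientation on the exhibition plan:

```python
# Sketch: split a scanned stereoscopic card and record where it sits on the plan.
from PIL import Image

def split_stereo_card(path):
    """Return the left and right halves of a scanned stereoscopic card."""
    card = Image.open(path)
    w, h = card.size
    return card.crop((0, 0, w // 2, h)), card.crop((w // 2, 0, w, h))

left, right = split_stereo_card("stereo_card_0001.tif")  # placeholder scan

# Minimal placement record for the navigable map of the Champ-de-Mars plan.
view = {
    "id": "stereo_card_0001",
    "plan_xy": (412.0, 188.5),   # position on the digitised exhibition plan
    "heading_deg": 135.0,        # direction the camera was facing
    "left_image": "stereo_card_0001_L.png",
    "right_image": "stereo_card_0001_R.png",
}
left.save(view["left_image"])
right.save(view["right_image"])
```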
Computational Augmentation of Dance Videos through Artistic Renderings
Master project
Supervisors: Sarah Kenderdine, Giacomo Alliata
To celebrate the 50th anniversary of the Prix de Lausanne, eM+ will develop an interactive installation to explore its archive of videos and photos. As a competition for young dancers held in Lausanne every year, the Prix has built a beautiful collection of audiovisual materials of the dancers’ performances, complemented by photos of the participants and of backstage. The installation will allow users to explore the archive and discover the dancers’ performances in a new light, through computer-generated visuals driven by features in the videos themselves. To this end, we seek to augment the archive by creating artistic visualizations of the dancers’ movements, leveraging state-of-the-art algorithms and AI to extract various features such as the dancers’ poses.
The student is tasked with exploring various approaches to create artistic visualizations from the videos of the dance performances. The first step of this work is to extract features from the videos, which range from 1973 to 2022 and thus show differences in image quality, camera angles, and backgrounds. Surveying the state-of-the-art approaches and testing them on the actual audiovisual materials of the Prix de Lausanne will therefore be an important part of the work. Subsequently, these features will be used as input to drive computer-generated artistic renderings, for instance emphasizing the dancers’ harmonious movements.
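As one possible starting point (a sketch only, not the project’s chosen pipeline), per-frame body keypoints could be extracted with an off-the-shelf pose estimator such as MediaPipe Pose and stored for later use as input to the generative renderings; the video file name is a placeholder:

```python
# Sketch: extract per-frame pose keypoints from an archive video and save them.
import cv2
import json
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("prix_de_lausanne_1998.mp4")  # placeholder archive video

keypoints_per_frame = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        keypoints_per_frame.append(
            [(lm.x, lm.y, lm.visibility) for lm in result.pose_landmarks.landmark]
        )
    else:
        keypoints_per_frame.append(None)  # older footage may defeat the detector

cap.release()
with open("poses.json", "w") as f:
    json.dump(keypoints_per_frame, f)
```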
Digital Twin for Immersive Human-Computer Interaction
Master project
Supervisors: Sarah Kenderdine, Loïc Serafin
Among the immersive systems hosted at eM+, the Panorama offers a unique 360° stereoscopic experience with an impressive 9m-diameter screen featuring 20K graphic resolution, which serves the lab’s aim of developing the next generation of large-scale immersive interactive experiences.
This project aims to build a digital twin of this interactive system. The first task is to gather all real-time data from, e.g., RGB cameras, the Kinect sensor, OptiTrack and Vive Trackers, and to synchronize these streams to enable the creation of interactive datasets. The next task is to reconstruct the 3D positions of the sensors and model the whole system at real-life scale.
In this environment, we will place a modular, realistic 3D human avatar with animations. The goal will be to simulate all sensors using virtual data and to study how well the reconstruction recreates the real-life artefacts and deformations of the input sensors. The system will then be evaluated for its usability in deep learning and computer vision development tasks.
Technologies: Unreal Engine, Nvidia Omniverse, Deep learning, 3D avatar, animation, Depth sensor, computer vision, C++ / Python
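A minimal sketch of the first task, aligning heterogeneous sensor streams on a common clock by matching each reference frame to the nearest sample of every other stream; stream names and timestamps are placeholders:

```python
# Sketch: nearest-timestamp alignment of multiple sensor streams.
import bisect

def nearest_sample(timestamps, t):
    """Index of the sample whose timestamp is closest to t (timestamps sorted)."""
    i = bisect.bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    return min(candidates, key=lambda j: abs(timestamps[j] - t))

streams = {
    "rgb_cam": [0.000, 0.033, 0.066, 0.100],            # ~30 Hz
    "kinect": [0.002, 0.035, 0.068, 0.101],
    "optitrack": [0.000, 0.008, 0.016, 0.025, 0.033],   # higher rate
}

reference = streams["rgb_cam"]
synced = [
    {name: nearest_sample(ts, t) for name, ts in streams.items()}
    for t in reference
]
print(synced[0])  # indices of the samples to bundle into one dataset entry
```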
An alternative interface for browsing large audiovisual archives
Master project – Fall 2022
Student: Linyida Zhang
Supervisor: Sarah Kenderdine, Yuchen Yang
Audiovisual (AV) archives are an invaluable resource for preserving the past in a holistic manner. Unlike other forms, however, AV archives can be difficult to explore. This is not only because of their complex modality and sheer volume, but also because of the lack of appropriate interfaces beyond keyword search. The recent rise of text-to-video retrieval in computer science opens the gate to accessing AV content in a more natural and semantic way, mapping natural-language descriptive sentences to matching videos. However, applications of such models are rarely seen in practice. The contribution of this work is threefold. First, working with RTS (Télévision Suisse Romande), this project was able to identify the key blockers in a real archive for implementing such models and to build a functioning pipeline for encoding raw archive videos into text-to-video feature vectors. Second, the project designed and verified a method to encode and retrieve videos using emotionally rich descriptions, which were not supported by the original model. Third, this project proposed an initial prototype for immersive and interactive exploration of AV archives in a latent space based on the aforementioned video encodings.
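For illustration, a minimal sketch of the retrieval idea, assuming frames have already been sampled from the archive videos: a CLIP-style model (here the public openai/clip-vit-base-patch32 checkpoint, not necessarily the model used in the project) embeds frames and a free-text query into the same space, and videos are ranked by similarity. File names are placeholders:

```python
# Sketch: rank a video against a text query using a joint text-image embedding.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.open(p) for p in ["video01_f000.jpg", "video01_f120.jpg"]]  # sampled frames
query = "a joyful crowd celebrating in the rain"

inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Mean-pool frame embeddings into one video vector, then compare with the text.
video_vec = out.image_embeds.mean(dim=0, keepdim=True)
text_vec = out.text_embeds
score = torch.nn.functional.cosine_similarity(video_vec, text_vec).item()
print(f"similarity between query and video: {score:.3f}")
```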
Wall of Memories
Master project – Spring 2021
Student: Giacomo Alliata
Supervisor: Sarah Kenderdine
Wall of Memories is part of the Cosmos Archeology exhibition, a collaboration between eM+ and LASTRO. The exhibition features various installations bridging the gap between art and science and will be held at EPFL Pavilions in the spring of 2022.
In this landscape of interactive experiences, Wall of Memories aims at exploring the Claude Nicollier Video Archives, a collection of audiovisual content about the famous Swiss astronaut Claude Nicollier, digitized and curated by the Cultural Heritage and Innovation Center at EPFL.
A virtual wall of the videos is built in Unity, with a space-like atmosphere. The Linear Navigator, a touch screen mounted on a 12m rail fixed to the wall, is used to navigate this timeline of 23 years of videos, with the screen mimicking a window for the visitors to explore.
The contemplative nature of the installation invites visitors to experience this collection of videos either by selecting videos at random and following the Linear Navigator to the corresponding year, or by clicking on a thumbnail that catches their eye. Rather than looking for a specific video or theme, visitors are encouraged to appreciate the whole collection and discover it through their own interests.
Interactive Knowledge Browser for Chinese Martial Arts
Semester project – Fall 2021
Student: Lin Yuan
Supervisor: Yumeng Hou and Sarah Kenderdine
Traditional martial arts are recognised as a critical yet endangered form of intangible cultural heritage (ICH). For digitally preserving cultural assets, ontologies and knowledge graphs (KGs) can help organise ICH entities and their interrelations in a graph structure. This work aims to design a KG for preserving cultural knowledge and reawakening public interest in traditional martial arts.
In terms of knowledge acquisition, we will refine the ontologies proposed by CROSSINGS and integrate entities extracted from different sources, with a central focus on the contents of the Hong Kong Martial Arts Living Archive (HKMALA).
As for user interaction, we will investigate visualization and interactive interface design to develop a KG-based application. Experiments will be carried out to understand, and further improve, the resulting user experience.
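As a minimal illustration (the namespace and relations below are placeholders, not the CROSSINGS ontology itself), HKMALA entities and their interrelations could be stored and queried as a knowledge graph with rdflib:

```python
# Sketch: a tiny KG of martial-arts entities with rdflib, plus a simple query.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

ICH = Namespace("http://example.org/hkmala/")  # placeholder namespace
g = Graph()

master = URIRef(ICH["master/LamSaiWing"])
style = URIRef(ICH["style/HungKuen"])
form = URIRef(ICH["form/TigerCraneForm"])

g.add((master, RDF.type, ICH.Master))
g.add((style, RDF.type, ICH.Style))
g.add((master, ICH.practices, style))
g.add((form, ICH.belongsTo, style))
g.add((form, ICH.label, Literal("Tiger Crane Paired Form")))

# Simple query the interface could run: all forms belonging to a given style.
for subject, _, _ in g.triples((None, ICH.belongsTo, style)):
    print(subject)
```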
Modelling Martial Arts Movements through Deep Learning
Semester project – Spring 2021
Student: Fadel Mamar Seydou
Supervisor: Yumeng Hou and Sarah Kenderdine
As part of an ongoing exploration, this project will further examine the scalability of ‘motion-as-a-tool’ to “encode” and “decode” the embodied knowledge in multimodal digital archives. Specifically, this work will investigate a motion-based search framework, which should be effective, potentially scalable, and interoperable across 2D (video) and 3D (MoCap) data alike. The existing 2D motion search modules, implemented in Python, can be used as a starting point and extended into a more sophisticated package. We expect the student to explore the optimal usage of existing methodologies and develop a computational approach to enhancing knowledge representation. Ultimately, we will apply the outcome of this project to a use case for the Hong Kong Martial Arts Living Archive (HKMALA).
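For illustration only (the existing Python modules may use a different distance), one common way to compare two motion sequences, e.g. pose keypoints extracted from videos, is dynamic time warping:

```python
# Sketch: DTW distance between two pose sequences of different lengths.
import numpy as np

def dtw_distance(seq_a, seq_b):
    """DTW over per-frame pose vectors, with Euclidean frame-to-frame cost."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy sequences: 50 and 60 frames of 17 (x, y) keypoints each, flattened.
query = np.random.rand(50, 34)
candidate = np.random.rand(60, 34)
print(dtw_distance(query, candidate))  # smaller value = more similar motion
```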
Recognizing Motions in Eastern Rituals
Semester project – Fall 2020
Students: F. M. Seydou, D. Cian, et al.
Supervisor: Yumeng Hou
This project explores the potential of deep learning for automatic knowledge representation from multimodal digital archives, examining ‘motion-as-a-tool’ as a means of representing knowledge held in media archives. You will work with a video dataset produced from the Re-enactment of Confucius Rites and the reconstruction of embodied knowledge from historical and archaeological sources, which so far contains hours of media material and keeps growing as more reenacted rituals are captured. We will start with a workable portion that documents the acts and movements in themed ritual ceremonies, including the capping ceremony and the archery ceremony. The filming involves elite film practitioners and advanced camera techniques, including digital video and motion-capture technologies alongside green-screen and virtual production techniques. Thus far, this material has been exhibited in London, Chicago, Ljubljana, Beijing, and Hong Kong.
Soundpainting language recognition
Master project – Spring 2020
Soundpainting is the universal multidisciplinary live composing sign language for musicians, actors, dancers, and visual artists.
The language comprises more than 1,500 gestures that are signed by the Soundpainter (composer) to indicate the type of material desired of the performers. The composition is created by the Soundpainter through the parameters of each set of signed gestures.
The goal of this project is to build a performance and pedagogical tool based on the Soundpainting language. Using a machine-learning algorithm, the software will recognize Soundpainting signs from motion-tracking equipment, letting users control their virtual orchestra and discover the Soundpainting language.
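A minimal sketch of the recognition step, with synthetic placeholder data: gestures captured as flattened joint trajectories are classified with a standard scikit-learn model; the real tool would map each recognised sign to a behaviour of the virtual orchestra:

```python
# Sketch: classify Soundpainting signs from motion-tracking features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_gestures, n_joints, n_frames = 300, 20, 30
X = rng.random((n_gestures, n_joints * 3 * n_frames))  # x, y, z per joint per frame
y = rng.integers(0, 5, size=n_gestures)                # 5 example sign classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
# A recognised sign would then trigger the corresponding orchestra behaviour.
```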
Pindex Tours
Master project – Spring 2020
Cities, parks, zoos, or even art festivals: all of these seem quite different, but they have one thing in common, visitors. How can each of these venues keep its guests entertained and interested while properly transmitting knowledge? Impressive solutions exist, and experts from around the world have built fantastic custom-made systems to solve this problem; however, these often require substantial resources and are not within everyone’s reach. Believe it or not, a considerable number of smaller organisations still rely on paper maps. Technology can undoubtedly do better.
This project, Pindex Tours, is about finding an approach to tackle this problem. Using beacons placed across a venue for accurate geolocation, together with interactive maps and content, the final service aims to be scalable, affordable and easy to use, but most importantly to offer quality content. From business research to a real-life demo, this multidisciplinary project will result in a service that will be further developed and used by Pindex.
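A minimal sketch of the core mechanism, with hypothetical beacon IDs and content: the strongest nearby beacon determines which map content is shown to the visitor:

```python
# Sketch: pick the content attached to the nearest (strongest-signal) beacon.
def pick_content(scanned_beacons, content_by_beacon):
    """Return the content for the beacon with the strongest signal (RSSI)."""
    if not scanned_beacons:
        return None
    nearest = max(scanned_beacons, key=lambda b: b["rssi"])  # RSSI is negative dBm
    return content_by_beacon.get(nearest["id"])

content_by_beacon = {
    "beacon-entrance": {"title": "Welcome", "map_pin": (0, 0)},
    "beacon-aviary": {"title": "Tropical Aviary", "map_pin": (120, 45)},
}
scan = [{"id": "beacon-entrance", "rssi": -78}, {"id": "beacon-aviary", "rssi": -62}]
print(pick_content(scan, content_by_beacon))  # -> Tropical Aviary content
```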
Neuronal Ancestral Sculptures Series
What does future heritage look like? What role could AI and deep learning play? How can we visually reassemble the patterns of specific eras or locations in heritage digitisation, such as the richness of forms from Mesopotamia or today’s Iraq? How can we build new decolonial databases and thereby add to the contemporary visual imaginary?
The artistic research project by Nora Al-Badri, “Neuronal Ancestral Sculptures Series”, will specifically examine the potential of GANs in this context as a new artistic tool and use heritage data, i.e. data based on ancestral knowledge (forms, patterns, artefacts), to create a form of generative aesthetics that goes beyond representation and mimesis. We are therefore looking for a partner who already has experience in applying GANs. The designated GAN framework has yet to be identified, since this is not a style-transfer or CycleGAN task. Potentially it could be a GAN using spectral normalization (https://arxiv.org/pdf/1802.05957.pdf). Working with conditional GANs might also be of interest, since they integrate human intervention into the process and allow classes to be defined more clearly.
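As a concrete illustration of one candidate ingredient (a sketch, not the designated framework), PyTorch’s built-in spectral normalization, the technique from the linked paper, can be applied to a small image discriminator:

```python
# Sketch: a tiny spectrally normalized discriminator for 64x64 RGB images.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)), nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)), nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(128, 1, 4)),  # real/fake score map
        )

    def forward(self, x):
        return self.net(x).mean(dim=(1, 2, 3))  # one score per image

d = SNDiscriminator()
print(d(torch.randn(2, 3, 64, 64)).shape)
```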
In this project we will start with the Met’s API dataset of images of different categories of artefacts, such as sculptures, seals, bowls and cuneiform tablets, amounting to roughly 400–2,500 images, which should be sufficient for a test version. The Met’s online interface can be used to discover the collection and its API. One challenge will be the heterogeneity of the objects and their images. Over the course of the project, other datasets might be added, such as those from the British Museum or the Baghdad Museum, which are digitised but not accessible online.
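A minimal sketch of pulling candidate training images from the Met’s public Collection API; the search query and batch size are placeholders, and image availability varies per object:

```python
# Sketch: fetch a small batch of object images from the Met Collection API.
import requests

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"

ids = requests.get(f"{BASE}/search", params={"q": "Mesopotamia", "hasImages": "true"}).json()
for object_id in (ids.get("objectIDs") or [])[:50]:        # small test batch
    obj = requests.get(f"{BASE}/objects/{object_id}").json()
    url = obj.get("primaryImage")
    if url:
        with open(f"met_{object_id}.jpg", "wb") as f:
            f.write(requests.get(url).content)
```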
Sonifying the Atlas of Maritime Buddhism
The Atlas of Maritime Buddhism is a research and exhibition project led by Professor Kenderdine, with a number of major academic and museum partners in Switzerland, the USA, Australia, China and Hong Kong. The Atlas encompasses the spread of Buddhism across 12 countries, from India to China, Korea and Japan, through the seaports of Southeast Asia. It is of great academic importance as it contains evidence that counterbalances prevailing narratives which foreground the overland Silk Road and neglect the importance of pan-Asian maritime countries and Buddhist entrepreneurship in the expansion of trade from the 2nd century BC to the 12th century AD. The exhibition includes a large-scale 360° 3D narrative-driven deep-mapping schema as an information visualization framework for interactively exploring the narrative patterns, processes and phenomena in the Atlas. Fieldwork data includes stereographic panoramas, ambisonics, 360° video, gigapixel images and photogrammetric models of the world’s most important Buddhist sculptures found throughout the region. The project is politically relevant as it reflects the sharp increase in interest in this topic related to the One Belt One Road initiatives led by China today, and it was recently presented at the World Economic Forum in Tianjin. The exhibition is destined to be installed in major museums worldwide, and there will be a permanent installation in Taiwan.
The internship project offered is to sonify up to 60 Buddhist sculptures (very high-resolution 3D photogrammetric objects), many of them highly valuable national treasures captured in museums across the world. Iconographic transmission is accompanied by distinct sonic differences throughout the region, and the interactive browser of Buddhist statues for this major project will require archival research, ethnographic investigation, digital file manipulation and real-time interactive programming, with conversion to ambisonic format from original sources where possible.
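One building block of that conversion, sketched here with placeholder file names and angles, is first-order (B-format) ambisonic encoding of a mono source at a chosen azimuth and elevation:

```python
# Sketch: encode a mono recording into first-order ambisonics (FuMa-style W gain).
import numpy as np
import soundfile as sf

mono, sr = sf.read("statue_soundscape_mono.wav")          # placeholder mono file
azimuth, elevation = np.radians(45.0), np.radians(10.0)   # desired source direction

W = mono * (1.0 / np.sqrt(2.0))
X = mono * np.cos(azimuth) * np.cos(elevation)
Y = mono * np.sin(azimuth) * np.cos(elevation)
Z = mono * np.sin(elevation)

sf.write("statue_soundscape_bformat.wav", np.stack([W, X, Y, Z], axis=1), sr)
```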
A new way to experience traditional music of Afghanistan
Mathieu Clavel, a master’s student in the College of Humanities (CDH) Digital Humanities Institute, is bringing musical heritage from Afghanistan to life using tools from data science and virtual reality.