Available Projects Spring 2025

If you are interested in doing a research project (“semester project”) or a master’s project at IVRL, you can do this through the Master Programs in Communication Systems or in Computer Science. Note that you must be accredited at EPFL. This page lists available semester/master’s projects for the Spring 2025 semester.

For any other type of application (research assistantship, internship, etc.), please check this page.

Description:

Recent advances in neural rendering-based 3D scene reconstruction have demonstrated a strong capacity for representing visually plausible real-world scenes. However, most of these methods rely heavily on dense multi-view captures, which restricts their broader applicability.

In this project, we aim to exploit the strong priors of latent video diffusion models to synthesize high-fidelity novel views of generic scenes from single or sparse input captures. We adopt a radiance field as the scene representation and explore the implicit 3D understanding and intra-frame attention correlations exhibited by video diffusion models as a substitute for the multi-view captures required in prior works.
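
As a rough illustration of the envisioned pipeline (a minimal sketch only; VideoDiffusionPrior, its sample method, and fit_radiance_field are hypothetical placeholders, not an existing API), the video diffusion prior stands in for the dense multi-view captures that prior works require:

    # Sketch of the envisioned pipeline: sparse captures -> diffusion-generated
    # pseudo-views -> radiance-field optimization. All names are placeholders.
    def novel_views_from_sparse(images, cameras, target_cameras):
        # 1. A pretrained latent video diffusion model acts as a multi-view prior.
        prior = VideoDiffusionPrior.from_pretrained("some/video-diffusion")  # hypothetical

        # 2. Condition on the sparse captures and sample plausible intermediate
        #    frames along a camera trajectory (pseudo multi-view captures).
        pseudo_views = prior.sample(cond_images=images,
                                    cond_cameras=cameras,
                                    target_cameras=target_cameras)

        # 3. Fit the radiance field on the union of real and generated views.
        return fit_radiance_field(images + pseudo_views,          # hypothetical
                                  cameras + target_cameras)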

Type of work:

  • MS Level: semester project / master’s project
  • 65% research, 35% development

Prerequisite:

  • Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
  • Experience with 3D vision is required (e.g., courses taken, independent projects, etc.)

Supervisor:

Reference Literature:

  • High-resolution image synthesis with latent diffusion models
  • ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
  • FSGS: Real-time few-shot view synthesis using Gaussian splatting

Startup company Innoview has developed a software framework to create hidden watermarks printed on paper and to acquire and decode them with a smartphone. The smartphone acquisition pipeline comprises many separate parametrizable parts. The project consists of improving some of these parts in order to optimize the recognition rate of the hidden watermarks (under Android).

Deliverables:

  • Report and running prototype.

Prerequisites:

  • Basic knowledge of image processing and computer vision
  • Coding skills in Java (Android), C#, and/or Matlab

Level: BS or MS semester project

Supervisors:

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27

Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. The goal is to improve these security features and to optimize them by simulating the interaction of light with these 3D lenslet structures, using the Blender software.
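
As an indication of the kind of simulation involved (a minimal sketch, assuming a recent Blender release; shader input names such as “Transmission” vary between versions), a lenslet array can be modeled as flattened glass hemispheres and rendered with the Cycles path tracer:

    # Minimal Blender (bpy) sketch: build a small lenslet array as flattened
    # glass hemispheres and render it with Cycles. Run from Blender's
    # scripting workspace; shader input names differ across Blender versions.
    import bpy

    mat = bpy.data.materials.new(name="LensletGlass")
    mat.use_nodes = True
    bsdf = mat.node_tree.nodes["Principled BSDF"]
    bsdf.inputs["IOR"].default_value = 1.5
    # Blender 3.x input name; in 4.x it is "Transmission Weight".
    bsdf.inputs["Transmission"].default_value = 1.0

    for i in range(4):                       # 4 x 4 lenslet grid
        for j in range(4):
            bpy.ops.mesh.primitive_uv_sphere_add(radius=0.5, location=(i, j, 0.0))
            lens = bpy.context.object
            lens.scale.z = 0.3               # flatten the sphere into a lenslet
            lens.data.materials.append(mat)

    bpy.context.scene.render.engine = 'CYCLES'   # path tracing handles refraction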

Deliverables:

  • Report and running prototype (Matlab). Blender lenslet simulations.

Prerequisites:

  • Knowledge of computer graphics and of the interaction of light with 3D mesh objects
  • Basic knowledge of Blender
  • Coding skills in Matlab

Level: BS or MS semester project

Supervisors:

Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Startup company Innoview has developed new moiré features that can prevent counterfeits. Some types of moiré features rely on grayscale images. The present project aims at creating a grayscale image editor. Designers should be able to shape their grayscale image by various means (interpolation between spatially defined grayscale values, geometric transformations, image warping, etc.).
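
As an illustration of the first editing mode, interpolation between spatially defined grayscale values, here is a minimal Python sketch (the actual prototype would be in Matlab, e.g. with scatteredInterpolant) that builds a smooth grayscale image from a handful of user-placed control points:

    # Interpolate a smooth grayscale image from a few control points.
    import numpy as np
    from scipy.interpolate import griddata

    h, w = 256, 256
    points = np.array([[20, 30], [200, 40], [128, 128], [60, 220]])  # (x, y)
    values = np.array([0.1, 0.9, 0.5, 0.3])   # gray levels in [0, 1]

    ys, xs = np.mgrid[0:h, 0:w]
    gray = griddata(points, values, (xs, ys), method='cubic')
    # Cubic interpolation is undefined outside the control points' convex
    # hull; fall back to nearest-neighbor values there.
    nearest = griddata(points, values, (xs, ys), method='nearest')
    gray = np.where(np.isnan(gray), nearest, gray)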
 
Deliverables: Report and running prototype (Matlab).
 
Prerequisites:
  • Knowledge of image processing / computer vision
  • Coding skills in Matlab
 
Level: BS or MS semester project
 
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

This project aims to explore whether off-the-shelf diffusion models encode semantic information that helps us, and other deep learning models, understand the content of an image or the relationships between images.

Diffusion models [1] have become the new paradigm for generative modeling in computer vision. Despite their success, they remain black boxes during generation. At each step, the model provides a direction, namely the score, towards the data distribution. As shown in recent work [2], the score can be decomposed into different meaningful components. The first research question is: does the score encode any semantic information about the generated image?
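
For concreteness, in the epsilon parameterization of DDPMs [1] the score of the noisy marginal relates to the predicted noise as score(x_t) = -eps_theta(x_t, t) / sqrt(1 - alpha_bar_t). A minimal sketch of extracting it from a pretrained model (assuming the Hugging Face diffusers library and the public google/ddpm-cifar10-32 checkpoint):

    # Extract the score from a pretrained DDPM via the epsilon parameterization.
    import torch
    from diffusers import DDPMPipeline

    pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
    unet, scheduler = pipe.unet, pipe.scheduler

    x0 = torch.randn(1, 3, 32, 32)            # stand-in for a real image batch
    t = torch.tensor([100])
    x_t = scheduler.add_noise(x0, torch.randn_like(x0), t)  # forward-diffuse to step t

    with torch.no_grad():
        eps = unet(x_t, t).sample             # predicted noise
    alpha_bar_t = scheduler.alphas_cumprod[t].view(-1, 1, 1, 1)
    score = -eps / torch.sqrt(1.0 - alpha_bar_t)   # estimate of grad log p(x_t)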

Moreover, there is evidence that the representations learned by diffusion models are helpful to discriminative models. For example, they can boost classification performance through knowledge distillation [3]. Furthermore, a diffusion model can itself be used as a robust classifier [4]. Discriminative information can thus be extracted from a diffusion model. The second question is then: what is this information about? Is it about object shape? Location? Texture? Or other kinds of information?

This is an exploratory project. We will try to interpret the black box inside diffusion models and dig into the semantic information they encode. Together, we will also brainstorm applications of diffusion models beyond image generation. This project can be a good chance for you to develop interest and skills in scientific research.

References:

[1] Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.

[2] Alldieck, T., Kolotouros, N., and Sminchisescu, C. Score distillation sampling with learned manifold corrective. arXiv preprint arXiv:2401.05293, 2024.

[3] Yang, X., and Wang, X. Diffusion model as representation learner. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 18938-18949.

[4] Chen, H., Dong, Y., Shao, S., et al. Your diffusion model is secretly a certifiably robust classifier. arXiv preprint arXiv:2402.02316, 2024.

Deliverables: Code, well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the results.

Prerequisites: Python and PyTorch. Basic understanding of diffusion models.

Level: MS research project

Number of students: 1

Contact: Yitao Xu, [email protected]

Introduction
3D mesh generation plays a pivotal role in virtual reality, gaming, and digital content creation, but generating high-quality, detailed meshes remains a challenging task. Traditional methods often fail to capture fine-grained details or optimize computational efficiency, especially for complex, textured surfaces. This proposal seeks to enhance 3D mesh generation by incorporating frequency decomposition models, leveraging multi-resolution analysis to capture both broad structural features and intricate details.

Objective
The primary goal of this research is to develop a frequency-based decomposition model for 3D mesh generation, enabling precise control over the detail level of generated meshes. By decomposing spatial and frequency components, we aim to improve mesh quality, reduce processing times, and enhance texture and surface detail.

Methodology

  1. Frequency Decomposition: Apply discrete wavelet transforms (DWT) to the spatial and normal maps of 3D meshes, separating high-frequency components (surface details) from low-frequency components (broad structural shapes); a code sketch of this step, together with step 3, follows this list.

  2. Component-specific Optimization: Tailor the mesh generation model to optimize specific frequency components. For example, low-frequency structures can be prioritized for smooth topology, while high-frequency details can be preserved in texture-rich areas.

  3. Multi-level Reconstruction: Iteratively reconstruct the mesh from frequency components using an inverse wavelet transform (IDWT), allowing for customizable levels of detail depending on the desired quality.

  4. Evaluation and Benchmarking: Compare the proposed approach against existing methods on benchmarks, measuring structural consistency, texture fidelity, and computational efficiency.
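
As referenced above, a minimal sketch of steps 1 and 3 (assuming the PyWavelets library and a single normal-map channel as input; the mesh-specific extensions are the open research questions of this project):

    # One-level 2D wavelet analysis/synthesis of a normal-map channel.
    import numpy as np
    import pywt

    normal_map = np.random.rand(256, 256)     # stand-in for one normal channel

    # Step 1: one analysis level splits the map into a low-frequency
    # approximation and three high-frequency detail bands.
    cA, (cH, cV, cD) = pywt.dwt2(normal_map, 'haar')

    # Step 2 would act on the bands separately, e.g. attenuating detail
    # bands where smooth topology is preferred.
    cH, cV, cD = 0.8 * cH, 0.8 * cV, 0.8 * cD

    # Step 3: reconstruct via the inverse transform (IDWT).
    recon = pywt.idwt2((cA, (cH, cV, cD)), 'haar')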

Expected Contributions

  1. A novel, frequency-based approach for enhancing 3D mesh quality.
  2. A multi-level decomposition and reconstruction framework that allows selective detail optimization.
  3. An efficient algorithm capable of handling complex surfaces without compromising mesh detail.

Prerequisites: Python and PyTorch. Basic understanding of diffusion models.

Level: MS research project

Number of students: 1

Contact: Yufan Ren, [email protected]

Introduction
With the advancement of Vision-Language Models (VLMs), generating coherent, contextually relevant narratives from images has become an exciting yet challenging frontier. Current models struggle to maintain narrative consistency, often introducing contradictory details or missing contextually vital elements when interpreting a sequence of images. This proposal seeks to enhance the storytelling capabilities of VLMs by introducing a self-consistency mechanism, aimed at reinforcing coherence, maintaining character continuity, and upholding narrative flow across multiple image inputs.

Objective
The main objective of this research is to develop a self-consistency framework for VLMs that enables improved narrative coherence in visual storytelling tasks. This mechanism will monitor and enforce consistency in story elements such as character traits, setting, actions, and progression, producing narratives that align closely with human expectations for logical and consistent storytelling.

Methodology

  1. Self-Consistency Module: Integrate a self-consistency module within the VLM architecture, which will cross-reference details across sequential images, ensuring that entities, actions, and story elements remain logically consistent. This module will evaluate consistency by tracking character attributes, scene elements, and temporal relationships, adjusting model outputs to rectify inconsistencies.

  2. Memory and Reference Mechanisms: Implement a memory-based mechanism to store narrative elements identified in each image, maintaining a “story memory” that captures the main characters, locations, and story arcs. This will allow the VLM to reference earlier parts of the story and avoid contradictions or omissions as it progresses (a minimal sketch of such a memory follows this list).

  3. Training with Self-Supervision: Use self-supervised learning to fine-tune the model on datasets where story coherence is crucial. During training, the model will be penalized for introducing inconsistencies in narrative elements or disrupting logical story progression.

  4. Evaluation and Benchmarking: Develop a new visual storytelling benchmark focused on self-consistency, assessing narrative coherence, character consistency, and story progression. The model will be evaluated on metrics such as narrative accuracy, coherence, and alignment with human story interpretation.
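
A minimal, purely illustrative sketch of the story memory from step 2 (the toy conflict table and trait sets are invented placeholders; extracting traits reliably from images is the actual research problem):

    # A running record of narrative entities that each newly generated
    # caption is checked against.
    from dataclasses import dataclass, field

    EXCLUSIVE = {("young", "old"), ("indoors", "outdoors")}  # toy conflict table

    def conflicts(a: str, b: str) -> bool:
        return (a, b) in EXCLUSIVE or (b, a) in EXCLUSIVE

    @dataclass
    class StoryMemory:
        characters: dict = field(default_factory=dict)   # name -> set of traits

        def update(self, name: str, traits: set):
            self.characters.setdefault(name, set()).update(traits)

        def inconsistencies(self, name: str, traits: set) -> set:
            # Traits contradicting what was established earlier in the story.
            seen = self.characters.get(name, set())
            return {t for t in traits if any(conflicts(t, s) for s in seen)}

    memory = StoryMemory()
    memory.update("Alice", {"young", "outdoors"})
    print(memory.inconsistencies("Alice", {"old"}))      # {'old'}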

Expected Contributions

  1. A novel self-consistency mechanism for VLMs to enhance coherence in multi-image storytelling tasks.
  2. A memory-based reference model that maintains continuity across scenes, characters, and settings.
  3. A new benchmark and evaluation framework for testing and measuring consistency in visual storytelling.

Prerequisites: Python and PyTorch. Basic understanding of diffusion models.

Level: MS research project

Number of students: 1

Contact: Yufan Ren, [email protected]

Description

Diffusion models have advanced the field of image generation, enabling one to create realistic and detailed images of almost any scene. However, these models depend more on rote learning of example scenes than on a true understanding of a scene and its geometry. As a consequence, generated images can feature incorrect perspectives and geometric features. By contrast, natural photographs obey specific geometric constraints. In particular, lines that are parallel in a scene converge in the photograph to a vanishing point, and all vanishing points derived from lines on parallel planes lie on the same vanishing line. When these principles are broken in generated images, the images can lack realism. In augmented or virtual reality systems, such violations can disrupt viewer immersion. Geometric accuracy is furthermore crucial for applications such as architectural visualization. Thus, improving perspective in generated images would not only enhance their aesthetic quality, but also expand the utility of generative models in professional domains. Addressing this challenge could push the boundaries of what generative models can achieve. On the other hand, current geometric artefacts could be analysed to distinguish generated images from real ones and detect deepfakes.
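
The underlying projective geometry is compact: in homogeneous coordinates, the line through two image points is their cross product, and the intersection of two lines is again a cross product, so the vanishing point of two scene-parallel lines is one cross product away. A minimal NumPy sketch (with made-up pixel coordinates):

    # Vanishing point of two scene-parallel lines via homogeneous coordinates.
    import numpy as np

    def line_through(p, q):
        return np.cross([*p, 1.0], [*q, 1.0])

    def intersection(l1, l2):
        v = np.cross(l1, l2)
        return v[:2] / v[2]            # back to pixels (v[2] == 0 => at infinity)

    # Two rails of a scene-parallel track, as observed in an image:
    l1 = line_through((100, 400), (220, 100))
    l2 = line_through((500, 400), (320, 100))
    vp = intersection(l1, l2)          # their vanishing point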

In this project, we aim to investigate the geometry of images, both real and generated. We will review geometry analysis methods, most notably vanishing point detection. This problem has long been studied in the literature, both with geometric and algorithmic methods and with more recent learning-based tools. However, it remains to be seen which of these methods still apply when geometric correctness cannot be assumed in the first place. Furthermore, many generative models focus on generating faces, which are more difficult to analyse due to the absence of straight lines.

The developed tools will be used to assess and quantify the geometric inaccuracies produced by various diffusion models. Then, depending on interests and early results, this project could branch into two possible topics. A first application would be to develop a deepfake detection tool based on geometric analysis. Beyond deepfake detection, one could also seek to improve diffusion model generation to ensure the geometric correctness of generated images.

Deliverables

The final report should contain a review, both experimental and theoretical, of existing vanishing point detection methods. It might be relevant to implement one or several of the older methods, for which code is not always available. The review should also address the specificities of generated-image analysis detailed above. It should contain experiments assessing to what extent different diffusion models create geometric artefacts.

The report should also detail the proposed innovations on at least one of the following topics:

  • Improvements made to vanishing point detection
  • Deepfake detection using geometric analysis
  • Improving the perspective of generated images

Overall, the report should be structured, and present experiments done during the project and conclusions that can be drawn from them. New proposed methods, as well as reimplemented ones, should be explained in a reproducible manner, for example with pseudo-code. If any training is involved, training details should be comprehensively explained.

In addition to the report, a clean, well-documented code enabling the reproduction of experiments will be expected.

Prerequisites

  • Strong skills in geometry
  • Proficiency in writing clean code, ideally with Python and PyTorch
  • Depending on the directions of this project, statistics and probability and/or a basic understanding of diffusion models (ideally both)

Type of work and number of students

Either one Master’s thesis student, or one or two (ideally two) MS research project students (semester projects)

Supervision

Quentin Bammey, [email protected]

Main references

As the first reference contains important details pertaining to this project, please read it before applying.

  • Farid, Hany. “Perspective (in)consistency of paint by text.” arXiv preprint arXiv:2206.14617 (2022). https://arxiv.org/abs/2206.14617
  • Desolneux, Agnes, Lionel Moisan, and Jean-Michel Morel. From gestalt theory to image analysis: a probabilistic approach. Vol. 34. Springer Science & Business Media, 2007. (Chapter 8 is of particular interest to this project, but the whole book will be relevant if focusing on deepfake detection)
  • Almansa, Andrés, Agnes Desolneux, and Sébastien Vamech. “Vanishing point detection without any a priori information.” IEEE Transactions on Pattern Analysis and Machine Intelligence 25.4 (2003): 502-507.
  • Upadhyay, Rishi, et al. “Enhancing diffusion models with 3d perspective geometry constraints.” ACM Transactions on Graphics (TOG) 42.6 (2023): 1-15.
  • Santana-Cedrés, Daniel, et al. “Automatic correction of perspective and optical distortions.” Computer Vision and Image Understanding 161 (2017): 1-10.
  • Tehrani, Mahdi Abbaspour, Aditi Majumder, and M. Gopi. “Correcting perceived perspective distortions using object specific planar transformations.” 2016 IEEE International Conference on Computational Photography (ICCP). IEEE, 2016.
  • Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
  • Tutorial on diffusion models: https://cvpr2022-tutorial-diffusion-models.github.io/

Introduction:

Existing image generation methods [1,2,3] typically rely on large datasets of clean images to learn data distributions. However, a large corpus of clean data is not always available. This project seeks to explore whether noisy data alone can be used for image generation.

Several works have shown that clean images can be restored using only noisy data, such as Noise2Noise [4], Noise2Void [5] and Noiser2Noise [6]. They typically train their denoisers using paired noisy data or even just a single noisy realization of each training image.

Meanwhile, recent findings have established relationships between Score Matching density estimates [7] and denoising, showing that the score can be estimated using a Gaussian denoiser. Consequently, Gaussian denoisers can be employed to draw high-probability samples from the implicit prior embedded within them [8].
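
Concretely, this link is Tweedie's identity: for Gaussian noise of variance sigma^2, the score of the noisy density is (D(x) - x) / sigma^2, where D is the (MMSE) Gaussian denoiser. A minimal sketch of using it for coarse sampling in the spirit of [8] (`denoiser` is a placeholder for any trained Gaussian denoising network):

    # Turn a Gaussian denoiser into a score estimate via Tweedie's identity,
    # score(x) = (D(x) - x) / sigma**2, and take coarse ascent steps toward
    # high-probability images. `denoiser` is a placeholder network.
    import torch

    def score_from_denoiser(denoiser, x, sigma):
        with torch.no_grad():
            return (denoiser(x) - x) / sigma**2

    x = torch.randn(1, 3, 64, 64)              # start from pure noise
    for sigma in torch.linspace(1.0, 0.05, 50):
        s = score_from_denoiser(denoiser, x, sigma)   # placeholder network
        x = x + sigma**2 * s                   # one coarse step toward the prior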

By combining these two techniques, a natural question arises: can we use only noisy data for realistic clean image generation?

 

Objective:

The primary objective of this project is to develop a framework for realistic image generation using only noisy data. Specifically, the model will be trained exclusively on noisy data but should generate clean images during inference. This approach is especially valuable in scenarios where clean datasets are scarce or unavailable.

 

Type of work:

master semester project

65% research, 35% development

 

Prerequisite:

Proficiency in deep learning frameworks (e.g., PyTorch)

Familiarity with image processing and computer vision

(Optional) Prior knowledge of diffusion models is advantageous

 

Supervisor:

Liying Lu ([email protected])

 

Reference:

[1]. Generative Adversarial Networks

[2]. Denoising Diffusion Probabilistic Models

[3]. High-Resolution Image Synthesis with Latent Diffusion Models

[4]. Noise2Noise: Learning Image Restoration without Clean Data

[5]. Noise2Void – Learning Denoising from Single Noisy Images

[6]. Noisier2Noise: Learning to Denoise from Unpaired Noisy Data

[7]. Generative modeling by estimating gradients of the data distribution

[8]. Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser