Available Projects – Fall 2024

If you are interested in doing a research project (“semester project”) or a master’s project at IVRL, you can do this through the Master’s programs in Communication Systems or in Computer Science. Note that you must be accredited to EPFL. This page lists the available semester/master’s projects for the Fall 2024 semester.

For any other type of application (research assistantship, internship, etc.), please check this page.

Description: In this research project, we would like to explore diffusion models to generate 3D building-block designs/models (for instance, a set of LEGO®️ blocks with assembly instructions), ideally using a photo and/or a textual description as a conditioning prompt.
Several software tools (such as LDCad or BrickLink Studio) allow users to create new 3D brick designs/models, but this process is manual and time-consuming. Automatic tools are, to our knowledge, limited either to 2D brick mosaics or to cases where a 3D file already exists. We would like to explore the possibility of generating 3D designs/models by conditioning on a textual description and/or a photo, using deep learning generative models (diffusion models); a minimal 2D starting point is sketched below.
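As a rough, hedged starting point (not the project’s prescribed pipeline), text conditioning can be tried with an off-the-shelf 2D model from the diffusers library; the checkpoint and prompt below are only examples:

    import torch
    from diffusers import StableDiffusionPipeline

    # Sample a 2D image from a text prompt with a pretrained diffusion model;
    # adapting such conditioning to 3D brick designs is the actual project.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a small castle built from toy bricks").images[0]
    image.save("brick_castle.png")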
Some possible references (more precise directions/experiments for the project should be discussed with the supervisors):
Deliverables: Code, well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the performance of the different approaches.
Prerequisites: Python and PyTorch. (Optional) familiarity with 3D bricks design software.
Level: Ideally MS research project (semester project) or Master’s project (master’s thesis), potentially BS research project (semester project)
Number of students: 1 or 2
Supervisors: Martin Nicolas Everaert (martin.everaert[at]epfl.ch), Eric Bezzam (eric.bezzam[at]epfl.ch)
Description:
The goal of this research project is to explore the manifold structure underlying natural images using diffusion/denoising models. Natural images, such as photographs or artworks, exhibit a complex distribution that is hypothesized to lie on a low-dimensional manifold (the set of all realistic images) within a high-dimensional space (pixel space, the space of all possible images, including noisy ones).
In particular, in the neighborhood of a real image, there are a few directions in which we can move (e.g., making the image slightly brighter, slightly moving the camera, changing the color of an object) while maintaining the realism of the image. However, moving in most directions of pixel space would simply noise/corrupt the image.
 
Diffusion models can be useful here, as they essentially learn to denoise/uncorrupt images. Finding the directions that keep the image realistic is useful, for instance, for image-editing tasks [ImageEditing, DragGAN].
Can we use denoising/diffusion models to answer some of the following questions?
  • Given two images, what are the different paths that can be used to interpolate from one image to the other? If we interpolate using a diffusion model [SDInterpolation], does it find the shortest path on the manifold between the two images? Are there other possible paths that look more natural?
  • Given an image, what is the intrinsic dimension of the image manifold around this image? In how many independent dimensions can we move while keeping the image realistic? Is this dimension the same for all images [ManifoldHypothesis]? What happens at the transition when this dimension changes? (See the sketch after this list.)
  • How can we find the possible editing directions around an image using denoising/diffusion? Can we get similar results as with GANs [ImageEditing, DragGAN]?
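One possible first experiment for the intrinsic-dimension question (a rough sketch under strong assumptions, with a toy network standing in for a real pretrained denoiser): the Jacobian of an ideal denoiser approximately projects onto the manifold’s tangent space, so counting its large singular values gives a crude local dimension estimate.

    import torch

    # Toy denoiser standing in for a real pretrained denoising/diffusion model.
    class ToyDenoiser(torch.nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(dim, 128), torch.nn.ReLU(), torch.nn.Linear(128, dim)
            )

        def forward(self, x):
            return self.net(x)

    denoiser = ToyDenoiser()
    x = torch.randn(64)  # a (flattened) image, ideally on/near the manifold

    # For an ideal denoiser, the Jacobian at x approximates a projection onto
    # the tangent space of the image manifold: singular values near 1 indicate
    # directions that preserve realism, values near 0 indicate noise directions.
    J = torch.autograd.functional.jacobian(denoiser, x)
    s = torch.linalg.svdvals(J)
    intrinsic_dim = int((s > 0.5).sum())  # crude threshold, to be refined
    print(s[:10], intrinsic_dim)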
 
References:
[SDInterpolation] Rustam Akimov. Images Interpolation with Stable Diffusion. https://huggingface.co/learn/cookbook/en/stable_diffusion_interpolation
[ImageEditing] Pajouheshgar, Ehsan, Tong Zhang, and Sabine Süsstrunk. “Optimizing latent space directions for gan-based local image editing.” ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.
[ManifoldHypothesis] Brown, Bradley CA, et al. “Verifying the union of manifolds hypothesis for image data.” arXiv preprint arXiv:2207.02862 (2022).
[DragGAN] Pan, Xingang, et al. “Drag your gan: Interactive point-based manipulation on the generative image manifold.” ACM SIGGRAPH 2023 Conference Proceedings. 2023.
 
Deliverables: Deliverables should include code, well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the performance of the models.
Prerequisites: Python and PyTorch.
Level:  Ideally MS research project (semester project), potentially BS research project (semester project)
Number of students: 1
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Description:
The goal of this research project is to explore Gaussianization techniques for high-dimensional data distributions (e.g., the distribution of natural images). Gaussianization aims at transforming a non-Gaussian (usually high-dimensional) data distribution into a standard Gaussian distribution.
Gaussianization can be beneficial for various statistical and machine learning tasks (e.g., image generation), because it makes each component of the data distribution statistically independent, reducing the curse of dimensionality and allowing us to process (e.g., estimate the density of, or learn to generate) each dimension independently.
 
Different techniques can be used/combined to transform a distribution towards a Gaussian distribution:
  • Classical whitening methods (PCA, ICA, ZCA)
  • Iterative methods (e.g., normalize each dimension, apply a random rotation, and repeat [RBIG]; see the sketch after this list)
  • Non-iterative methods [RG, NIG] (e.g., assume the data is symmetrically distributed and Gaussianize the distribution of the norm of the data [RG])
  • Using neural networks (e.g., VAEs [VAE], diffusion models [Diffusion])
  • Connections between Gaussianization and Batch Normalization [BN] / Batch Whitening [BW] in neural networks could also be made
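As an illustration of the iterative approach, here is a rough sketch of RBIG-style Gaussianization on a toy 2D dataset (marginal Gaussianization via rank transforms plus random rotations; details such as the CDF estimator are simplified):

    import numpy as np
    from scipy.stats import norm

    def marginal_gaussianize(X):
        # Map each dimension to N(0, 1) through its empirical CDF (rank transform).
        n = X.shape[0]
        ranks = X.argsort(axis=0).argsort(axis=0) + 1
        return norm.ppf(ranks / (n + 1))

    def rbig(X, n_iters=50, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        for _ in range(n_iters):
            X = marginal_gaussianize(X)
            Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random rotation
            X = X @ Q
        return X

    # Toy example: a noisy circle, strongly non-Gaussian in 2D.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 2000)
    X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    X += 0.05 * rng.standard_normal(X.shape)
    Z = rbig(X)
    print(Z.mean(axis=0), np.cov(Z.T))  # should approach zero mean, identity covariance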
In this project, you will explore such techniques, apply them to toy examples and real image datasets, and propose improvements to these methods.
 
References:
[RBIG] Laparra, Valero, Gustavo Camps-Valls, and Jesús Malo. “Iterative gaussianization: from ICA to random rotations.” IEEE transactions on neural networks 22.4 (2011): 537-549.
[RG] Lyu, Siwei, and Eero Simoncelli. “Reducing statistical dependencies in natural signals using radial Gaussianization.” Advances in neural information processing systems 21 (2008).
[NIG] Rui, Rongxiang, and Maozai Tian. “Non-iterative Gaussianization.” arXiv preprint arXiv:2203.14526 (2022).
[VAE] Kingma, Diederik P., and Max Welling. “Auto-encoding variational bayes.” arXiv preprint arXiv:1312.6114 (2013).
[Diffusion] Song, Yang, et al. “Score-Based Generative Modeling through Stochastic Differential Equations.” International Conference on Learning Representations. 2020.
[BN] Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” International conference on machine learning. pmlr, 2015.
[BW] Cho, Yooshin, et al. “Improving generalization of batch whitening by convolutional unit optimization.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
Laparra, Valero, Gustavo Camps-Valls, and Jesús Malo. “PCA gaussianization for image processing.” 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009.
 
Deliverables: Deliverables should include code, well cleaned up and easily reproducible, as well as a written report explaining the models, the steps taken during the project, and the performance of the models.
Prerequisites: Python and PyTorch.
Level:  Ideally MS research project (semester project), potentially BS research project (semester project)
Number of students: 1
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)

Description:

While film photography was almost completely replaced by digital photography at the beginning of the century, it is slowly growing in popularity again because of its distinctive “look”. However, most users are not familiar with how a film camera works, and film stock prices are skyrocketing. For this reason, film emulators are very popular: they take a digital image as input and produce an approximate simulation of what a film photograph of the same scene would look like. However, different film stocks have different physical (or chemical) responses to light, because of differences in sensitivity (ISO), the distribution of the silver halide grains, the color filters used to produce color images, the presence or absence of halation filters, etc. To mimic these properties, simulators offer generic sliders that can be tuned to approach a plausible look, but these preset profiles are not based on any physical properties of the film stock itself.

In this project, our goal is to create a physically based simulator, based on experimental measurements, for one or more film stocks. The project will therefore involve data acquisition and analysis with film cameras and different film stocks, as well as precise modeling of the film response.

Provisional project steps:

1 – Data acquisition: The project’s first step will be to acquire film + digital images for well-defined scenarios. We will use the IVRL lab, which offers many possibilities to acquire scenes with different lighting conditions. The acquisition of the images will depend on the planned analysis, and will also include developing and scanning the films appropriately. We will have to establish a protocol to correctly acquire the images.

2 – Data analysis: After acquiring the data, we will analyze the results for various properties:

  • Grain: grain is one of the most important visual aspects of film photography. The goal will be to analyze its distribution, focusing on multiple aspects:
    • the shape of the grains,
    • the mean and standard deviation of the grain size,
    • the density of the grains,
    • the correlation between grain properties and signal intensity.

For some of these, we will require the use of a microscope to correctly identify the grain properties. We will first do this study on grayscale film, which is simpler than color film (a superimposition of three photosensitive layers covered by color filters).

Tone mapping: tone mapping is a classic step of the image signal processing pipeline, which users can adjust. For film photography, however, the contrast response function is directly linked to the physical properties of the film stock itself. We can analyze the tone mapping by using standard image targets.
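For instance, one could fit a parametric characteristic (density vs. log-exposure) curve to patch measurements from a gray-step target; the sketch below uses a sigmoid as a stand-in model and made-up measurements:

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical measurements from a gray-step target: log exposure of each
    # patch vs. film density measured after development and scanning.
    log_E = np.linspace(-3, 0, 11)
    density = np.array([0.10, 0.11, 0.15, 0.30, 0.60, 0.95,
                        1.30, 1.60, 1.80, 1.90, 1.93])

    def characteristic_curve(logE, d_min, d_max, gamma, logE0):
        # Sigmoid approximation of the film's S-shaped density response.
        return d_min + (d_max - d_min) / (1 + np.exp(-gamma * (logE - logE0)))

    params, _ = curve_fit(characteristic_curve, log_E, density,
                          p0=[0.1, 2.0, 3.0, -1.5])
    print(dict(zip(["d_min", "d_max", "gamma", "logE0"], params)))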

Color profile: similar to tone mapping, the color profile also depends on the film stock. A variety of methods already exist to transfer color responses from one sensor to another, e.g., polynomial models fitted with least squares on RAW pairs; more modern approaches involve deep learning [5,6]. Since our goal is to first tackle black-and-white film, this is a side lead for this project.
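If this side lead is pursued, a least-squares polynomial mapping between paired RGB samples could look like the following sketch (toy synthetic pairs stand in for real digital/film pairs):

    import numpy as np

    # Toy synthetic RGB pairs stand in for real (digital RAW, film scan) pairs.
    rng = np.random.default_rng(0)
    src = rng.uniform(0, 1, (500, 3))                    # digital RGB samples
    M_true = np.array([[0.9, 0.05, 0.0],
                       [0.1, 0.80, 0.1],
                       [0.0, 0.10, 0.9]])
    dst = np.clip(src @ M_true, 0, 1) ** 1.1             # stand-in "film" response

    def poly_features(rgb):
        # Degree-2 polynomial expansion of RGB values (plus a bias term).
        r, g, b = rgb.T
        return np.stack([r, g, b, r * g, r * b, g * b,
                         r * r, g * g, b * b, np.ones_like(r)], axis=1)

    A = poly_features(src)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)          # least-squares fit
    print(np.abs(A @ M - dst).mean())                    # mean fitting error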

3 – Modeling:

For grain, different models exist that are more or less physically based. Our goal will be to incorporate the statistical and morphological analysis results into an existing model [1] to better mimic grain generation. [2] proposes to model grain rendering using the Boolean model, while [3] approximates it using additive white noise. We could also explore learning-based approaches such as [4].
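To make the Boolean-model idea concrete, here is a rough sketch (not the models of [1-3] themselves, and with arbitrary toy parameters): grain centers are drawn from a Poisson process whose intensity follows the exposure, and the developed area is the union of disks.

    import numpy as np

    rng = np.random.default_rng(0)
    H = W = 256
    exposure = np.tile(np.linspace(0, 1, W), (H, 1))  # toy exposure ramp
    radius = 1.5                                      # grain radius in pixels
    lam = 0.02 * exposure                             # grains per pixel area

    # Poisson process: number of grains, then centers with density prop. to lam.
    n_grains = rng.poisson(lam.sum())
    idx = rng.choice(lam.size, size=n_grains, p=(lam / lam.sum()).ravel())
    cy, cx = np.unravel_index(idx, lam.shape)

    # Developed area = union of disks centered on the sampled grains.
    yy, xx = np.mgrid[0:H, 0:W]
    developed = np.zeros((H, W), dtype=bool)
    for y0, x0 in zip(cy, cx):
        developed |= (yy - y0) ** 2 + (xx - x0) ** 2 <= radius ** 2
    density = developed.astype(float)  # binary grain image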

Supervision:

This project will be supervised by Raphael Achddou, together with Gwilherm Lesné from Telecom Paris for his expertise in grain simulation.

Prerequisites:

Python and PyTorch; basics of image and signal processing.

Type of Work:

MS semester project.

80% research, 20% development

Contact:

[email protected]  [email protected]

References:

[1] Newson, Alasdair, et al. “A Stochastic Film Grain Model for Resolution-Independent Rendering.” Computer Graphics Forum 36 (2017).

[2] B. E. Bayer, “Relation Between Granularity and Density for a Random-Dot Model,” J. Opt. Soc. Am. 54, 1485-1490 (1964)

[3] Zhang, Kaixuan et al. “Film Grain Rendering and Parameter Estimation.” ACM Transactions on Graphics (TOG) 42 (2023): 1 – 14.

[4] Ameur, Zoubida et al. “Deep-Based Film Grain Removal and Synthesis.” IEEE Transactions on Image Processing 32 (2022): 5046-5059.

[5] Afifi, M., Abuolaim, A.: Semi-supervised raw-to-raw mapping. CoRR abs/2106.13883 (2021), https://arxiv.org/abs/2106.13883

[6] Rang, N.H.M., Prasad, D.K., Brown, M.S.: Raw-to-raw: Mapping between image sensor color responses. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 3398–3405 (2014). https://doi.org/10.1109/CVPR.2014.434

Description:
 
Blind face restoration endeavors to recover high-quality facial images from low-quality counterparts with various unknown degradations, including noise, compression, and blur. Recent advances employing deep convolutional networks [1,2,3,4] have demonstrated remarkable progress. Nevertheless, these methods struggle when faced with extreme degradation scenarios, often arising from severe levels of distortion or large facial poses, which remains a persistent challenge. Although recent approaches propose leveraging different priors, for example 3D priors [5], geometric priors [6], or generative priors [7], to enhance restoration quality, they still exhibit artifacts in extreme cases.
 
Recently, diffusion models have exhibited robust capabilities in generating realistic images, and they have been used in restoration tasks such as image super-resolution and image deblurring [8,9]. In this project, we aim to explore the potential of diffusion models in addressing the demanding task of extreme blind face restoration. We will try to harness the extensive prior knowledge encoded in existing pre-trained diffusion models, seeing if we can extract textural or structural information of natural facial images that might be encoded within these models, and use this information to aid our facial image restoration task.
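As one possible starting point (a sketch in the spirit of plug-and-play restoration [9], with toy stand-ins for the pre-trained noise predictor and the degradation operator; neither is the project’s prescribed method), the reverse diffusion process can be guided by a data-fidelity gradient:

    import torch

    T = 100
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    guidance_scale = 0.1

    def eps_model(x, t):
        # Toy stand-in for a pre-trained noise (epsilon) predictor.
        return torch.zeros_like(x)

    def A(x):
        # Toy degradation operator: 2x downsampling blur.
        return torch.nn.functional.avg_pool2d(x, 2)

    y = torch.randn(1, 3, 32, 32)  # degraded low-quality observation (toy)
    x = torch.randn(1, 3, 64, 64)  # start from pure noise

    for t in reversed(range(T)):
        x = x.detach().requires_grad_(True)
        eps = eps_model(x, t)
        ab = alpha_bars[t]
        x0_hat = (x - (1 - ab).sqrt() * eps) / ab.sqrt()  # predicted clean image
        fidelity = ((A(x0_hat) - y) ** 2).sum()           # consistency with y
        grad = torch.autograd.grad(fidelity, x)[0]
        with torch.no_grad():
            mean = (x - betas[t] / (1 - ab).sqrt() * eps) / alphas[t].sqrt()
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + betas[t].sqrt() * noise - guidance_scale * grad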
 
 
Key Questions:
 
Instead of training a diffusion model from scratch with a limited number of images available in benchmarks, is it possible for us to utilize the prior information encoded in diffusion models that were pre-trained on a large amount of data to aid the restoration task? How can we extract relevant information?
 
After extracting the relevant information, what is the best way to fuse it with the low-quality images to obtain good results?
 
The pre-trained diffusion models might be biased towards facial images with normal poses. How can we deal with this?
 
 
References:
 
[1]. Wang, Xintao, et al. “Towards real-world blind face restoration with generative facial prior.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[2]. Wang, Zhouxia, et al. “Restoreformer: High-quality blind face restoration from undegraded key-value pairs.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[3]. Zhou, Shangchen, et al. “Towards robust blind face restoration with codebook lookup transformer.” Advances in Neural Information Processing Systems 35 (2022): 30599-30611.
[4]. Gu, Yuchao, et al. “Vqfr: Blind face restoration with vector-quantized dictionary and parallel decoder.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
[5]. Chen, Zhengrui, et al. “Blind Face Restoration under Extreme Conditions: Leveraging 3D-2D Prior Fusion for Superior Structural and Texture Recovery.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 2. 2024.
[6]. Zhu, Feida, et al. “Blind face restoration via integrating face shape and generative priors.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[7]. Yang, Tao, et al. “Gan prior embedded network for blind face restoration in the wild.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[8]. Saharia, Chitwan, et al. “Image super-resolution via iterative refinement.” IEEE transactions on pattern analysis and machine intelligence 45.4 (2022): 4713-4726.
[9]. Zhu, Yuanzhi, et al. “Denoising diffusion models for plug-and-play image restoration.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
 
 
Prerequisites:
Python and PyTorch.
 
 
Type of Work:
MS semester project.
80% research, 20% development
 
 
Supervisor:
Liying Lu ([email protected])
Startup company Innoview has developed arrangements of lenslets that can be used to create document security features. The goal is to verify to what extent a software simulator like Blender is able to faithfully simulate the behavior of light interacting with such lenslets.
 
Deliverables: Report and running prototype (Matlab). Blender lenslet simulations.
 
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab
 
Level: BS or MS semester project
 
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Startup company Innoview has developed new moiré features that can prevent counterfeits. Some types of moiré features rely on grayscale images. The present project aims at creating a grayscale image editor. Designers should be able to shape their grayscale images by various means (interpolation between spatially defined grayscale values, geometric transformations, image warping, etc.).
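For illustration only (the prototype itself is expected in Matlab), interpolation between spatially defined grayscale values could start from scattered-data interpolation, e.g.:

    import numpy as np
    from scipy.interpolate import griddata

    # Hypothetical control points: (x, y) positions with user-chosen gray values.
    points = np.array([[0, 0], [255, 0], [0, 255], [255, 255], [128, 128]])
    values = np.array([0.0, 0.5, 0.5, 1.0, 0.2])

    # Interpolate the sparse values over the full 256x256 image grid.
    yy, xx = np.mgrid[0:256, 0:256]
    img = griddata(points, values, (xx, yy), method="cubic")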
 
Deliverables: Report and running prototype (Matlab).
 
Prerequisites:
– knowledge of image processing / computer vision
– coding skills in Matlab
 
Level: BS or MS semester project
 
Supervisors:
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44

Startup company Innoview Sàrl has developed software to recover a message hidden in patterns. Appropriate parameter settings enable the detection of counterfeits. The goal of the project is to define optimal parameters for different sets of printing conditions (resolution, type of paper, type of printing device, complexity of the hidden watermark, etc.). The project involves tests on a large data set and appropriate statistics.

Deliverables: Report and running prototype (Android, Matlab).

Prerequisites:

– knowledge of image processing / computer vision
– basic coding skills in Matlab and/or Java Android

Level: BS or MS semester project

Supervisors:

Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, BC110, [email protected], cell: 077 406 27 09

This project aims to explore whether there is any semantic information encoded by off-the-shelf diffusion models that helps us and other deep learning models understand the content of an image or the relationship between images.

Diffusion models [1] have become the new paradigm for generative modeling in computer vision. Despite their success, they remain a black box during generation. At each step, the model provides a direction, namely the score, towards the data distribution. As shown in recent work [2], the score can be decomposed into different meaningful components. The first research question is: does the score encode any semantic information about the generated image?

Moreover, there is evidence that the representations learned by diffusion models are helpful to discriminative models. For example, they can boost classification performance through knowledge distillation [3]. Furthermore, a diffusion model can itself be used as a robust classifier [4]. Discriminative information can thus be extracted from diffusion models. The second question is then: what is this information about? Object shape? Location? Texture? Or other kinds of information?
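A simple way to probe this (a sketch with a toy network standing in for a real pretrained diffusion U-Net) is to train a linear probe on frozen intermediate features:

    import torch

    # Toy network standing in for a pre-trained diffusion U-Net (kept frozen).
    class TinyUNetBlock(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = torch.nn.Conv2d(3, 16, 3, padding=1)
            self.dec = torch.nn.Conv2d(16, 3, 3, padding=1)

        def forward(self, x):
            h = torch.relu(self.enc(x))  # intermediate feature map
            return self.dec(h), h

    model = TinyUNetBlock().eval()
    probe = torch.nn.Linear(16, 10)  # linear probe over 10 hypothetical classes

    x = torch.randn(8, 3, 32, 32)            # (noisy) input images
    labels = torch.randint(0, 10, (8,))      # class labels
    with torch.no_grad():                    # features stay frozen
        _, feats = model(x)
    logits = probe(feats.mean(dim=(2, 3)))   # pooled features -> class scores
    loss = torch.nn.functional.cross_entropy(logits, labels)
    loss.backward()                          # updates only the probe
    print(loss.item())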

This is an exploratory project. We will try to interpret the black box of diffusion models and dig into the semantic information they encode. We will also brainstorm together about applications of diffusion models beyond image generation. This project can be a good opportunity for you to develop interest and skills in scientific research.

 

References:

[1] Ho, J., A. Jain, and P. Abbeel. “Denoising diffusion probabilistic models.” Advances in Neural Information Processing Systems 33 (2020): 6840-6851.

[2] Alldieck, T., N. Kolotouros, and C. Sminchisescu. “Score Distillation Sampling with Learned Manifold Corrective.” arXiv preprint arXiv:2401.05293 (2024).

[3] Yang, X., and X. Wang. “Diffusion model as representation learner.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

[4] Chen, H., et al. “Your diffusion model is secretly a certifiably robust classifier.” arXiv preprint arXiv:2402.02316 (2024).

 

Deliverables: Deliverables should include code, well cleaned up and easily reproducible, as well as a written report, explaining the models, the steps taken for the project and the results.

 

Prerequisites: Python and PyTorch. Basic understanding of diffusion models.

Level: MS research project

Number of students: 1

Contact: Yitao Xu, [email protected]