Semester projects are open to EPFL students.
Description:
DeepFakes are a subset of fake images (especially face images) synthesized with machine learning algorithms. Since first appearing only four years ago, DeepFake generation technology has been evolving in several aspects: quality, speed, and data efficiency. Because facial expression and identity are central to social interaction and to trust in media, there is persistent interest in developing robust and efficient DeepFake detection algorithms.
DeepFake videos are the video counterparts of DeepFake images. Moving from images to videos opens up several new possibilities, much as moving from image understanding to video understanding did. For instance, video is time-series data and therefore offers temporal consistency between frames, yet faces in DeepFake videos may not move as naturally as in real videos.
In this project, you will first briefly review the state-of-the-art DeepFake video detection methods. Then, based on your review, you will reproduce or re-implement a representative subset of these methods and evaluate them on the FaceForensics++ dataset. Finally, your results should include quantitative and qualitative comparisons.
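To make the scope concrete, below is a minimal, illustrative sketch (in PyTorch) of one common detection baseline: a frame-level binary classifier whose per-frame scores are averaged into a video-level score. The backbone choice, the face cropping, and the FaceForensics++ data pipeline are assumptions made for illustration only, not the methods you are required to reproduce.

import torch
import torch.nn as nn
from torchvision import models

class FrameClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)              # any CNN backbone would do
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)   # one logit: fake vs. real
        self.backbone = backbone

    def forward(self, frames):                                # frames: (T, 3, H, W), one video
        logits = self.backbone(frames).squeeze(-1)            # (T,) per-frame logits
        return torch.sigmoid(logits).mean()                   # video-level "fake" probability

model = FrameClassifier()
video = torch.rand(16, 3, 224, 224)                           # 16 sampled (face-cropped) frames, dummy data
print(model(video).item())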
For inquiries, feel free to drop an email : )
Level of work
- Senior Bachelor / MS Level, semester project / master project
Prerequisite:
- Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision
Supervisor:
- Yufan Ren, [email protected]
Type of work:
- 50% research, 50% development
Deliverables:
- Code, well cleaned up and easily reproducible
- Written Report, explaining the literature and steps taken for the project
References:
[1] FaceForensics++: Learning to Detect Manipulated Facial Images
[2] Deepfake Video Detection through Optical Flow Based CNN
[3] Countering Malicious DeepFakes: Survey, Battleground, and Horizon
Neural Rendering is a family of rendering techniques that replace one or more parts of the rendering pipeline with neural networks. The inductive bias of neural networks has proven helpful in many Neural Rendering tasks, such as novel view synthesis (NeRF), surface reconstruction (VolSDF), and material acquisition (NeRD).
In this project, we are interested in the generalization ability of Neural Rendering, e.g., given a set of images, inferring novel views directly. Using generalizable features has several benefits. First, we replace the long per-scene training procedure with fast feed-forward inference. Second, learned features are beneficial in sparse-input cases. Third, the framework allows further optimization to improve quality.
Level
MS Level: semester project / master project
Prerequisite:
Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
Supervisors:
Type of work:
50% research, 50% development
Level
B.Sc. Semester Project
Description
This semester project is part of a larger project to build the next-generation vision for UAVs, enabling computer vision solutions on drones.
Our aim is to extract semantic segmentation labels for aerial images using UAV metadata (GPS, height, velocity). However, noisy metadata and perspective distortion hinder perfect alignment between segmentation labels and aerial images. Your task is to extract segmentation labels and align them with aerial images to create a small test set. You will then run deep learning-based benchmarks to assess the performance of fully supervised models on the dataset, and compare the acquired dataset with other available datasets. Learning outcomes of the project include gaining experience with widely used deep learning architectures, becoming proficient with deep learning and data-processing libraries, and addressing important vision tasks (alignment and segmentation).
Deliverables
Code, acquired dataset and written report
Type of Work (e.g., theory, programming)
25% data acquisition, 75% implementation.
Prerequisites
Knowledge of Python and PyTorch, experience in image processing and computer vision. Experience with the OpenStreetMap API is a plus.
Supervisors
Baran – IVRL PhD ([email protected])
Description:
Dense semantic correspondence relates pixels belonging to similar objects in two different images to each other. Unlike the segmentation task, semantic correspondence requires fine-grained recognition of object parts. However, most datasets include only segmentation masks instead of pixel-wise correspondence labels. Our aim is to improve the semantic correspondence task without ground-truth correspondence labels. You will learn core concepts from self- and weakly-supervised learning and gain a thorough understanding of the features extracted by deep learning models.
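For intuition, the sketch below (an assumption-laden illustration, not the project method) computes a dense correspondence by nearest-neighbour matching of L2-normalized CNN features between two images; the ResNet-18 backbone and the layer choice are placeholders.

import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
features = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()   # conv feature extractor

@torch.no_grad()
def dense_correspondence(img_a, img_b):                       # each: (1, 3, H, W)
    fa = F.normalize(features(img_a), dim=1)                  # (1, C, h, w)
    fb = F.normalize(features(img_b), dim=1)
    c, h, w = fa.shape[1:]
    fa = fa.flatten(2).squeeze(0).t()                         # (h*w, C)
    fb = fb.flatten(2).squeeze(0)                             # (C, h*w)
    similarity = fa @ fb                                      # cosine similarity between all pixel pairs
    return similarity.argmax(dim=1).view(h, w)                # best match in B for every pixel of A

img_a, img_b = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
print(dense_correspondence(img_a, img_b).shape)               # (7, 7) feature-map resolution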
References:
[1] Wang, Xinlong, et al. “Dense contrastive learning for self-supervised visual pre-training.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
[3] Liu, Yanbin, et al. “Semantic correspondence as an optimal transport problem.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Deliverables: Report and reproducible implementations
Prerequisites: Experience with deep learning, PyTorch, computer vision, probability and statistics
Level: MS semester project
Type of work: 60% research, 40% implementation
Supervisors: Baran Ozaydin ([email protected])
Description:
Visual saliency refers to the parts of a scene that capture our attention. Current approaches for saliency estimation use eye tracking data on natural images to construct ground truth. In this project, however, we will perform eye tracking on comics pages instead of natural images. Later, we will use the collected data to estimate saliency in the comics domain. You will work on an eye tracking experiment with mobile eye tracking glasses.
Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an eye tracking experiment according to given instructions.
Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.
Type of work: 20% research, 80% development and testing
References:
[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 workshop on ”Future of Datasets”, 2015.
[2] Kai Kunze, Yuzuko Utsumi, Yuki Shiga, Koichi Kise, and Andreas Bulling, “I know what you are reading: recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, September 08-12, 2013, Zurich, Switzerland.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
Level: BS semester project
Supervisor: Bahar Aydemir ([email protected])
Description:
We consider the task of creating a 3-d model of a large novel environment, given only a small number of images of the scene. This is a difficult problem, because if the images are taken from very different viewpoints or if they contain similar-looking structures, then most geometric reconstruction methods will have great difficulty finding good correspondences. Further, the reconstructions given by most algorithms include only points in 3-d that were observed in two or more images; a point observed only in a single image would not be reconstructed.
Our research statement is the following: how can monocular image cues be combined with triangulation cues to build a photo-realistic model of a scene given only a few images, even ones taken from very different viewpoints or with little overlap?
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] 3D Reconstruction From Monocular Images Based on Deep Convolutional Networks, Y. Ren et al.
[2] http://www.robotics.stanford.edu/~ang/papers/iccvvrml07-3dfromsparseviews.pdf
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, Python, and PyTorch. Experience in statistical analysis to report the performance evaluations of the models.
Models will run on RunAI (we will guide you on how to use RunAI; no prior knowledge required).
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses halftones to display simple graphical elements such as a logo. The watermark hiding algorithm can be tuned with a variety of parameters. You will explore variations of these parameters and derive hiding and recognition metrics.
Deliverables: Report and running prototype (Matlab and/or Android).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and Java Android
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Level-line moirés enable the creation of interesting, dynamically beating shapes such as faces, graphical designs, and landscapes. The project aims at creating level-line moirés as 3D graphical objects defined by meshes. The resulting moirés can be simulated in Blender. They can also be fabricated with a 3D printer.
References:
- S. Chosson and R. D. Hersch, Beating Shapes Relying on Moiré Level Lines, ACM Transactions on Graphics (TOG), Vol. 34, No. 1, November 2014, Article No. 9, 1-10
http://dx.doi.org/10.1145/2644806
https://www.epfl.ch/labs/lsp/technologies/page-88156-en-html/
Deliverables: Report, possibly 3D printed objects.
Prerequisites:
– basics of computer graphics/image processing
– coding skills in Matlab
Level: BS or MS semester project
Supervisor:
Prof. hon. Roger D. Hersch, BC 110, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. That Epson P50 printer has now been replaced by new types of Epson printers that require modified driver software. In a previous project, part of the Epson P50 printer driver commands were adapted to the new types of Epson printers. This project consists of finalizing the adaptation of the remaining Epson printing commands according to the new Epson printer programming guide. Some reverse engineering may be necessary to obtain undocumented driver commands.
Deliverables: Report and running prototype (C, C++).
Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Self-supervised Learning (SSL) has been attracting increasing attention. However, different SSL methods yield different performances on different tasks. Many factors contribute to these differences, such as the augmentation methods, the number of crops, and the pre-training datasets. In this project, we will investigate how the pre-training dataset affects the performance of different SSL frameworks and observe the results on downstream transfer-learning tasks.
Task:
– Review the literature and learn to train MoCo, SimSiam, and MAE.
– Pre-train the models on different datasets and fine-tune them on downstream transfer-learning tasks (a minimal evaluation sketch follows this list).
– Analyze the obtained results.
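As a rough illustration of the transfer-learning step above, the sketch below freezes an SSL-pretrained encoder and trains a linear probe on a downstream dataset. The ResNet-50 backbone, the 100-class head, and the hyperparameters are illustrative assumptions; loading the actual MoCo/SimSiam/MAE checkpoints is left out.

import torch
import torch.nn as nn
from torchvision import models

encoder = models.resnet50(weights=None)            # backbone used by e.g. MoCo / SimSiam
# encoder.load_state_dict(...)                     # load the SSL-pretrained weights here
encoder.fc = nn.Identity()                         # expose the 2048-d features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False                        # linear probing: keep the encoder frozen

linear_probe = nn.Linear(2048, 100)                # e.g. 100 downstream classes
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        feats = encoder(images)                    # (B, 2048) frozen features
    loss = criterion(linear_probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(8, 3, 224, 224), torch.randint(0, 100, (8,))))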
Prerequisites:
Solid knowledge of Python and a deep learning framework (TensorFlow or PyTorch).
Level:
MS project
Type of work:
30% research, 60% development and testing, 10% analysis of the results.
Reference:
[1] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
[2] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[3] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. ICCV, 2021.
[4] Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool. Revisiting contrastive methods for unsupervised learning of visual representations. 2021.
Supervisor: Tong Zhang ([email protected])
Description:
Since 2021, large-scale text-to-image models, such as DALL·E 2 or CogView, have achieved remarkable success at generating high-quality images from text. However, those models require a lot of time and memory for training and inference.
The aim of this project is to train models of reasonable size to generate images from text. Your models will use supervision from other pretrained models, such as CLIP, which was pre-trained for image–text similarity matching.
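As an illustration of how such supervision can be wired up, the sketch below computes a CLIP-based loss that rewards agreement between a generated image and the target text. It assumes OpenAI's open-source clip package; the generator itself and CLIP's input normalization are omitted.

import torch
import clip                                        # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _preprocess = clip.load("ViT-B/32", device=device)

def clip_loss(generated_images, texts):
    """generated_images: (B, 3, 224, 224), already resized/normalized for CLIP; texts: list of B strings."""
    tokens = clip.tokenize(texts).to(device)
    image_feats = clip_model.encode_image(generated_images)     # (B, 512)
    text_feats = clip_model.encode_text(tokens)                 # (B, 512)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feats * text_feats).sum(dim=-1).mean()  # 1 - mean cosine similarity

# During training, the generator's output images would be passed through clip_loss
# and the loss backpropagated into the generator's parameters.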
Tasks:
- Literature review on image generation from text [1, 2, 3, 4].
- Implementation and evaluation of models.
References:
[1] Wang, Z., Liu, W., He, Q., Wu, X., & Yi, Z. (2022). CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP. arXiv preprint arXiv:2203.00386, https://arxiv.org/pdf/2203.00386.pdf
[2] Frans, K., Soros, L. B., & Witkowski, O. (2021). Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. arXiv preprint arXiv:2106.14843, https://arxiv.org/pdf/2106.14843.pdf
[3] Tian, Y., & Ha, D. (2021). Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts. arXiv preprint arXiv:2109.08857, https://arxiv.org/pdf/2109.08857.pdf
[4] Schaldenbrand, P., Liu, Z., & Oh, J. (2022). StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation. arXiv preprint arXiv:2202.12362, https://arxiv.org/pdf/2202.12362.pdf
Deliverables:
Code, well cleaned up and easily reproducible.
Written Report, explaining the literature and steps taken for the project and the performances of the models.
Prerequisites: Python, PyTorch. Knowledge of image processing and natural language processing.
Level: MS semester project
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Image-based rendering dates back to the 1990s. Unlike traditional Computer Graphics rendering, which requires explicit scene geometry and scene texture, image-based rendering renders a scene based on observations of it, i.e., photographs taken in the real or synthesized scene. One image-based rendering approach is Neural Rendering, which enables the generation of photorealistic renderings of a 3D scene by learning a Neural Radiance Field, i.e., NeRF [1]. NeRF represents a scene with a Multilayer Perceptron, which we query for the color and opacity of a 3D location of the scene along a particular camera viewing direction. Now that we have NeRF, a natural follow-up question is: what can we do with a learned radiance field?
In this project, you will extract 2D line drawings from NeRF that convey geometry and semantic information while preserving 3D view consistency. There are existing techniques for generating feature lines from volumetric data in traditional computer graphics [3], as well as computer vision frameworks such as OpenPose and img2pose, to explore.
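For intuition about the query-and-composite process described above, here is a toy, simplified sketch of a radiance-field query with alpha compositing along a single ray; positional encoding, hierarchical sampling, and all other NeRF details are omitted, and the network sizes are arbitrary.

import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),       # input: (x, y, z, dx, dy, dz)
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # output: (r, g, b, sigma)
        )

    def forward(self, xyz, direction):
        out = self.mlp(torch.cat([xyz, direction], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])   # color, density

def render_ray(field, origin, direction, n_samples=64, near=0.0, far=4.0):
    t = torch.linspace(near, far, n_samples)                   # sample depths along the ray
    pts = origin + t[:, None] * direction                      # (n_samples, 3)
    rgb, sigma = field(pts, direction.expand_as(pts))
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                    # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                                    # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)                 # composited pixel color

field = TinyRadianceField()
print(render_ray(field, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0])))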
For inquiries, feel free to drop an email to us : )
Level
● MS Level: semester project / master project
Prerequisite:
● Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
● Experience with 3D vision will be a big plus.
Supervisors:
● Yufan Ren, [email protected]
● Dongqing Wang, [email protected]
Type of work:
● 50% research, 50% development
References:
[1] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[2] Stylizing 3D Scene via Implicit Representation and HyperNetwork
[3] Suggestive Contours for Conveying Shape
Readings that might be useful:
Optical Models for Direct Volume Rendering
https://courses.cs.duke.edu/spring03/cps296.8/papers/max95opticalModelsForDirectVolumeRendering.pdf
Line drawings from Volume Data
https://gfx.cs.princeton.edu/pubs/Burns_2005_LDF/index.php
Ever more administrative processes are now done online. This opens the potential for a range of new ways to defraud these processes. This project addresses the situation where the administrative process calls for the submission of a scan or photograph of a document, e.g., a passport, national ID, utility bill, or bank statement. A classifier is to be investigated that can differentiate between a (digital) photo of the actual document and a (digital) photo of a printout of a digital photo of the document – we call this process IPI, for “Image to Printout to Image”.
Tasks:
1. Perform a comprehensive literature review of potential IPI detection techniques and methodologies with a particular focus on ML/DL based solutions. This should include the review of image classification techniques with similar objectives such as “recapture” detection.
2. Discussion of identified techniques in terms of applicability – in particular with respect to different types of image subject matter and printing technologies (e.g. laser vs. inkjet). The focus should be on IPI of documents containing text only, text superimposed on background graphics, combined text with images.
3. Collation of an IPI training dataset based on existing datasets. Explore the applicability of Data augmentation techniques and transfer learning.
4. Explore and implement diverse ML/DL networks (e.g., derived from ResNet50; see the sketch after this list) and train them on the task of IPI classification.
5. Assess classifier performance in view of image resolution and quality (sharpness, contrast, colour balance, image composition etc).
6. Identify possible extensions of the ML/DL architectures which particularly address the IPI use-case.
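As a possible starting point for task 4 (an illustrative sketch under our own assumptions, not a prescribed architecture), the snippet below fine-tunes an ImageNet-pretrained ResNet50 with a two-class head for "original photo" vs. "photo of a printout":

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2")             # transfer learning from ImageNet
model.fc = nn.Linear(model.fc.in_features, 2)                # classes: {original, IPI}

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):                              # images: (B, 3, 224, 224)
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(4, 3, 224, 224), torch.randint(0, 2, (4,))))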
Deliverables:
Report and Code according to the tasks
Prerequisites:
Knowledge of image processing
Basic coding skills in Python and basic deep learning knowledge
Level:
MS semester project
Type of work:
50% research, 50% development and test
Supervisor:
Peter Grönquist ([email protected])
Tasks:
– Understand the literature and state-of-art
– Test existing image classification data augmentation methods for saliency prediction
– Test and improve a data augmentation method proposed for saliency prediction
– Evaluate state-of-the-art saliency detection models on new data
Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in deep learning frameworks
Deliverables:
Reproducible code and a written report
Level:
MS semester or thesis project
Type of work:
60% research, 40% development and testing
References:
Supervisor:
Bahar Aydemir ([email protected])
Description:
Alan Turing introduced his famous Turing patterns back in 1952, suggesting that reaction-diffusion systems can be a valid model for generating morphogenesis patterns. A particularly inspiring reaction-diffusion model that has stood the test of time is the Gray-Scott model, which shows an extreme variety of behaviors controlled by just a few variables.
On the other hand, ever since von Neumann introduced Cellular Automata (CA) as models for self-replication, they have captivated researchers’ minds; extremely complex behaviors emerge from very simple rules, as in Conway’s Game of Life.
In this project, we will use Neural Cellular Automata (NCA), a differentiable extension of the original CA, to generate images and patterns. We will guide and supervise the NCA using various pretrained models such as OpenAI CLIP or VGG19 networks.
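For intuition, a small numerical sketch of the Gray-Scott model mentioned above follows; the diffusion rates and the feed/kill parameters are common textbook choices, not values prescribed by the project.

import numpy as np

def laplacian(x):                                  # 5-point stencil with periodic boundaries
    return (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
            np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)

def gray_scott(n=128, steps=5000, Du=0.16, Dv=0.08, f=0.035, k=0.060):
    u = np.ones((n, n))
    v = np.zeros((n, n))
    u[n//2-5:n//2+5, n//2-5:n//2+5] = 0.50         # seed a small square
    v[n//2-5:n//2+5, n//2-5:n//2+5] = 0.25
    for _ in range(steps):
        uvv = u * v * v
        u += Du * laplacian(u) - uvv + f * (1.0 - u)
        v += Dv * laplacian(v) + uvv - (f + k) * v
    return v                                       # spots/stripes emerge in the v field

pattern = gray_scott()
print(pattern.min(), pattern.max())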
Deliverables
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites
- Python and PyTorch.
- Experience with Deep Learning methods and Convolutional Networks.
Level: Master Semester Project
Type of work
50% research, 50% implementation.
References
Reaction-Diffusion models: https://www.karlsims.com/rd.html
Self-Organizing Textures: https://distill.pub/selforg/2021/textures/
Growing Neural Cellular Automata: https://distill.pub/2020/growing-ca/
OpenAI CLIP: https://openai.com/blog/clip/
Supervisor: Ehsan Pajouheshgar ([email protected])
Image-based rendering dates back to the 1990s. Unlike traditional Computer Graphics rendering, which requires explicit scene geometry and scene texture, image-based rendering renders a scene based on observations of it, i.e., photographs taken in the real or synthesized scene.
Given an image collection of a scene under different viewing directions, NeRF can faithfully synthesize novel views that are 3D-consistent. However, how to render the scene under a novel lighting condition is still an open question.
There are a few works on relighting a NeRF-like implicit scene representation. The process has two steps: scene decomposition, and 3D scene extraction from the decomposition. Afterward, we can relight the 3D representation easily.
In this project, you will read various papers on relighting NeRF, reproduce one or two that you find interesting, and explore possible ways to improve either step of the relighting pipeline.
For inquiries, feel free to drop an email to us : )
Level
● Bachelor-level semester project; also possible as a master semester project with some extensions.
Prerequisite:
● Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
● Experience with 3D vision will be a big plus.
Supervisor:
● Dongqing Wang, [email protected]
Type of work:
● 40% research, 60% development
Description:
Classification is a common machine-learning task that involves assigning a label to a given input. Common examples of classification include detecting spam in e-mails, classifying cats and dogs in images, or recognizing the main topics/entities (e.g., YouTube-8M) or actions (e.g., UCF11) in videos.
In this project, the goal is to detect whether a given video is an advertisement or not by using such machine-learning and deep-learning approaches.
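As one illustrative baseline (an assumption for discussion, not the required approach), the sketch below samples frames from a video, encodes them with a pretrained CNN, averages the features over time, and applies a linear advertisement/non-advertisement classifier:

import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()                        # 512-d feature per frame
backbone.eval()
classifier = nn.Linear(512, 2)                     # classes: {advertisement, other}; to be trained

def predict(frames):                               # frames: (T, 3, 224, 224), sampled from one video
    with torch.no_grad():
        feats = backbone(frames)                   # (T, 512)
    logits = classifier(feats.mean(dim=0))         # average-pool features over time
    return logits.softmax(dim=-1)                  # class probabilities

print(predict(torch.rand(8, 3, 224, 224)))         # dummy video of 8 sampled frames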
Tasks:
- Literature review on video classification and understanding [1, 2, 3, 4] and on advertisement image detection [4, 5].
- Implementation and evaluation of machine-learning/deep-learning models for advertising videos detection.
References:
[1] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 1725-1732), https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf.
[2] Harvey, M. (2017) [blog] Five Video Classification Methods Implemented in Keras and TensorFlow. In Coastline Automation, 22 Mar. 2017, https://blog.coast.ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5.
[3] Jing, L., Parag, T., Wu, Z., Tian, Y., & Wang, H. (2021). Videossl: Semi-supervised learning for video classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1110-1119), https://openaccess.thecvf.com/content/WACV2021/papers/Jing_VideoSSL_Semi-Supervised_Learning_for_Video_Classification_WACV_2021_paper.pdf.
[4] Hussain, Z., Zhang, M., Zhang, X., Ye, K., Thomas, C., Agha, Z., … & Kovashka, A. (2017). Automatic understanding of image and video advertisements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1705-1715), https://openaccess.thecvf.com/content_cvpr_2017/papers/Hussain_Automatic_Understanding_of_CVPR_2017_paper.pdf.
[5] Almgren, K., Krishnan, M., Aljanobi, F., & Lee, J. (2018). AD or Non-AD: A Deep Learning Approach to Detect Advertisements from Magazines. Entropy, 20(12), 982, https://www.mdpi.com/1099-4300/20/12/982/pdf.
Deliverables:
Code, well cleaned up and easily reproducible.
Written report, explaining the models, the steps taken for the project and the performances of the models.
Prerequisites: Python and PyTorch. Knowledge of image processing.
Level: BS semester project
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Description
Neural Networks are often treated as black boxes. In this project, we’re going to try to understand how a neural network works by visualizing what each neuron in the network is detecting. We will apply feature visualization techniques to networks trained using self-supervised, unsupervised, or supervised methods.
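To give a flavour of the technique, the sketch below performs naive activation maximization: gradient ascent on an input image to maximize one channel's activation in a pretrained VGG16. The layer and channel indices are arbitrary, and the regularizers and image parameterizations discussed in the Distill articles are omitted.

import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in model.parameters():
    p.requires_grad = False                                   # only the input image is optimized

def visualize_channel(layer_idx=10, channel=5, steps=200, lr=0.05):
    img = torch.rand(1, 3, 224, 224, requires_grad=True)      # start from noise
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        x = img
        for i, layer in enumerate(model):
            x = layer(x)
            if i == layer_idx:
                break
        loss = -x[0, channel].mean()                          # maximize the channel's mean activation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return img.detach().clamp(0, 1)

vis = visualize_channel()
print(vis.shape)                                              # (1, 3, 224, 224)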
Deliverables
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites
- Python and PyTorch.
- Convolutional Neural Networks.
- Experience with Deep Learning methods can be a plus.
Level: Bachelor Semester Project
Type of work
50% research, 50% implementation.
References
Feature Visualization: https://distill.pub/2017/feature-visualization/
Differentiable Image Parameterizations: https://distill.pub/2018/differentiable-parameterizations/
Supervisor: Ehsan Pajouheshgar ([email protected])