Semester projects are open to EPFL students.
Description:
DeepFakes are a subset of fake images (especially face images) synthesized with machine learning algorithms. Since first appearing only four years ago, DeepFake generation technology has been evolving in several aspects: quality, speed, and data efficiency. Because facial expression and identity are central to social interaction and to trust in media, there is persistent interest in developing robust and efficient DeepFake detection algorithms.
DeepFake videos are the video counterparts of DeepFake images. Moving from images to videos opens up several new possibilities, much as moving from image understanding to video understanding did. For instance, video is time-series data and therefore offers temporal consistency between frames, yet faces in DeepFake videos may not move as naturally as in real videos.
In this project, you will first briefly review the state-of-the-art DeepFake video detection methods. Then, based on your review, you will reproduce or re-implement a representative subset of these methods and evaluate them on the FaceForensics++ dataset. Finally, your results should include quantitative and qualitative comparisons.
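To make the scope concrete, below is a minimal, illustrative sketch (in PyTorch) of one common detection baseline: a frame-level binary classifier whose per-frame scores are averaged into a video-level score. The backbone choice, the face cropping, and the FaceForensics++ data pipeline are assumptions made for illustration only, not the methods you are required to reproduce.

import torch
import torch.nn as nn
from torchvision import models

class FrameClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)              # any CNN backbone would do
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)   # one logit: fake vs. real
        self.backbone = backbone

    def forward(self, frames):                                # frames: (T, 3, H, W), one video
        logits = self.backbone(frames).squeeze(-1)            # (T,) per-frame logits
        return torch.sigmoid(logits).mean()                   # video-level "fake" probability

model = FrameClassifier()
video = torch.rand(16, 3, 224, 224)                           # 16 sampled (face-cropped) frames, dummy data
print(model(video).item())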
For inquiries, feel free to drop an email : )
Level of work
- Senior Bachelor / MS Level, semester project / master project
Prerequisite:
- Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision
Supervisor:
- Yufan Ren, [email protected]
Type of work:
- 50% research, 50% development
Deliverables:
- Code, well cleaned up and easily reproducible
- Written Report, explaining the literature and steps taken for the project
References:
[1] FaceForensics++: Learning to Detect Manipulated Facial Images
[2] Deepfake Video Detection through Optical Flow Based CNN
[3] Countering Malicious DeepFakes: Survey, Battleground, and Horizon
Neural Rendering is a family of rendering techniques that replace one or more parts of the rendering pipeline with neural networks. The inductive bias of neural networks has proven helpful in many Neural Rendering tasks, such as novel view synthesis (NeRF), surface reconstruction (VolSDF), and material acquisition (NeRD).
In this project, we are interested in the generalization ability of Neural Rendering, e.g., given a set of images, inferring novel views directly. Using generalizable features has several benefits. First, we replace the long per-scene training procedure with fast feed-forward inference. Second, learned features are beneficial in sparse-input cases. Third, the framework allows further optimization to improve quality.
Level
MS Level: semester project / master project
Prerequisite:
Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
Supervisors:
Type of work:
50% research, 50% development
Level
B.Sc. Semester Project
Description
This semester project is part of a larger project to build the next-generation vision for UAVs, enabling computer vision solutions on drones.
Our aim is to extract semantic segmentation labels for aerial images using UAV metadata (GPS, height, velocity). However, noisy metadata and perspective distortion hinder perfect alignment between segmentation labels and aerial images. Your task is to extract segmentation labels and align them with aerial images to create a small test set. You will then run deep learning-based benchmarks to assess the performance of fully supervised models on the dataset, and compare the acquired dataset with other available datasets. Learning outcomes of the project include gaining experience with widely used deep learning architectures, becoming proficient with deep learning and data-processing libraries, and addressing important vision tasks (alignment and segmentation).
Deliverables
Code, acquired dataset and written report
Type of Work (e.g., theory, programming)
25% data acquisition, 75% implementation.
Prerequisites
Knowledge of Python and PyTorch, experience in image processing and computer vision. Experience with the OpenStreetMap API is a plus.
Supervisors
Baran – IVRL PhD ([email protected])
Description:
Dense semantic correspondence relates pixels belonging to similar objects in two different images to each other. Unlike the segmentation task, semantic correspondence requires fine-grained recognition of object parts. However, most datasets include only segmentation masks instead of pixel-wise correspondence labels. Our aim is to improve the semantic correspondence task without ground-truth correspondence labels. You will learn core concepts from self- and weakly-supervised learning and gain a thorough understanding of the features extracted by deep learning models.
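For intuition, the sketch below (an assumption-laden illustration, not the project method) computes a dense correspondence by nearest-neighbour matching of L2-normalized CNN features between two images; the ResNet-18 backbone and the layer choice are placeholders.

import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
features = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()   # conv feature extractor

@torch.no_grad()
def dense_correspondence(img_a, img_b):                       # each: (1, 3, H, W)
    fa = F.normalize(features(img_a), dim=1)                  # (1, C, h, w)
    fb = F.normalize(features(img_b), dim=1)
    c, h, w = fa.shape[1:]
    fa = fa.flatten(2).squeeze(0).t()                         # (h*w, C)
    fb = fb.flatten(2).squeeze(0)                             # (C, h*w)
    similarity = fa @ fb                                      # cosine similarity between all pixel pairs
    return similarity.argmax(dim=1).view(h, w)                # best match in B for every pixel of A

img_a, img_b = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
print(dense_correspondence(img_a, img_b).shape)               # (7, 7) feature-map resolution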
References:
[1] Wang, Xinlong, et al. “Dense contrastive learning for self-supervised visual pre-training.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
[3] Liu, Yanbin, et al. “Semantic correspondence as an optimal transport problem.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Deliverables: Report and reproducible implementations
Prerequisites: Experience with deep learning, PyTorch, computer vision, probability and statistics
Level: MS semester project
Type of work: 60% research, 40% implementation
Supervisors: Baran Ozaydin ([email protected])
Description:
Visual saliency refers to the parts of a scene that capture our attention. Current approaches for saliency estimation use eye tracking data on natural images to construct ground truth. In this project, however, we will perform eye tracking on comics pages instead of natural images. Later, we will use the collected data to estimate saliency in the comics domain. You will work on an eye tracking experiment with mobile eye tracking glasses.
Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an eye tracking experiment according to given instructions.
Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.
Type of work: 20% research, 80% development and testing
References:
[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 workshop on ”Future of Datasets”, 2015.
[2] Kai Kunze, Yuzuko Utsumi, Yuki Shiga, Koichi Kise, and Andreas Bulling, “I know what you are reading: recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, September 08-12, 2013, Zurich, Switzerland.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
Level: BS semester project
Supervisor: Bahar Aydemir ([email protected])
Description:
We consider the task of creating a 3-d model of a large novel environment, given only a small number of images of the scene. This is a difficult problem, because if the images are taken from very different viewpoints or if they contain similar-looking structures, then most geometric reconstruction methods will have great difficulty finding good correspondences. Further, the reconstructions given by most algorithms include only points in 3-d that were observed in two or more images; a point observed only in a single image would not be reconstructed.
Our research statement is the following: how can monocular image cues be combined with triangulation cues to build a photo-realistic model of a scene given only a few images, even ones taken from very different viewpoints or with little overlap?
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] 3D Reconstruction From Monocular Images Based on Deep Convolutional Networks, Y. Ren et al.
[2] http://www.robotics.stanford.edu/~ang/papers/iccvvrml07-3dfromsparseviews.pdf
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, Python, and PyTorch. Experience in statistical analysis to report the performance evaluations of the models.
Models will run on RunAI (we will guide you on how to use RunAI; no prior knowledge required).
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses halftones to display simple graphical elements such as a logo. The watermark hiding algorithm can be tuned with a variety of parameters. You will explore variations of these parameters and derive hiding and recognition metrics.
Deliverables: Report and running prototype (Matlab and/or Android).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and Java Android
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Level-line moirés enable the creation of interesting, dynamically beating shapes such as faces, graphical designs, and landscapes. The project aims at creating level-line moirés as 3D graphical objects defined by meshes. The resulting moirés can be simulated in Blender. They can also be fabricated with a 3D printer.
References:
- S. Chosson and R. D. Hersch, Beating Shapes Relying on Moiré Level Lines, ACM Transactions on Graphics (TOG), Vol. 34, No. 1, November 2014, Article No. 9, 1-10
http://dx.doi.org/10.1145/2644806
https://www.epfl.ch/labs/lsp/technologies/page-88156-en-html/
Deliverables: Report, possibly 3D printed objects.
Prerequisites:
– basics of computer graphics/image processing
– coding skills in Matlab
Level: BS or MS semester project
Supervisor:
Prof. hon. Roger D. Hersch, BC 110, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. That Epson P50 printer has now been replaced by new types of Epson printers that require modified driver software. In a previous project, part of the Epson P50 printer driver commands were adapted to the new types of Epson printers. This project consists of finalizing the adaptation of the remaining Epson printing commands according to the new Epson printer programming guide. Some reverse engineering may be necessary to obtain undocumented driver commands.
Deliverables: Report and running prototype (C, C++).
Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Self-supervised Learning (SSL) has been attracting increasing attention. However, different SSL methods yield different performances on different tasks. Many factors contribute to these differences, such as the augmentation methods, the number of crops, and the pre-training datasets. In this project, we will investigate how the pre-training dataset affects the performance of different SSL frameworks and observe the results on downstream transfer-learning tasks.
Task:
– Review the literature and learn to train MoCo, SimSiam, and MAE.
– Pre-train the models on different datasets and fine-tune them on downstream transfer-learning tasks (a minimal evaluation sketch follows this list).
– Analyze the obtained results.
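As a rough illustration of the transfer-learning step above, the sketch below freezes an SSL-pretrained encoder and trains a linear probe on a downstream dataset. The ResNet-50 backbone, the 100-class head, and the hyperparameters are illustrative assumptions; loading the actual MoCo/SimSiam/MAE checkpoints is left out.

import torch
import torch.nn as nn
from torchvision import models

encoder = models.resnet50(weights=None)            # backbone used by e.g. MoCo / SimSiam
# encoder.load_state_dict(...)                     # load the SSL-pretrained weights here
encoder.fc = nn.Identity()                         # expose the 2048-d features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False                        # linear probing: keep the encoder frozen

linear_probe = nn.Linear(2048, 100)                # e.g. 100 downstream classes
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        feats = encoder(images)                    # (B, 2048) frozen features
    loss = criterion(linear_probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(8, 3, 224, 224), torch.randint(0, 100, (8,))))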
Prerequisites:
Solid knowledge of Python and a deep learning framework (TensorFlow or PyTorch).
Level:
MS project
Type of work:
30% research, 60% development and testing, 10% analysis of the results.
Reference:
[1] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
[2] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[3] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. ICCV, 2021.
[4] Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool. Revisiting contrastive methods for unsupervised learning of visual representations. 2021.
Supervisor: Tong Zhang ([email protected])
Description:
Since 2021, large-scale text-to-image models, such as DALL·E 2 or CogView, have achieved remarkable success at generating high-quality images from text. However, those models require a lot of time and memory for training and inference.
The aim of this project is to train models of reasonable size to generate images from text. Your models will use supervision from other pretrained models, such as CLIP, which was pre-trained for image–text similarity matching.
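As an illustration of how such supervision can be wired up, the sketch below computes a CLIP-based loss that rewards agreement between a generated image and the target text. It assumes OpenAI's open-source clip package; the generator itself and CLIP's input normalization are omitted.

import torch
import clip                                        # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _preprocess = clip.load("ViT-B/32", device=device)

def clip_loss(generated_images, texts):
    """generated_images: (B, 3, 224, 224), already resized/normalized for CLIP; texts: list of B strings."""
    tokens = clip.tokenize(texts).to(device)
    image_feats = clip_model.encode_image(generated_images)     # (B, 512)
    text_feats = clip_model.encode_text(tokens)                 # (B, 512)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feats * text_feats).sum(dim=-1).mean()  # 1 - mean cosine similarity

# During training, the generator's output images would be passed through clip_loss
# and the loss backpropagated into the generator's parameters.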
Tasks:
- Literature review on image generation from text [1, 2, 3, 4].
- Implementation and evaluation of models.
References:
[1] Wang, Z., Liu, W., He, Q., Wu, X., & Yi, Z. (2022). CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP. arXiv preprint arXiv:2203.00386, https://arxiv.org/pdf/2203.00386.pdf
[2] Frans, K., Soros, L. B., & Witkowski, O. (2021). Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. arXiv preprint arXiv:2106.14843, https://arxiv.org/pdf/2106.14843.pdf
[3] Tian, Y., & Ha, D. (2021). Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts. arXiv preprint arXiv:2109.08857, https://arxiv.org/pdf/2109.08857.pdf
[4] Schaldenbrand, P., Liu, Z., & Oh, J. (2022). StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation. arXiv preprint arXiv:2202.12362, https://arxiv.org/pdf/2202.12362.pdf
Deliverables:
Code, well cleaned up and easily reproducible.
Written Report, explaining the literature and steps taken for the project and the performances of the models.
Prerequisites: Python, PyTorch. Knowledge of image processing and natural language processing.
Level: MS semester project
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Image-based rendering dates back to the 1990s. Unlike traditional Computer Graphics rendering, which requires explicit scene geometry and scene texture, image-based rendering renders a scene based on observations of it, i.e., photographs taken in the real or synthesized scene. One image-based rendering approach is Neural Rendering, which enables the generation of photorealistic renderings of a 3D scene by learning a Neural Radiance Field, i.e., NeRF [1]. NeRF represents a scene with a Multilayer Perceptron, which we query for the color and opacity of a 3D location of the scene along a particular camera viewing direction. Now that we have NeRF, a natural follow-up question is: what can we do with a learned radiance field?
In this project, you will extract 2D line drawings from NeRF that convey geometry and semantic information while preserving 3D view consistency. There are existing techniques for generating feature lines from volumetric data in traditional computer graphics [3], as well as computer vision frameworks such as OpenPose and img2pose, to explore.
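For intuition about the query-and-composite process described above, here is a toy, simplified sketch of a radiance-field query with alpha compositing along a single ray; positional encoding, hierarchical sampling, and all other NeRF details are omitted, and the network sizes are arbitrary.

import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),       # input: (x, y, z, dx, dy, dz)
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # output: (r, g, b, sigma)
        )

    def forward(self, xyz, direction):
        out = self.mlp(torch.cat([xyz, direction], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])   # color, density

def render_ray(field, origin, direction, n_samples=64, near=0.0, far=4.0):
    t = torch.linspace(near, far, n_samples)                   # sample depths along the ray
    pts = origin + t[:, None] * direction                      # (n_samples, 3)
    rgb, sigma = field(pts, direction.expand_as(pts))
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                    # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                                    # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)                 # composited pixel color

field = TinyRadianceField()
print(render_ray(field, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0])))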
For inquiries, feel free to drop an email to us : )
Level
● MS Level: semester project / master project
Prerequisite:
● Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
● Experience with 3D vision will be a big plus.
Supervisors:
● Yufan Ren, [email protected]
● Dongqing Wang, [email protected]
Type of work:
● 50% research, 50% development
References:
[1] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[2] Stylizing 3D Scene via Implicit Representation and HyperNetwork
[3] Suggestive Contours for Conveying Shape
Readings that might be useful:
Optical Models for Direct Volume Rendering
https://courses.cs.duke.edu/spring03/cps296.8/papers/max95opticalModelsForDirectVolumeRendering.pdf
Line drawings from Volume Data
https://gfx.cs.princeton.edu/pubs/Burns_2005_LDF/index.php
Ever more administrative processes are now done online. This opens the potential for a range of new ways to defraud these processes. This project addresses the situation where the administrative process calls for the submission of a scan or photograph of a document, e.g., a passport, national ID, utility bill, or bank statement. A classifier is to be investigated that can differentiate between a (digital) photo of the actual document and a (digital) photo of a printout of a digital photo of the document – we call this process IPI, for “Image to Printout to Image”.
Tasks:
1. Perform a comprehensive literature review of potential IPI detection techniques and methodologies with a particular focus on ML/DL based solutions. This should include the review of image classification techniques with similar objectives such as “recapture” detection.
2. Discussion of identified techniques in terms of applicability – in particular with respect to different types of image subject matter and printing technologies (e.g. laser vs. inkjet). The focus should be on IPI of documents containing text only, text superimposed on background graphics, combined text with images.
3. Collation of an IPI training dataset based on existing datasets. Explore the applicability of Data augmentation techniques and transfer learning.
4. Explore and implement diverse ML/DL networks (e.g., derived from ResNet50; see the sketch after this list) and train them on the task of IPI classification.
5. Assess classifier performance in view of image resolution and quality (sharpness, contrast, colour balance, image composition etc).
6. Identify possible extensions of the ML/DL architectures which particularly address the IPI use-case.
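As a possible starting point for task 4 (an illustrative sketch under our own assumptions, not a prescribed architecture), the snippet below fine-tunes an ImageNet-pretrained ResNet50 with a two-class head for "original photo" vs. "photo of a printout":

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2")             # transfer learning from ImageNet
model.fc = nn.Linear(model.fc.in_features, 2)                # classes: {original, IPI}

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):                              # images: (B, 3, 224, 224)
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(4, 3, 224, 224), torch.randint(0, 2, (4,))))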
Deliverables:
Report and Code according to the tasks
Prerequisites:
Knowledge of image processing
Basic coding skills in Python and basic deep learning knowledge
Level:
MS semester project
Type of work:
50% research, 50% development and test
Supervisor:
Peter Grönquist ([email protected])
Tasks:
– Understand the literature and state-of-art
– Test existing image classification data augmentation methods for saliency prediction
– Test and improve a data augmentation method proposed for saliency prediction
– Evaluate state-of-the-art saliency detection models on new data
Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in deep learning frameworks
Deliverables:
Reproducible code and a written report
Level:
MS semester or thesis project
Type of work:
60% research, 40% development and testing
References:
Supervisor:
Bahar Aydemir ([email protected])
Description:
Alan Turing introduced his famous Turing patterns back in 1952, suggesting that reaction-diffusion systems can be a valid model for generating morphogenesis patterns. A particularly inspiring reaction-diffusion model that has stood the test of time is the Gray-Scott model, which shows an extreme variety of behaviors controlled by just a few variables.
On the other hand, ever since von Neumann introduced Cellular Automata (CA) as models for self-replication, they have captivated researchers’ minds; extremely complex behaviors emerge from very simple rules, as in Conway’s Game of Life.
In this project, we will use Neural Cellular Automata (NCA), a differentiable extension of the original CA, to generate images and patterns. We will guide and supervise the NCA using various pretrained models such as OpenAI CLIP or VGG19 networks.
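For intuition, a small numerical sketch of the Gray-Scott model mentioned above follows; the diffusion rates and the feed/kill parameters are common textbook choices, not values prescribed by the project.

import numpy as np

def laplacian(x):                                  # 5-point stencil with periodic boundaries
    return (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
            np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)

def gray_scott(n=128, steps=5000, Du=0.16, Dv=0.08, f=0.035, k=0.060):
    u = np.ones((n, n))
    v = np.zeros((n, n))
    u[n//2-5:n//2+5, n//2-5:n//2+5] = 0.50         # seed a small square
    v[n//2-5:n//2+5, n//2-5:n//2+5] = 0.25
    for _ in range(steps):
        uvv = u * v * v
        u += Du * laplacian(u) - uvv + f * (1.0 - u)
        v += Dv * laplacian(v) + uvv - (f + k) * v
    return v                                       # spots/stripes emerge in the v field

pattern = gray_scott()
print(pattern.min(), pattern.max())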
Deliverables
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites
- Python and PyTorch.
- Experience with Deep Learning methods and Convolutional Networks.
Level: Master Semester Project
Type of work
50% research, 50% implementation.
References
Reaction-Diffusion models: https://www.karlsims.com/rd.html
Self-Organizing Textures: https://distill.pub/selforg/2021/textures/
Growing Neural Cellular Automata: https://distill.pub/2020/growing-ca/
OpenAI CLIP: https://openai.com/blog/clip/
Supervisor: Ehsan Pajouheshgar ([email protected])
Image-based rendering dates back to the 1990s. Unlike traditional Computer Graphics rendering, which requires explicit scene geometry and scene texture, image-based rendering renders a scene based on observations of it, i.e., photographs taken in the real or synthesized scene.
Given an image collection of a scene under different viewing directions, NeRF can faithfully synthesize novel views that are 3D-consistent. However, how to render the scene under a novel lighting condition is still an open question.
There are a few works on relighting a NeRF-like implicit scene representation. The process has two steps: scene decomposition, and 3D scene extraction from the decomposition. Afterward, we can relight the 3D representation easily.
In this project, you will read various papers on relighting NeRF, reproduce one or two that you find interesting, and explore possible ways to improve either step of the relighting pipeline.
For inquiries, feel free to drop an email to us : )
Level
● Bachelor-level semester project; also possible as a master semester project with some extensions.
Prerequisite:
● Knowledge of deep learning frameworks (e.g., PyTorch or TensorFlow), image processing, and computer vision.
● Experience with 3D vision will be a big plus.
Supervisor:
● Dongqing Wang, [email protected]
Type of work:
● 40% research, 60% development
Description:
Classification is a common machine-learning task that involves assigning a label to a given input. Common examples of classification include detecting spam in e-mails, classifying cats and dogs in images, or recognizing the main topics/entities (e.g., YouTube-8M) or actions (e.g., UCF11) in videos.
In this project, the goal is to detect whether a given video is an advertisement or not by using such machine-learning and deep-learning approaches.
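As one illustrative baseline (an assumption for discussion, not the required approach), the sketch below samples frames from a video, encodes them with a pretrained CNN, averages the features over time, and applies a linear advertisement/non-advertisement classifier:

import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()                        # 512-d feature per frame
backbone.eval()
classifier = nn.Linear(512, 2)                     # classes: {advertisement, other}; to be trained

def predict(frames):                               # frames: (T, 3, 224, 224), sampled from one video
    with torch.no_grad():
        feats = backbone(frames)                   # (T, 512)
    logits = classifier(feats.mean(dim=0))         # average-pool features over time
    return logits.softmax(dim=-1)                  # class probabilities

print(predict(torch.rand(8, 3, 224, 224)))         # dummy video of 8 sampled frames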
Tasks:
- Literature review on video classification and understanding [1, 2, 3, 4] and on advertisement image detection [4, 5].
- Implementation and evaluation of machine-learning/deep-learning models for advertising videos detection.
References:
[1] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 1725-1732), https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf.
[2] Harvey, M. (2017) [blog] Five Video Classification Methods Implemented in Keras and TensorFlow. In Coastline Automation, 22 Mar. 2017, https://blog.coast.ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5.
[3] Jing, L., Parag, T., Wu, Z., Tian, Y., & Wang, H. (2021). Videossl: Semi-supervised learning for video classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1110-1119), https://openaccess.thecvf.com/content/WACV2021/papers/Jing_VideoSSL_Semi-Supervised_Learning_for_Video_Classification_WACV_2021_paper.pdf.
[4] Hussain, Z., Zhang, M., Zhang, X., Ye, K., Thomas, C., Agha, Z., … & Kovashka, A. (2017). Automatic understanding of image and video advertisements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1705-1715), https://openaccess.thecvf.com/content_cvpr_2017/papers/Hussain_Automatic_Understanding_of_CVPR_2017_paper.pdf.
[5] Almgren, K., Krishnan, M., Aljanobi, F., & Lee, J. (2018). AD or Non-AD: A Deep Learning Approach to Detect Advertisements from Magazines. Entropy, 20(12), 982, https://www.mdpi.com/1099-4300/20/12/982/pdf.
Deliverables:
Code, well cleaned up and easily reproducible.
Written report, explaining the models, the steps taken for the project and the performances of the models.
Prerequisites: Python and PyTorch. Knowledge of image processing.
Level: BS semester project
Supervisor: Martin Nicolas Everaert (martin.everaert [at] epfl.ch)
Description
Neural Networks are often treated as black boxes. In this project, we’re going to try to understand how a neural network works by visualizing what each neuron in the network is detecting. We will apply feature visualization techniques to networks trained using self-supervised, unsupervised, or supervised methods.
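To give a flavour of the technique, the sketch below performs naive activation maximization: gradient ascent on an input image to maximize one channel's activation in a pretrained VGG16. The layer and channel indices are arbitrary, and the regularizers and image parameterizations discussed in the Distill articles are omitted.

import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in model.parameters():
    p.requires_grad = False                                   # only the input image is optimized

def visualize_channel(layer_idx=10, channel=5, steps=200, lr=0.05):
    img = torch.rand(1, 3, 224, 224, requires_grad=True)      # start from noise
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        x = img
        for i, layer in enumerate(model):
            x = layer(x)
            if i == layer_idx:
                break
        loss = -x[0, channel].mean()                          # maximize the channel's mean activation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return img.detach().clamp(0, 1)

vis = visualize_channel()
print(vis.shape)                                              # (1, 3, 224, 224)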
Deliverables
- Code, well cleaned up and easily reproducible.
- Written Report, explaining the literature and steps taken for the project.
Prerequisites
- Python and PyTorch.
- Convolutional Neural Networks.
- Experience with Deep Learning methods can be a plus.
Level: Bachelor Semester Project
Type of work
50% research, 50% implementation.
References
Feature Visualization: https://distill.pub/2017/feature-visualization/
Differentiable Image Parameterizations: https://distill.pub/2018/differentiable-parameterizations/
Supervisor: Ehsan Pajouheshgar ([email protected])