Semester projects are open to EPFL students.
The aim of this project is to use GAN models such as StyleGAN2 and VQGAN, alongside supervision from other models such as CLIP or semantic segmentation networks, to perform high-level image editing: for example, changing a face image according to a text query like “Blue hair” or according to a semantic segmentation mask.
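As a rough illustration of the intended pipeline, the sketch below optimizes a latent code of a pretrained generator so that the generated image matches a text prompt under CLIP. The generator loader and latent size are assumptions (placeholders for whichever StyleGAN2/VQGAN checkpoint is used); only the CLIP calls follow the actual OpenAI CLIP API.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# Placeholder: any pretrained generator mapping a latent code to an RGB image in [-1, 1].
G = load_pretrained_generator().to(device).eval()  # hypothetical helper

# CLIP's input normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_image_features(img):
    """Resize and normalize a generated image, then embed it with CLIP."""
    img = (img + 1) / 2                       # [-1, 1] -> [0, 1]
    img = F.interpolate(img, size=224, mode="bilinear", align_corners=False)
    img = (img - CLIP_MEAN) / CLIP_STD
    return clip_model.encode_image(img)

text = clip.tokenize(["a face with blue hair"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(clip_model.encode_text(text), dim=-1)

latent = torch.randn(1, 512, device=device, requires_grad=True)  # assumed latent size
opt = torch.optim.Adam([latent], lr=0.05)

for step in range(200):
    img = G(latent)                            # (1, 3, H, W) in [-1, 1]
    img_feat = F.normalize(clip_image_features(img), dim=-1)
    loss = 1 - (img_feat * text_feat).sum()    # cosine distance to the text prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the optimization is usually regularized (e.g., by keeping the latent close to its initialization), but the basic loop above conveys how CLIP supervises the edit.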
Deliverables
- Code, well cleaned up and easily reproducible.
- Written report, explaining the literature and the steps taken for the project.
Prerequisites
- Python and PyTorch. A plus: experience with deep image manipulation/generation and some experience with natural language processing tools.
Type of work
50% research, 50% implementation.
References
CLIP: Connecting Text and Images
Optimizing Latent Space Directions For GAN-based Local Image Editing
Supervisor
Ehsan Pajouheshgar
Description:
Fully supervised segmentation methods require pixel-wise annotations, which demand extensive human effort and time. When such labels are not available, the performance of these segmentation methods drops drastically. Our aim is to incorporate unsupervised algorithms (GrabCut, SLIC, etc.) and/or matrix factorization methods into a deep learning model to improve segmentation performance when segmentation annotations are unavailable.
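As one minimal, hedged illustration of how an unsupervised algorithm can help without pixel labels, the sketch below uses SLIC superpixels to refine a coarse per-pixel probability map (e.g., a class activation map in the spirit of [3]) by averaging it within each superpixel; where the coarse map comes from is left open.

```python
import numpy as np
from skimage.segmentation import slic

def refine_with_superpixels(image, coarse_prob, n_segments=300, compactness=10.0):
    """Average a coarse foreground probability map inside each SLIC superpixel.

    image:        (H, W, 3) float array in [0, 1]
    coarse_prob:  (H, W) float array in [0, 1], e.g. a class activation map
    returns:      (H, W) refined map that better follows image boundaries
    """
    segments = slic(image, n_segments=n_segments, compactness=compactness, start_label=0)
    refined = np.zeros_like(coarse_prob)
    for label in np.unique(segments):
        mask = segments == label
        refined[mask] = coarse_prob[mask].mean()
    return refined

# Usage (arrays are placeholders):
# refined = refine_with_superpixels(image, cam)
# pseudo_mask = refined > 0.5
```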
References:
[1] Achanta, Radhakrishna, et al. “SLIC superpixels compared to state-of-the-art superpixel methods.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34.11 (2012): 2274-2282.
[2] Rother, Carsten, Vladimir Kolmogorov, and Andrew Blake. “‘GrabCut’: interactive foreground extraction using iterated graph cuts.” ACM Transactions on Graphics (TOG) 23.3 (2004): 309-314.
[3] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
Deliverables:
Report, annotations and running prototype (Python).
Prerequisites:
– Knowledge of image processing
– Basic coding skills in Python and basic deep learning knowledge
Level:
MS semester project
Supervisor: Baran Ozaydin ([email protected])
Level
B.Sc. or M.Sc. adjustable, multiple openings
Description
This semester project is part of a larger project to build the next generation of vision systems for UAVs, enabling computer vision solutions on drones.
For B.Sc. students: your project’s contribution is data-centric, spanning data acquisition with our UAV setup, processing, and analysis. By the end of the semester, your learning outcomes would include acquaintance with drone data acquisition and (image) information extraction, data organization, and analysis.
For M.Sc. students: your project can be (1) similar to the B.Sc. project with more advanced data science work, OR (2) machine learning based, more particularly in computer vision. In the first case, you add to your learning outcomes experience in computational imaging, and in the second case experience in state-of-the-art computer vision and machine learning methods.
Publications can be expected for some of the projects depending on your outcomes, even for the B.Sc. students. In such a case we would help you through the entire process.
For further details, please email the supervisors because we cannot make more information public for the moment.
Type of Work (e.g., theory, programming)
Variable, from 80/20% on development/research to 20/80%, depending on the project and on the seniority of the student (B.Sc./M.Sc.).
Prerequisites
We will guide you in regular meetings so that you learn as you progress. However, one essential requirement is thoroughness. For B.Sc. students, some experience with data processing and working on Linux terminals would be useful. For M.Sc. students, image processing experience (project 1) or computer vision experience (project 2) would be helpful.
Supervisors
Baran – IVRL PhD ([email protected])
Vidit – CVLab PhD ([email protected])
Majed – IVRL Postdoc ([email protected])
The biggest challenge in using neural networks to manipulate images for professional cinema production is temporal stability, as no flickering or glitches can be tolerated.
Swiss production company and post-production house 8horses has developed an experimental proof-of-concept model to translate the color characteristics of photochemical 35mm film to footage acquired with digital cinema cameras. In the PoC model, good temporal stability has been achieved by combining a CycleGAN architecture and unpaired training with a temporal loss function that compares only one previous frame.
The proof-of-concept has been trained on a dataset of aligned images of 35mm film and footage from an Arri Alexa digital camera, from a research project by ZHdK.
The goal is to further refine the existing model, understand the mechanism leading to temporal stability, and possibly improve it. The feature film project ELECTRIC CHILD by 8horses can be used as a case study for the project. During the shooting of the film, still frames can be shot on 35mm film to create a new dataset to specifically fine-tune the model for the scenes of the film.
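As a hedged sketch of the kind of loss involved (not necessarily the exact formulation used in the PoC), the snippet below penalizes changes between consecutive output frames that are not explained by changes between the corresponding input frames, using only one previous frame.

```python
import torch

def temporal_consistency_loss(G, x_prev, x_curr):
    """One-previous-frame temporal loss: the difference between consecutive output
    frames should mirror the difference between the inputs. x_prev, x_curr: (N, 3, H, W)."""
    y_prev, y_curr = G(x_prev), G(x_curr)
    return torch.mean(torch.abs((y_curr - y_prev) - (x_curr - x_prev)))

# During training this term would be added to the usual CycleGAN objectives, e.g.
# total_loss = gan_loss + cycle_loss + lambda_t * temporal_consistency_loss(G, x_prev, x_curr)
```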
Tasks
- Further refine the proof-of-concept.
- Research and improve the mechanism leading to temporal stability.
- Create a fine-tuning dataset with images from the case-study feature film and test it on the film in a professional post-production environment.
Deliverables
- Code, well cleaned up and easily reproducible.
Prerequisites
- Experience with Python and PyTorch for deep learning
- Experience and knowledge of deep learning and computer vision
Type of work:
20% research, 80% development and test.
Supervisors:
Berk Dogan ([email protected]) and Simon Jaquemet ([email protected])
References:
Proof-of-concept demo:
https://cloud.8horses.ch/s/RHtRdyZqqZY7ga2
https://www.zhdk.ch/forschungsprojekt/analog–digital-426752
https://github.com/NVIDIA/pix2pixHD
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses halftones to display simple graphical elements such as a logo. The software has now been extended to hide the watermark within graphical elements. Adapt this software to work within an Android smartphone. Tune and optimize the available parameters.
Deliverables: Report and running prototype (Matlab and/or Android).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and Java Android
Level: BS or MS semester project or possibly master project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden within a printed image. Dedicated watermark synthesizing software has been developed in Matlab. The goal of the project is to translate that software to C# and to test its performance.
Deliverables: Report and running prototype (C#).
Prerequisites:
– basic coding skills in Matlab and C# (or Java)
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. The Epson P50 printer has now been replaced by new types of Epson printers that require modified driver software. In a previous project, parts of the Epson P50 printer driver commands were adapted for the new types of Epson printers. The project consists of finalizing the adaptation of the remaining Epson printing commands according to the new Epson printer programming guide. Some reverse engineering may be necessary to obtain non-documented driver commands.
Deliverables: Report and running prototype (C, C++).
Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Level-line moirés enable the creation of interesting, dynamically beating shapes such as faces, graphical designs, and landscapes. The project aims at creating level-line moirés as 3D graphical objects defined by meshes. The resulting moirés can be simulated in Blender. They can also be fabricated with a 3D printer.
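For intuition only, the snippet below simulates a flat (2-D) level-line moiré in NumPy: a base line grating is shifted in proportion to an elevation function and superposed with a regular revealing grating, so the moiré fringes follow the level lines of the elevation and beat when the revealing layer is translated. The 3-D mesh construction targeted by the project is not shown, and all parameter values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

H, W = 600, 600
ys, xs = np.mgrid[0:H, 0:W].astype(float)

# Elevation function whose level lines will appear as moiré fringes (arbitrary example).
elev = np.exp(-((xs - W / 2) ** 2 + (ys - H / 2) ** 2) / (2 * 120.0 ** 2))

T = 12.0          # period of both gratings, in pixels
shift_amp = T     # base lines are shifted by up to one period according to the elevation
phase = 0.0       # translating the revealing layer animates ("beats") the level lines

base = ((ys + shift_amp * elev) % T) < 0.5 * T      # shifted base grating
revealer = ((ys + phase) % T) < 0.5 * T             # regular revealing grating
moire = base & revealer                             # superposition of transparencies

plt.imshow(moire, cmap="gray")
plt.title("Simulated level-line moiré (phase = %.1f)" % phase)
plt.axis("off")
plt.show()
```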
References:
- S. Chosson, R. D. Hersch, Beating Shapes Relying on Moiré Level Lines, ACM Transactions on Graphics (TOG), Vol. 34, No. 1, November 2014, Article No. 9, 1-10
http://dx.doi.org/10.1145/2644806
https://www.epfl.ch/labs/lsp/technologies/page-88156-en-html/
Deliverables: Report, possibly 3D printed objects.
Prerequisites:
– basics of computer graphics/image processing
– coding skills in Matlab
Level: BS or MS semester project
Supervisor:
Prof. hon. Roger D. Hersch, BC 110, [email protected], cell: 077 406 27 09
Description (Master Semester Project or Master Thesis Project open to EPFL students)
In this project, you will research the existing literature on monocular depth estimation in which surface normals or Fourier transforms are used to predict the depth of single images. You will then build a transformer-based model for estimating the depth maps of such images. Traditionally, many single-view depth estimation techniques have used phase information from the Fourier transform of an image, surface normals, and other geometrical cues. Your goal is to explore these concepts and ultimately predict image depths.
Bonus: improve the cross-attention mechanism in transformers.
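As a small, hedged example of the kind of Fourier-domain cue involved, the snippet below computes per-channel log-amplitude and phase maps with torch.fft; such channels could be fed, alongside the RGB image, to a transformer-based depth network (the architecture itself is not shown and is up to the project).

```python
import torch

def fourier_cues(img):
    """img: (N, 3, H, W) tensor. Returns per-channel log-amplitude and phase maps,
    which can be concatenated with the image as extra input channels."""
    spectrum = torch.fft.fft2(img)                 # complex tensor, same spatial size
    amplitude = torch.log1p(torch.abs(spectrum))
    phase = torch.angle(spectrum)
    return torch.cat([amplitude, phase], dim=1)    # (N, 6, H, W)

# Example: augment the network input with Fourier cues.
x = torch.rand(2, 3, 224, 224)
x_aug = torch.cat([x, fourier_cues(x)], dim=1)     # (2, 9, 224, 224)
```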
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] Single-Image Depth Estimation Based on Fourier Domain Analysis; Jae-Han Lee, Minhyeok Heo, Kyung-Rae Kim, and Chang-Su Kim, CVPR 2018.
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, Python, and PyTorch. Experience in statistical analysis to report the performance evaluations of the models.
Models will run on RunAI. (We will guide you on how to use RunAI; no prior knowledge required.)
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
We consider the task of creating a 3-d model of a large novel environment, given only a small number of images of the scene. This is a difficult problem, because if the images are taken from very different viewpoints or if they contain similar-looking structures, then most geometric reconstruction methods will have great difficulty finding good correspondences. Further, the reconstructions given by most algorithms include only points in 3-d that were observed in two or more images; a point observed only in a single image would not be reconstructed.
Our research question is: how can monocular image cues be combined with triangulation cues to build a photo-realistic model of a scene given only a few images, even ones taken from very different viewpoints or with little overlap?
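As a minimal illustration of using a monocular cue for reconstruction, the sketch below back-projects a predicted depth map into a 3-D point cloud given camera intrinsics; the depth map and intrinsic values are placeholders, and fusing several such clouds with triangulation cues is the actual research question.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """depth: (H, W) array of metric depths. Returns an (H*W, 3) array of 3-D points
    in the camera frame, using the pinhole model X = (u - cx) * Z / fx, etc."""
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    Z = depth
    X = (us - cx) * Z / fx
    Y = (vs - cy) * Z / fy
    return np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

# Usage with placeholder values:
# points = depth_to_pointcloud(predicted_depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```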
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] 3D Reconstruction From Monocular Images Based on Deep Convolutional Networks, Y. Ren et al.
[2] http://www.robotics.stanford.edu/~ang/papers/iccvvrml07-3dfromsparseviews.pdf
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, Python, and PyTorch. Experience in statistical analysis to report the performance evaluations of the models.
Models will run on RunAI. (We will guide you on how to use RunAI; no prior knowledge required.)
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description
Deep neural networks have been shown to be vulnerable to adversarial attacks, which indicates some unsatisfying properties of their decision boundaries: an imperceptible but well-designed perturbation of the input can lead to dramatic changes in the output. On the other hand, the generative adversarial network (GAN) has been shown to be a powerful framework for learning a generator that fits an unknown distribution by solving a min-max optimization problem. The generator of a GAN is a network transforming a prior distribution, usually a multivariate uniform or Gaussian distribution, into the target distribution. While effective, training a GAN is known to be quite tricky in practice, and the convergence of many existing algorithms is not theoretically guaranteed.
In this project, we focus on the robustness of the generator in the GAN. That is, we study the worst-case possible outputs of the generator. Mathematically, given the prior distribution U whose probability density function is f, the generator G, and a constant C, we would like to find an input x satisfying f(x) > C such that the output of the generator, G(x), is of the worst quality. Correspondingly, we will explore algorithms to improve the robustness of the GAN, i.e., to improve the worst-case outputs of the generator.
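A minimal sketch of the worst-case search, under several assumptions: a pretrained generator G and a differentiable quality score (a discriminator D is used here as a stand-in) are available, and the prior is a standard Gaussian, for which the density constraint f(x) > C reduces to a bound on the latent norm.

```python
import torch

def worst_case_latent(G, D, latent_dim=512, radius=3.0, steps=200, lr=0.05, device="cpu"):
    """Projected gradient descent over the latent space: minimize the quality score
    D(G(z)) while keeping ||z|| <= radius (equivalent to a density lower bound for a
    standard Gaussian prior). G and D are placeholders for pretrained networks."""
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        quality = D(G(z)).mean()     # higher = better-looking sample (assumption)
        loss = quality               # minimizing quality yields a worst-case output
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():        # project back onto the norm ball
            norm = z.norm()
            if norm > radius:
                z.mul_(radius / norm)
    return z.detach()
```

Defining a reliable quality score is precisely challenge 1) of the project, so the discriminator used above is only one possible choice.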
Tasks
This project has three challenges: 1) a quantitative and reliable metric to define the “quality” of the output; 2) an algorithm to find the worst-case input of the generator; 3) a method to improve the current training algorithms of GANs so as to improve their robustness.
Deliverables
Algorithms to attack and defend the GAN.
Type of work
20% Literature review, 60% Research, 20% implementation.
Prerequisites
Mathematical foundations (calculus, linear algebra), machine learning, optimization, programming (Python, PyTorch)
References
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. Generative adversarial nets. NIPS 2014.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. ICLR 2018.
Level:
MS Thesis Project / MS Semester Project
Contact:
Chen Liu ([email protected]), Tong Zhang ([email protected])
Description
Deep neural networks have been shown to be vulnerable to adversarial attacks, which indicates some unsatisfying properties of their decision boundaries: an imperceptible but well-designed perturbation of the input can lead to dramatic changes in the output. There are many kinds of imperceptible perturbations, such as ones bounded by the L-infinity, L-2, L-1, and L-0 norms. The first two cases are better studied, while the other two are more challenging. Due to the convexity of the L-infinity and L-2 norms, projected gradient descent (PGD) can typically obtain consistent and satisfying results in these cases. However, the performance of PGD degrades significantly in the L-1 and L-0 cases, which leaves much room for improvement.
This project focuses on adversarial attacks and defenses where the perturbations are bounded by the L-1 or L-0 norm. We explore algorithms to generate such sparse perturbations and training methods to obtain neural network models resistant to such perturbations.
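For reference, a standard L-infinity PGD attack looks roughly like the sketch below; the sparse (L-1/L-0) attacks studied in this project would replace the sign step and the projection with operations suited to those constraint sets.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: iteratively ascend the loss and project back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # stay a valid image
    return x_adv.detach()
```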
Tasks
This project has two parts:
1) An efficient algorithm to generate sparse attacks.
2) Effective training methods to protect neural network models against such attacks. Generally, the L-0 case is more challenging than the L-1 case.
Deliverables
Attack and defense algorithms against sparse attacks.
Type of work
20% Literature review, 60% Research, 20% implementation.
Prerequisites
Mathematical foundations (calculus, linear algebra), machine learning, optimization, programming (Python, PyTorch)
Reference
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. ICLR 2018.
Modas, A., Moosavi-Dezfooli, S. M., & Frossard, P. SparseFool: a few pixels make a big difference. CVPR 2019.
Su, J., Vargas, D. V., & Sakurai, K. (2019). One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5), 828-841
Level: MS Semester Project
Contact: Chen Liu ([email protected])
Description:
Image retargeting aims to generate images of different sizes while keeping the semantic and low-level image information. Researchers have successfully used GANs for image retargeting, for example in InGAN and SinGAN. However, the current methods can only be applied to a single image and carry little semantic meaning. With the rapid progress of GANs, such as StyleGAN and its enhanced versions (StyleGAN2 and 3), researchers in this area have found new ways of manipulating semantic meaning.
In this project, we will use StyleGAN or other GANs for image retargeting. We will first investigate how to control semantics in StyleGAN. Then we will explore the relationship between an image and the representations of its local patches. Finally, we will try to generate images of different sizes while keeping the semantic meaning unchanged. Note that the goal of the project may change slightly, since more papers will come out in the next few months and we may adjust our goal based on recent findings.
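A minimal sketch of the kind of latent-space manipulation meant here, assuming a pretrained StyleGAN-style mapping network and synthesis network are available (the loader below is a hypothetical helper); the editing direction is found by PCA over sampled intermediate latents, which is one option among many (in the spirit of GANSpace-like approaches).

```python
import torch

# Placeholders for a pretrained StyleGAN: mapping z -> w and synthesis w -> image.
mapping, synthesis = load_pretrained_stylegan()   # hypothetical helper

with torch.no_grad():
    z = torch.randn(10000, 512)                  # assumed latent size
    w = mapping(z)                               # intermediate latents, (10000, 512)
    w_mean = w.mean(dim=0, keepdim=True)
    # Principal directions of the w distribution; early components often correspond
    # to coarse semantic factors.
    _, _, v = torch.pca_lowrank(w - w_mean, q=20)
    direction = v[:, 0]                          # first principal direction, (512,)

    w_sample = mapping(torch.randn(1, 512))
    for alpha in (-3.0, 0.0, 3.0):               # walk along the direction
        img = synthesis(w_sample + alpha * direction)
        # inspect/save img to see how the semantic attribute changes
```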
Tasks:
- Literature review; learn to train GANs, including StyleGAN2 and StyleGAN3.
- Test image manipulation.
- Discover the underlying semantic latent information.
- Propose methods to infer semantic meaning in the latent space of StyleGAN.
Prerequisites:
Knowledge of Python, a deep learning framework (TensorFlow or PyTorch), and linear algebra is required.
Level:
MS project or thesis
Type of work:
20% literature review, 40% research, 40% development and test.
Reference:
[1] Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020 (pp. 8110-8119).
[2] Abdal R, Qin Y, Wonka P. Image2StyleGAN: How to embed images into the StyleGAN latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019 (pp. 4432-4441).
[3] Karras T, Aittala M, Laine S, Hellsten J, Lehtinen J, Aila T. Alias-Free Generative Adversarial Networks.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[5] Assaf Shocher, Shai Bagon, Phillip Isola, Michal Irani. InGAN: Capturing and Remapping the “DNA” of a Natural Image.
[6] Tamar Rott Shaham, Tali Dekel, Tomer Michaeli. SinGAN: Learning a Generative Model from a Single Natural Image.
Supervisor: Tong Zhang ([email protected])
Description:
Visual saliency refers to the parts of a scene that capture our attention. Current approaches for saliency estimation use eye tracking data on natural images to construct ground truth. However, in our project we will perform eye tracking on comics pages instead of natural images. Later, we will use the collected data to estimate saliency in the comics domain. In this project, you will work on an eye tracking experiment with mobile eye tracking glasses.
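For context, ground-truth saliency maps are typically built from recorded fixations roughly as in the sketch below: fixation locations are accumulated into a map and blurred with a Gaussian. The blur width and coordinates here are placeholder values, not the parameters of our setup.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height, width, sigma=30.0):
    """fixations: iterable of (x, y) pixel coordinates from the eye tracker.
    Returns a normalized saliency map of shape (height, width)."""
    fmap = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            fmap[int(y), int(x)] += 1.0
    fmap = gaussian_filter(fmap, sigma=sigma)
    return fmap / fmap.max() if fmap.max() > 0 else fmap

# Usage with placeholder data:
# saliency = fixation_density_map([(120, 340), (415, 200)], height=1100, width=800)
```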
Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an eye tracking experiment according to given instructions.
Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.
Type of work: 20% research, 80% development and testing
References:
[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 workshop on ”Future of Datasets”, 2015.
[2] Kai Kunze , Yuzuko Utsumi , Yuki Shiga , Koichi Kise , Andreas Bulling, I know what you are reading: recognition of document types using mobile eye tracking, Proceedings of the 2013 International Symposium on Wearable Computers, September 08-12, 2013, Zurich, Switzerland.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
Level: BS semester project
Supervisor: Bahar Aydemir ([email protected])
Description:
Face detection is the task of identifying human faces in natural images. Convolutional neural networks and deep neural networks have proved their effectiveness at detecting faces. However, the performance of these approaches drops significantly on artistic images such as drawings, paintings, and illustrations, due to the limited training data in these domains.
In this project, we will perform face detection on comics characters. These faces differ from natural human faces due to the artistic interpretation of the authors and the fantastic nature of the characters. Therefore, we will use transfer learning and domain adaptation techniques to extract and translate facial information between different domains.
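A minimal transfer-learning sketch along the lines of reference [1]: start from a detector pretrained on natural images and replace its classification head before fine-tuning on comic faces. Dataset loading and the training loop are omitted, and the two-class setup is an assumption.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN pretrained on COCO (natural images).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor: 2 classes = background + "comic face".
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# The model would then be fine-tuned on annotated comic pages, e.g.:
# losses = model(images, targets)   # in train mode, returns a dict of losses
# loss = sum(losses.values()); loss.backward(); optimizer.step()
```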
Tasks:
– Understand the literature and the state of the art
– Test several face detection algorithms on comics
– Develop a method to detect the faces of different characters across multiple artistic styles
– Compare the performance of existing state-of-the-art face detection algorithms with our method
Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in deep learning frameworks
Deliverables:
At the end of the semester, the student should provide a framework that performs face detection and a report of the work.
Level:
MS semester or thesis project
Type of work:
65% research, 35% development and testing
References:
[1] X. Qin, Y. Zhou, Z. He, Y. Wang and Z. Tang, “A Faster R-CNN Based Method for Comic Characters Face Detection,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017, pp. 1074-1080, doi: 10.1109/ICDAR.2017.178.
[2] N. Inoue, R. Furuta, T. Yamasaki, K. Aizawa, Cross-domain weakly-supervised object detection through progressive domain adaptation, arXiv:1803.11365 (2018).
[3] W. Sun, J. Burie, J. Ogier and K. Kise, “Specific Comic Character Detection Using Local Feature Matching,” 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, 2013, pp. 275-279, doi: 10.1109/ICDAR.2013.62.
Description:
Available self- and weakly-supervised segmentation methods still show a performance gap with fully supervised methods. Our aim is to improve semantic segmentation performance when pixel-wise labels are not available by analyzing the deep features extracted by fully-, weakly-, and self-supervised methods. Once the label dependency of the segmentation models is relaxed, we will test our models in various domains and try to improve domain adaptation performance.
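As a small, hedged starting point for the feature analysis, the sketch below extracts spatial features from a pretrained backbone and clusters them per location with K-means to obtain label-free segments. The supervised ImageNet ResNet-50 used here is only a stand-in for the fully-, weakly-, and self-supervised encoders to be compared, and the number of clusters is arbitrary.

```python
import torch
import torchvision
from sklearn.cluster import KMeans

# Stand-in backbone; swap in a self- or weakly-supervised encoder for the comparison.
backbone = torchvision.models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

def unsupervised_segments(img, n_clusters=5):
    """img: (1, 3, H, W) normalized tensor. Returns a coarse (H/32, W/32) segment map."""
    with torch.no_grad():
        feats = feature_extractor(img)                          # (1, 2048, H/32, W/32)
    _, c, h, w = feats.shape
    flat = feats[0].permute(1, 2, 0).reshape(-1, c).numpy()     # one feature vector per location
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)
    return labels.reshape(h, w)
```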
References:
[1] Zhao, Nanxuan, et al. “What makes instance discrimination good for transfer learning?.” arXiv preprint arXiv:2006.06606 (2020).
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
[3] Chen, Xinlei, and Kaiming He. “Exploring simple siamese representation learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Deliverables: Report and reproducible implementations
Prerequisites: Experience with deep learning, Pytorch, computer vision
Level: MS semester project
Type of work: 60% research, 40% implementation
Supervisor: Baran Ozaydin ([email protected])