Semester projects are open to EPFL students.
Image search with text is a cornerstone of various real-world applications, such as e-commerce and internet search.
In this project, our goal is to develop a text-based image search algorithm that learns composite representations jointly capturing visual cues and natural language information in order to match the best target image of interest. Given a text from an article, such as its title or keywords, the algorithm automatically finds the image that best fits the article or creates the most reader engagement, based on criteria such as image aesthetics and visual appeal.
By bridging vision and language, two important aspects of human intelligence for understanding the world, this work can reduce the time journalists spend manually searching for the most suitable images for their articles.
For this project, a large database is available from Ringier AG, one of the leading Swiss media groups and publisher of Blick.ch.
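As a concrete starting point, reference [2] (CLIP) already provides a joint text-image embedding space, so a first retrieval baseline can simply rank candidate images by their cosine similarity to the article text. Below is a minimal sketch of such a baseline; it assumes the open-source clip package from OpenAI, and the image paths and query string are placeholders.

```python
# Minimal text-to-image retrieval baseline with CLIP (reference [2]).
# Assumes: pip install git+https://github.com/openai/CLIP
# Image paths and the query string are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

candidates = ["img_001.jpg", "img_002.jpg"]  # hypothetical candidate pool
images = torch.stack(
    [preprocess(Image.open(p).convert("RGB")) for p in candidates]
).to(device)
text = clip.tokenize(["Article title or keywords here"]).to(device)

with torch.no_grad():
    img_feat = model.encode_image(images)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (txt_feat @ img_feat.T).squeeze(0)  # cosine similarity per image

best = candidates[scores.argmax().item()]  # highest-scoring image for the text
```

The project would then go beyond this baseline, e.g., by fine-tuning the composite representation on the Ringier data or by adding aesthetics-aware ranking terms.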
Tasks
- Understand the literature and our framework.
- Study and implement an existing state-of-the-art (SOTA) visual-linguistic model.
- Develop a novel image search model with text information.
Deliverables
- Project report and reproducible code
Prerequisites
- Experience with Python and PyTorch for deep learning
- Experience and knowledge of deep learning and computer vision
Type of work
50% research, 50% development
References
[1] Y. Chen, S. Gong, and L. Bazzani, “Image Search with Text Feedback by Visiolinguistic Attention Learning,” CVPR 2020.
[2] A. Radford et al., “Learning Transferable Visual Models From Natural Language Supervision,” arXiv 2021.
Level
MS Semester Project, MS Thesis Project or EDIC Fellowship Project
Supervisor(s)
Hakgu Kim ([email protected])
Image aesthetics, i.e., the subjective appeal of images, and deep image generation are hotly debated topics that attract increasing attention in state-of-the-art research as well as in frontier technologies. The aim of this project is to enhance an image, or generate one from scratch, so that it has the best appeal to the viewer and attracts the interest of readers for a given article. The student will have access to a large database, in collaboration with Ringier SA, one of the leading Swiss media organizations and publisher of Blick.ch, to carry out this project.
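As an illustration of the GANalyze reference below, one can learn a direction in a generator's latent space along which an aesthetics "assessor" score increases. The following is only a minimal sketch of that idea, not the project's prescribed method; G (a pretrained generator) and A (a differentiable aesthetics scorer) are placeholders to be supplied, and the latent size is assumed.

```python
# Sketch of the GANalyze idea: learn a latent direction theta such that
# moving a sample along theta raises an aesthetics assessor's score.
# G (pretrained generator) and A (differentiable scorer) are placeholders.
import torch

latent_dim = 128  # assumed latent size of G
theta = torch.randn(latent_dim, requires_grad=True)
opt = torch.optim.Adam([theta], lr=1e-3)

def train_step(G, A, batch_size=16, alpha=1.0):
    z = torch.randn(batch_size, latent_dim)
    target = (A(G(z)) + alpha).detach()     # desired score after the shift
    shifted = G(z + alpha * theta)          # images steered along theta
    loss = ((A(shifted) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```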
Deliverables
Code, well cleaned up and easily reproducible.
Written Report, explaining the literature and steps taken for the project.
Prerequisites
Python and PyTorch. A plus: experience with deep image manipulation/generation, some experience with natural language processing tools.
Type of work
50% research, 50% implementation.
References
GANalyze: Toward Visual Definitions of Cognitive Image Properties
CLIP: Connecting Text and Images
Level
MS semester project, potentially an MS thesis project or an EDIC Fellowship project
Supervisors
Ehsan Pajouheshgar, Bahar Aydemir, Majed El Helou
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden in a grayscale image that uses halftones to display simple graphical elements such as a logo. The software has now been extended to hide the watermark within the graphical elements themselves. The project consists of adapting this software to work within an Android smartphone, and of tuning and optimizing the available parameters.
Deliverables: Report and running prototype (Matlab and/or Android).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab and Java Android
Level: BS or MS semester project or possibly master project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden within a printed image. The goal of the project is to extend that software so that it can recover the watermark from various objects.
Deliverables: Report and running prototype (Matlab and/or Android).
Prerequisites:
– knowledge of image processing / computer vision
– basic coding skills in Matlab
Level: BS or MS semester project or possibly master project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a watermark hidden within a printed image. Dedicated watermark synthesizing software has been developed in Matlab. The project aims at translating that software to C# and at testing its performance.
Deliverables: Report and running prototype (C#).
Prerequisites:
– basic coding skills in Matlab and C# (or Java)
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Startup company Innoview Sàrl has developed software to recover, with a smartphone, a hidden watermark printed on a desktop Epson printer. Special Epson P50 printer driver software enables printing the hidden watermark. The Epson P50 printer has now been replaced by new types of Epson printers that require modified driver software. In a previous project, part of the Epson P50 printer driver commands were adapted to the new types of Epson printers. The present project consists of finalizing the adaptation of the remaining Epson printing commands according to the new Epson printer programming guide. Some reverse engineering may be necessary to obtain non-documented driver commands.
Deliverables: Report and running prototype (C, C++).
Prerequisites:
– knowledge of image processing
– basic coding skills in C, C++
Level: BS or MS semester project
Supervisors:
Dr Romain Rossier, Innoview Sàrl, [email protected], tel 078 664 36 44
Prof. Roger D. Hersch, INM034, [email protected], cell: 077 406 27 09
Description:
Level-line moirés enable the creation of interesting, dynamically beating shapes such as faces, graphical designs, and landscapes. The project aims at creating level-line moirés as 3D graphical objects defined by meshes. The resulting moirés can be simulated in Blender. They can also be fabricated with a 3D printer.
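For intuition, the effect can be previewed in 2D before moving to 3D meshes: superposing a base grating whose phase is shifted by an elevation profile h with a straight revealing grating produces moiré fringes that follow the level lines of h (cf. the Chosson and Hersch reference below). The following toy simulation makes no claim about the exact construction used in the paper; all parameters are illustrative.

```python
# Toy 2D preview of a level-line moiré: a base grating phase-shifted by an
# elevation profile h, superposed (multiplied) with a straight revealing
# grating; the low-frequency fringes follow the level lines of h.
import numpy as np
import matplotlib.pyplot as plt

n, T = 512, 8                              # image size, grating period (px)
y, x = np.mgrid[0:n, 0:n]
h = np.exp(-((x - n/2)**2 + (y - n/2)**2) / (2 * (n/6)**2))  # toy elevation

base = 0.5 * (1 + np.cos(2 * np.pi * (y / T + 3 * h)))  # phase-shifted layer
revealer = 0.5 * (1 + np.cos(2 * np.pi * y / T))        # straight layer
plt.imshow(base * revealer, cmap="gray")  # product of the two transparencies
plt.title("Moire level lines of h")
plt.show()
```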
References:
- S. Chosson, R. D. Hersch, Beating Shapes Relying on Moiré Level Lines, ACM Transactions on Graphics (TOG), Vol. 34, No. 1, November 2014, Article No. 9, 1-10
http://dx.doi.org/10.1145/2644806
https://www.epfl.ch/labs/lsp/technologies/page-88156-en-html/
Deliverables: Report, possibly 3D printed objects.
Prerequisites:
– basics of computer graphics/image processing
– coding skills in Matlab
Level: BS or MS semester project
Supervisor:
Prof. hon. Roger D. Hersch, BC 110, [email protected], cell: 077 406 27 09
Description:
Face detection is the task of identifying human faces in natural images. Convolutional and deep neural networks have proven effective at detecting faces. However, the performance of these approaches drops significantly on artistic images such as drawings, paintings, and illustrations, due to the limited training data in these domains.
In this project, we will perform face detection on comics characters. Their faces differ from natural human faces due to the artistic interpretation of the authors and the fantastic nature of the characters. Therefore, we will use transfer learning and domain adaptation techniques to extract and transfer facial information between different domains.
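As a possible starting point for the transfer-learning step, a COCO-pretrained detector can be fine-tuned on comics faces, in the spirit of [1]. The sketch below uses torchvision's Faster R-CNN with a single "face" class; comics_loader is a hypothetical DataLoader yielding batches in torchvision's detection format.

```python
# Sketch: fine-tune a COCO-pretrained Faster R-CNN on comics faces (cf. [1]).
# comics_loader is a hypothetical DataLoader yielding (images, targets):
# a list of image tensors and a list of dicts with "boxes" and "labels".
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes=2)  # background + face

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()
for images, targets in comics_loader:       # hypothetical comics dataset
    loss_dict = model(images, targets)      # classification + box losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```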
Tasks:
– Understand the literature and the state of the art
– Test several face detection algorithms on comics
– Develop a method to detect the faces of different characters across multiple artistic styles
– Compare the performance of existing state-of-the-art face detection algorithms with that of our method
Prerequisites:
Experience in machine learning and computer vision, experience in Python, experience in deep learning frameworks
Deliverables:
At the end of the semester, the student should deliver a framework that performs face detection, together with a report of the work.
Level:
MS semester or thesis project
Type of work:
65% research, 35% development and testing
References:
[1] X. Qin, Y. Zhou, Z. He, Y. Wang and Z. Tang, “A Faster R-CNN Based Method for Comic Characters Face Detection,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017, pp. 1074-1080, doi: 10.1109/ICDAR.2017.178.
[2] N. Inoue, R. Furuta, T. Yamasaki, K. Aizawa, Cross-domain weakly-supervised object detection through progressive domain adaptation, arXiv:1803.11365 (2018).
[3] W. Sun, J. Burie, J. Ogier and K. Kise, “Specific Comic Character Detection Using Local Feature Matching,” 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, 2013, pp. 275-279, doi: 10.1109/ICDAR.2013.62.
Description:
Visual saliency refers to the parts of a scene that capture our attention. Current approaches to saliency estimation construct ground truth from eye-tracking data on natural images. In this project, however, we will perform eye tracking on comics pages instead of natural images, and later use the collected data to estimate saliency in the comics domain. You will work on an eye-tracking experiment with mobile eye-tracking glasses.
Tasks:
– Understand the key points of an eye tracking experiment and our setup.
– Conduct an eye tracking experiment according to given instructions.
Deliverables: At the end of the semester, the student should provide the collected data and a report of the work.
Type of work: 20% research, 80% development and testing
References:
[1] A. Borji and L. Itti, “Cat2000: A large scale fixation dataset for boosting saliency research,” CVPR 2015 workshop on “Future of Datasets”, 2015.
[2] K. Kunze, Y. Utsumi, Y. Shiga, K. Kise, and A. Bulling, “I know what you are reading: recognition of document types using mobile eye tracking,” Proceedings of the 2013 International Symposium on Wearable Computers, September 8-12, 2013, Zurich, Switzerland.
[3] K. Khetarpal and E. Jain, “A preliminary benchmark of four saliency algorithms on comic art,” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA.
Level: BS semester project
Supervisor: Bahar Aydemir ([email protected])
Description
Recent studies have demonstrated that current deep neural networks (DNNs) are vulnerable to crafted perturbations that can cause misclassification. Most recently, several adversarial attack methods have been proposed that consider the semantic properties of images [1]. They craft content-based perturbations by mimicking the effects of traditional image processing (i.e., unconstrained perturbations); for example, as shown in Fig. 1 of [1], both colorization and adversarial perturbations are taken into account to reduce detectability and noticeability.
Inspired by this idea, the goal of this project is to develop a semantic adversarial attack that incorporates a conventional image processing method, namely image compression, into the adversarial attack. Distorted images by themselves already act as adversarial inputs to deep neural networks [2]. Based on this characteristic, combining image compression with the adversarial attack objective should yield an unnoticeable adversarial attack method. In this project, you will compare this approach with conventional adversarial attacks.
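To make the idea concrete, [2] implies that compression artifacts alone can already flip a classifier's prediction. A minimal sketch of such a compression-only baseline attack is given below; the model (with preprocessing built in) and the true label are assumed, and a full semantic attack would optimize the perturbation rather than only searching over quality levels.

```python
# Compression-only baseline attack in the spirit of [2]: find the highest
# JPEG quality whose artifacts alone change the model's prediction.
# `model` (with preprocessing built in) and the true label are assumed.
import io
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.ToTensor()

def jpeg(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def compression_attack(model, img: Image.Image, label: int):
    for q in range(95, 0, -5):              # mild -> strong compression
        x = to_tensor(jpeg(img, q)).unsqueeze(0)
        with torch.no_grad():
            if model(x).argmax(1).item() != label:
                return q, x                 # attack succeeded at quality q
    return None                             # robust to pure compression
```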
Tasks
- Understand the literature and our framework.
- Implement an existing semantic adversarial attack.
- Develop a method to generate natural looking adversarial examples based on image compression methods.
- Measure the visual quality and attack success rate of the generated adversarial examples.
Deliverables
- Project report
- Reproducible code
Prerequisites
- Experience and knowledge of deep learning and image processing
- Experience with TensorFlow and PyTorch for deep learning
Type of work
50% research, 50% development and testing
References
[1] A. S. Shamsabadi et al., “ColorFool: Semantic Adversarial Colorization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] S. Dodge and L. Karam, “Understanding how image quality affects deep neural networks,” in IEEE International Conference on Quality of Multimedia Experience (QoMEX), 2016.
Level
MS Semester Project (Fall 2021)
Supervisor(s)
Hakgu Kim ([email protected])
Description (Master Semester Project open to EPFL students)
In this project, you will research the existing literature on weakly-supervised and unsupervised monocular depth estimation and build a model for estimating depth maps for images in a new, unseen domain. Traditionally, a myriad of single-view depth estimation techniques have been applied to real-world images and found to work considerably well. However, achieving the same results on other image domains, such as comics and cartoons, is challenging. A possible solution is to develop a weakly-supervised technique for depth estimation in the unseen domain via domain adaptation. A better solution would be to perform zero-shot learning to predict relative depths.
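As a first experiment, the zero-shot model of [1] (MiDaS) can be run directly on comics images to see how well natural-image depth priors transfer. A minimal sketch using the torch.hub release of MiDaS follows; the image path is a placeholder.

```python
# Zero-shot depth on a comics image with the torch.hub release of MiDaS [1].
# The image path is a placeholder.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").default_transform
midas.eval()

img = cv2.cvtColor(cv2.imread("comic_panel.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))            # (1, H', W') relative depth
depth = torch.nn.functional.interpolate(
    pred.unsqueeze(1), size=img.shape[:2], mode="bicubic"
).squeeze()                                 # resized back to input resolution
```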
You may contact the supervisor at any time should you want to discuss the idea further.
References
[1] “Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer”; René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun.
[2] “Digging Into Self-Supervised Monocular Depth Estimation” Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning and computer vision, experience in Python, Pytorch. Experience in statistical analysis.
Models will run on Kubernetes. (We will guide you in using Kubernetes; no prior knowledge is required.)
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description (Master Semester Project open to EPFL students)
In this project, you will research the existing literature on monocular depth estimation in which surface normals or Fourier transforms are used to predict the depth of single images. You will then build a model for estimating the depth maps of such images. Traditionally, many single-view depth estimation techniques have used phase information from the Fourier transform of an image, surface normals, and other geometrical cues. Your goal is to explore these concepts to ultimately predict image depths.
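To illustrate the kind of cue [1] exploits, the sketch below extracts the phase and log-amplitude of an image's 2D FFT as candidate input features for a depth regressor; the regressor itself, and whether these exact features are used, are left open.

```python
# Candidate Fourier-domain cues in the spirit of [1]: phase and log-amplitude
# of the 2D FFT, stacked as extra input channels for a depth regressor.
import torch

def fourier_features(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 1, H, W) grayscale batch -> (B, 2, H, W) phase + log-amplitude."""
    spec = torch.fft.fft2(img)
    phase = torch.angle(spec)
    log_amp = torch.log1p(torch.abs(spec))
    return torch.cat([phase, log_amp], dim=1)
```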
You may contact the supervisor at any time should you want to discuss the idea further.
Reference
[1] Single-Image Depth Estimation Based on Fourier Domain Analysis; Jae-Han Lee, Minhyeok Heo, Kyung-Rae Kim, and Chang-Su Kim, CVPR 2018.
Type of Work (e.g., theory, programming)
50% research, 50% development and testing
Prerequisites
Experience in deep learning, experience in Python, Pytorch. Experience in statistical analysis.
Some experience in signal processing.
Models will run on Kubernetes. (We will guide you in using Kubernetes; no prior knowledge is required.)
Supervisor(s)
Deblina BHATTACHARJEE ([email protected])
Description:
Deep neural networks are the current state-of-the-art models for many applications, but they have been shown to be vulnerable to adversarial attacks: very tiny perturbations of the input data can fool undefended models into giving wrong predictions. To evaluate the robustness of different models, threat models specify the types of perturbations applied to the input data, such as L-infinity-bounded attacks, L2-bounded attacks, and sparse attacks. Unfortunately, models robust against one particular threat model are, in general, not robust against a different threat model. In this project, we will explore methods to train deep neural networks that are robust against multiple threat models.
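For reference, the L-infinity threat model is typically instantiated with the PGD attack of Madry et al. (first reference below); a minimal sketch is shown here, assuming inputs in [0, 1] and a classifier `model`. The project would pair such an attack with L2 and sparse counterparts.

```python
# Standard L-infinity PGD attack (Madry et al.). eps/alpha are the usual
# CIFAR-style defaults and would be tuned per dataset.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)  # L-inf projection
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()
```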
Tasks:
- Get familiar with the existing literature.
- Train neural networks to be robust against a particular threat model.
- Make the networks robust against multiple threat models.
Deliverables: methods to train robust neural network models against multiple threat models.
Type of work: 20% literature review, 50% research, 30% implementation.
Prerequisites: Mathematical foundations (calculus, linear algebra, analytic geometry), programming (Python, PyTorch)
References:
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. ICLR 2018.
- Tramer, Florian, and Dan Boneh. “Adversarial training and robustness for multiple perturbations.” NeurIPS 2019.
Level: BS/MS Semester Project.
Supervisor: Chen Liu ([email protected])
Description:
Deep neural networks are the state-of-the-art models for many applications, but they have been shown to be vulnerable to adversarial attacks: imperceptible perturbations of the input can lead to wrong predictions with very high confidence. Neural Ordinary Differential Equations (Neural ODEs) are a new category of neural networks whose dynamics are described by a differential equation. A Neural ODE can thus be considered a new type of neural network whose depth is continuous. In addition, a Neural ODE is a specific type of deep equilibrium model, consisting of multiple deep implicit layers.
The robustness of traditional neural networks, such as ResNet, has been intensively studied recently. However, the robustness of Neural ODEs, and of deep equilibrium models more generally, remains under-explored. This project will investigate the robustness of Neural ODEs. We will start by training a Neural ODE in the vanilla way and designing algorithms that reliably evaluate its robustness. Then, we will train the Neural ODE adversarially to boost its robustness.
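As a starting point, a Neural ODE block can be built on the torchdiffeq package released with Chen et al. (2018) (pip install torchdiffeq). The sketch below shows one such block, to be wrapped between a convolutional head and a linear classifier; architecture details are placeholders.

```python
# Minimal Neural ODE block following Chen et al. (2018), using torchdiffeq.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, t, h):                 # dh/dt = f(t, h)
        return self.net(h)

class ODEBlock(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func
        self.register_buffer("t", torch.tensor([0.0, 1.0]))
    def forward(self, h):
        return odeint(self.func, h, self.t)[-1]  # state at t = 1
```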
Tasks:
- Get familiar with the existing literature on adversarial robustness and Neural ODEs.
- Train a Neural ODE in the vanilla way and evaluate its robustness.
- Apply adversarial training in the context of Neural ODEs.
Deliverables: Attack and defense algorithms on Neural ODE
Type of work: 15% literature review, 60% research, 25% implementation.
Prerequisites: Mathematical foundations (calculus, linear algebra), numerical analysis, programming (Python, PyTorch)
References:
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. ICLR 2018.
- Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. NeurIPS 2018.
- Yan, H., Du, J., Tan, V. Y., & Feng, J. (2019). On robustness of neural ordinary differential equations. ICLR 2020.
- Huang, Y., Yu, Y., Zhang, H., Ma, Y., & Yao, Y. (2020). Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients. arXiv preprint arXiv:2009.13145.
Level: MS Semester Project.
Supervisor: Chen Liu ([email protected])
Description:
Generative Adversarial Networks (GANs) are the most popular framework among generative models; however, they are notorious for their training difficulty and high demand for training data. Training GANs with limited data leads to discriminator overfitting, and hence the generator cannot output realistic images.
In this project, we will apply GANs to comics data, which is very small compared to other datasets, as well as to other small image datasets, in order to generate meaningful images. We will use the latest state-of-the-art methods to generate visually plausible images, and also try to discover the semantic information in the latent space.
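The key idea of [1] is to augment both real and generated images before the discriminator sees them, using differentiable augmentations so that gradients still flow to the generator. The sketch below shows a heavily simplified version with a fixed augmentation probability (the adaptive scheme of [1] adjusts it during training); the two augmentations are illustrative only.

```python
# Simplified differentiable augmentation in the spirit of [1]: the same
# random augmentations are applied to real and fake batches before the
# discriminator, with gradients flowing through.
import torch

def diff_augment(x, p=0.5):
    if torch.rand(()) < p:   # random brightness shift
        x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    if torch.rand(()) < p:   # random horizontal flip
        x = torch.flip(x, dims=[3])
    return x

# In the training loop (D: discriminator, G: generator, z: latent batch):
#   d_real = D(diff_augment(real_images))
#   d_fake = D(diff_augment(G(z)))
```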
Task:
- Literature review; learn to train GANs with the most advanced and up-to-date methods.
- Test image generation on comics data.
- Add a data-augmentation strategy to enlarge the image set.
- Discover the underlying semantic latent information.
Prerequisites:
Basic knowledge of Python, a deep learning framework (TensorFlow or PyTorch), and linear algebra is preferred.
Level:
BS project
Type of work:
20% literature review, 20% research, 60% development and test.
Reference:
[1] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T. Training generative adversarial networks with limited data. arXiv preprint arXiv:2006.06676, 2020.
[2] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. Generative adversarial networks. arXiv preprint arXiv:1406.2661, 2014.
[3] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020 (pp. 8110-8119).
Supervisor: Tong Zhang ([email protected])
Description:
Generative Adversarial Networks (GANs) have their own way of generating images, and discovering the semantic information inside GANs has attracted increasing attention. With the rapid progress of GANs, such as StyleGAN and its enhanced versions, researchers in this area have found new ways of manipulating the semantic meaning at every single layer, which is very different from applying interpolation only in the latent space.
In this project, we will study both ways of manipulating images and analyze the differences between them. StyleGAN-based methods need a vector encoded from another image and apply it explicitly at a certain layer; however, they offer no definite rule for controlling the semantic meaning. On the other hand, latent-space GAN methods can generate images relying purely on the latent space, but they cannot generate images with a fixed image style. Therefore, in this project, we target combining the advantages of both methods. Note that the goal of the project may change slightly, since more papers will come out in the next few months, and we may adapt our goal to recent findings.
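One of the layer-wise manipulation methods we will study, the closed-form factorization of [3] (SeFa), can be sketched in a few lines: candidate semantic directions are the top right-singular vectors of the weight that maps the latent code into the first layer. The attribute holding that weight depends on the actual GAN implementation, and the normalization details of [3] are omitted here.

```python
# Sketch of the closed-form factorization of [3] (SeFa). The weight matrix
# comes from the layer that maps the latent code into the generator.
import torch

def sefa_directions(weight: torch.Tensor, k: int = 5) -> torch.Tensor:
    """weight: (out_dim, latent_dim) -> (k, latent_dim) semantic directions."""
    _, _, vh = torch.linalg.svd(weight, full_matrices=False)
    return vh[:k]

# Editing: z_edited = z + alpha * directions[i] moves a sample along the
# i-th discovered semantic direction.
```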
Task:
- Literature review; learn to train GANs and StyleGAN2.
- Test image generation on standard datasets.
- Discover the underlying semantic latent information.
- Propose methods to infer the semantic meaning in the latent space of StyleGAN without encoding another image.
Prerequisites:
Knowledge of Python, a deep learning framework (TensorFlow or PyTorch), and linear algebra is required.
Level:
MS project or thesis
Type of work:
20% literature review, 40% research, 40% development and test.
Reference:
[1] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020 (pp. 8110-8119).
[2] Abdal, R., Qin, Y., Wonka, P. Image2StyleGAN: How to embed images into the StyleGAN latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019 (pp. 4432-4441).
[3] Shen, Yujun, and Bolei Zhou. “Closed-form factorization of latent semantics in GANs.” arXiv preprint arXiv:2007.06600, 2020.
[4] Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
Supervisor: Tong Zhang ([email protected])
Description:
Available instance segmentation methods require costly annotations and perform poorly across domains. Our aim is to remove or ease the annotation cost and build a model that can generalize to various domains. We will use self- and weakly-supervised learning methods to achieve better generalization with less or no annotation. We may also use domain adaptation and/or image-to-image translation methods.
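For the self-supervised part, a MoCo-style contrastive objective [3] is one likely ingredient. A minimal sketch of the InfoNCE loss is given below; q and k are assumed to be L2-normalized embeddings of two views of the same images, and queue is a memory bank of negatives.

```python
# Minimal InfoNCE loss as used in MoCo-style pretraining [3].
import torch
import torch.nn.functional as F

def info_nce(q, k, queue, tau=0.07):
    """q, k: (N, D) positive pairs; queue: (K, D) negatives."""
    l_pos = (q * k).sum(dim=1, keepdim=True)            # (N, 1) positives
    l_neg = q @ queue.T                                 # (N, K) negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)              # positive at index 0
```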
References:
[1] Zhao, Nanxuan, et al. “What makes instance discrimination good for transfer learning?.” arXiv preprint arXiv:2006.06606 (2020).
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
[3] He, Kaiming, et al. “Momentum contrast for unsupervised visual representation learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Deliverables: Report and reproducible implementations
Prerequisites: Experience with deep learning, Pytorch, computer vision
Level: MS semester project
Type of work: 60% research, 40% implementation
Supervisors: Baran Ozaydin ([email protected])
Description:
Pixel-wise annotations are crucial for the segmentation task, but they require extensive human effort and time. Our aim is to incorporate deep features into the annotation process and reduce the annotation cost and time. We plan to combine classification features with superpixels and/or foreground extraction algorithms to make the annotation process more efficient.
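As a sketch of the intended pipeline, SLIC superpixels [1] can be combined with a deep feature map so that each superpixel carries an averaged feature vector, letting a single click label a coherent region. The snippet below assumes scikit-image and a hypothetical `features` map from any pretrained classification network; the segment count is a placeholder.

```python
# Sketch: average a deep feature map over SLIC superpixels [1] so that one
# click can label a coherent region. `features` is hypothetical.
import numpy as np
from skimage.segmentation import slic

def superpixel_features(image: np.ndarray, features: np.ndarray, n_segments=200):
    """image: (H, W, 3) uint8; features: (C, H, W) -> {superpixel_id: (C,) mean}."""
    segments = slic(image, n_segments=n_segments, compactness=10)
    return {sp: features[:, segments == sp].mean(axis=1)
            for sp in np.unique(segments)}
```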
References:
[1] Achanta, Radhakrishna, et al. “SLIC superpixels compared to state-of-the-art superpixel methods.” IEEE transactions on pattern analysis and machine intelligence 34.11 (2012): 2274-2282.
[2] Rother, Carsten, Vladimir Kolmogorov, and Andrew Blake. “GrabCut: interactive foreground extraction using iterated graph cuts.” ACM Transactions on Graphics (TOG) 23.3 (2004): 309-314.
[3] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization.” CVPR, 2016.
Deliverables:
Report, annotations and running prototype (Python or Matlab).
Prerequisites:
– Knowledge of image processing
– Basic coding skills in Python and basic deep learning knowledge
Level:
BS semester project
Supervisor: Baran Ozaydin ([email protected])
Description
Recent studies have demonstrated that current deep neural networks (DNNs) are vulnerable to artificially generated adversarial examples that can cause misclassification. Adversarial training is one of the best-known defense methods for making DNNs robust against adversarial examples: the DNNs are trained on a large number of adversarial examples with imperceptible perturbations.
The goal of this project is to develop a novel adversarial training scheme that incorporates a state-of-the-art data augmentation method such as MixUp [1] or CutMix [2]. By mixing different adversarial attack methods within each adversarial example, we can obtain much stronger adversarial attacks. By training the DNNs on these augmented, strong adversarial examples, we can make them much more robust against multiple adversarial attacks.
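A minimal sketch of the mixing idea is given below: two adversarial versions of the same image, produced by two different attacks, are combined with a MixUp-style [1] convex combination. The attack functions are placeholders, and keeping the label unchanged (since both versions stem from the same clean image) is one possible design choice.

```python
# Sketch of MixUp [1] over adversarial examples from two different attacks.
# attack_a / attack_b stand for any two attack functions (e.g., PGD variants).
import torch

def adversarial_mixup(model, x, y, attack_a, attack_b, alpha=1.0):
    x_a = attack_a(model, x, y)              # adversarial example, attack A
    x_b = attack_b(model, x, y)              # adversarial example, attack B
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * x_a + (1 - lam) * x_b, y    # MixUp in input space
```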
Tasks
- Understand the literature and our framework.
- Implement conventional adversarial attack methods and augment them.
- Train the deep neural network on the augmented adversarial examples for robustness.
Deliverables
- Project report
- Reproducible code
Prerequisites
- Experience and knowledge of deep learning and image processing
- Experience with TensorFlow and PyTorch for deep learning
Type of work
50% research, 50% development and testing
References
[1] H. Zhang et al., “MixUp: Beyond Empirical Risk Minimization,” ICLR, 2018.
[2] S. Yun et al., “CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features,” ICCV, 2019.
Level
BS/MS Semester Project (Fall 2021)
Supervisor(s)
Hakgu Kim ([email protected])
Description
This project is on extreme video completion, which is very efficient both in computation and energy consumption at the transmitter, as well as in compression power. It has various applications in IoT devices, surveillance, drones, and emergency-deployed networks.
You would be working on an optimized algorithm in Python or, if possible, in C++, following the reference below. Other steps include a deep learning method for post-processing the results (deep restoration neural networks) and designing additional features for the encoder/compression part of the method. The project will be structured based on the skills and objectives of the student.
Deliverables
- Code, well cleaned up and easily reproducible.
Prerequisites
- Depends on the final structure, but useful: general experience with image processing (video processing a plus), PyTorch for deep learning, and C++ experience.
Type of work
Mainly implementation.
References
Level
MS, potentially BS
Supervisor
Available projects – Spring 2022
The biggest challenge in using neural networks to manipulate images for professional cinema production is temporal stability, as no flickering or glitches can be tolerated.
Swiss production company and post-production house 8horses has developed an experimental proof-of-concept (PoC) model to translate the color characteristics of photochemical 35mm film to footage acquired with digital cinema cameras. In the PoC model, good temporal stability has been achieved by combining a cycleGAN architecture and unpaired training with a temporal loss function that compares only one previous frame.
The proof-of-concept has been trained on a dataset of aligned images of 35mm film and footage from an Arri Alexa digital camera, from a research project by ZHdK.
The goal is to further refine the existing model, understand the mechanism leading to temporal stability, and possibly improve it. The feature film project ELECTRIC CHILD by 8horses can be used as a case study. During the shooting of the film, still frames can be shot on 35mm film to create a new dataset for specifically fine-tuning the model for the scenes of the film.
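For reference, a one-previous-frame temporal loss of the kind described above can be sketched as follows: the frame-to-frame change of the translated output is encouraged to match the frame-to-frame change of the input. This is only an illustration of the principle; the exact loss of the PoC (e.g., whether any motion compensation is used) is defined by the existing code.

```python
# Illustration of a one-previous-frame temporal consistency loss.
# G is the cycleGAN generator; x_t and x_prev are consecutive input frames.
import torch
import torch.nn.functional as F

def temporal_loss(G, x_t, x_prev):
    out_diff = G(x_t) - G(x_prev)   # change in the translated sequence
    in_diff = x_t - x_prev          # change in the source sequence
    return F.l1_loss(out_diff, in_diff)
```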
Tasks
- Further refine the proof-of-concept.
- Research and improve the mechanism leading to temporal stability.
- Create a fine-tuning dataset with images from the case-study feature film and test it on the film in a professional post-production environment.
Deliverables
- Code, well cleaned up and easily reproducible.
Prerequisites
- Experience with Python and PyTorch for deep learning
- Experience and knowledge of deep learning and computer vision
Type of work:
20% research, 80% development and test.
Supervisors:
Berk Dogan ([email protected]) and Simon Jaquemet ([email protected])
References:
Proof-of-concept demo:
https://cloud.8horses.ch/s/RHtRdyZqqZY7ga2
https://www.zhdk.ch/forschungsprojekt/analog–digital-426752
https://github.com/NVIDIA/pix2pixHD