RGB/NIR Joint Demosaicing Using Compressive Sensing
In this project, we address the problem of RGB/NIR joint acquisition with a single sensor, where a demosaicing algorithm is used to reconstruct the full-resolution color and NIR images. The mosaicing and demosaicing steps of multichannel imaging are closely related to the concept of compressed sensing, where a few samples of a signal are measured and used to reconstruct the full-resolution signal. This resemblance has already been exploited in color demosaicing [1]. The algorithm proposed in [1] yields superior visual quality compared to state-of-the-art color demosaicing approaches.
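To make the analogy concrete, the following minimal MATLAB sketch (not the algorithm of [1]) treats mosaic sampling as a binary measurement mask and recovers one image plane by plain iterative soft-thresholding (ISTA), assuming a 2-D DCT as the sparsifying basis; the test image and the Image Processing Toolbox functions (im2double, dct2, idct2) are illustrative choices.

    % CS view of demosaicing for a single channel (illustrative sketch).
    x = im2double(imread('cameraman.tif'));         % stand-in for one color plane
    M = false(size(x)); M(1:2:end, 1:2:end) = true; % mask: keep 1 of 4 pixels
    y = M .* x;                                     % mosaiced "measurements"

    lambda = 0.01; z = y;                           % soft threshold, initial guess
    for k = 1:200
        r = z + M .* (y - M .* z);                  % gradient step (mask, M' = M)
        c = dct2(r);                                % go to the sparsifying domain
        c = sign(c) .* max(abs(c) - lambda, 0);     % soft-threshold coefficients
        z = idct2(c);                               % back to the pixel domain
    end
    fprintf('PSNR: %.2f dB\n', 10*log10(1 / mean((z(:) - x(:)).^2)));

The same template extends to joint RGB/NIR recovery once the mask models the actual RGB/NIR mosaic and the transform sparsifies all four channels jointly.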
The aim of this project is to modify the algorithm proposed in [1] for joint demosaicing of RGB/NIR images. For instance, the de-correlating transforms used in [1] for color images might be inappropriate for NIR images and may need to be replaced with other transforms. The results of this project should help us determine whether compressive sensing tools are effective for joint demosaicing of RGB and NIR images.
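To illustrate what such a replacement could look like, the sketch below applies a hypothetical orthonormal 4x4 Hadamard-style transform across RGBN pixel vectors; both the matrix and the synthetic data are assumptions for demonstration only, not the transform the project is expected to adopt.

    % Candidate 4-channel de-correlating transform (hypothetical choice).
    T = 0.5 * [ 1  1  1  1 ;                  % "luminance"-like average
                1 -1  1 -1 ;                  % chrominance-like differences
                1  1 -1 -1 ;
                1 -1 -1  1 ];                 % orthonormal: T*T' = eye(4)

    base = rand(1, 1000);                     % shared scene structure
    rgbn = repmat(base, 4, 1) + 0.1*randn(4, 1000); % 4 correlated channels
    dec  = T * rgbn;                          % transformed representation
    disp(corrcoef(rgbn'));                    % strong inter-channel correlation
    disp(corrcoef(dec'));                     % largely removed by the transform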
References
[1] A. A. Moghadam, M. Aghagolzadeh, M. Kumar, and H. Radha, “Compressive Demosaicing,” in IEEE International Workshop on Multimedia Signal Processing, 2010.
Type of work: 60% Theory, 40% MATLAB implementation.
Prerequisite: Knowledge of compressed sensing and color demosaicing, and good MATLAB skills.
Level: MS, semester project
Supervisor: Zahra Sadeghipoor ([email protected])
Using Audio Cues for Image Understanding
Synopsis: Exploiting the relation between an image and its semantics using recognized speech
Thanks to improvements in mobile phone technology over the last decade, smartphones have become versatile devices that capture multidimensional data with various sensors. These sensors have started to change how we understand images, to the point that photography is now considered in a multisensory framework. One of the sensors in this framework is the microphone. A range of audio cues can be useful for image understanding, such as spoken words, the sound of an object, and the surrounding noise level. In this project, we design an algorithm that fuses audio and visual information to understand the image context and enhance the image accordingly (a minimal sketch of the noise-level cue follows the task list below).
In this project you will:
• Use the iPhone app developed in our lab to collect image and audio data (we will provide an iPhone to collect the data, if necessary)
• Convert the audio data into text using a speech recognizer, or transcribe it manually
• Find the relation between the spoken text and the image with psychophysical tests
• Enhance the image according to the text information (you can use the semantic image enhancement algorithm that is already available in IVRG)
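As a toy illustration of the surrounding-noise-level cue mentioned above, the following MATLAB sketch estimates the ambient level of a capture-time recording and maps it to a coarse scene label that could steer a semantic enhancement step; the file name and the dB thresholds are hypothetical placeholders, not calibrated values.

    % Coarse ambient-noise cue from a capture-time audio clip (sketch).
    [a, fs] = audioread('capture_audio.wav');   % hypothetical file name
    a = mean(a, 2);                             % fold stereo to mono

    frameLen = round(0.05 * fs);                % 50 ms analysis frames
    nFrames  = floor(numel(a) / frameLen);
    rmsDb = zeros(nFrames, 1);
    for k = 1:nFrames
        f = a((k-1)*frameLen + (1:frameLen));
        rmsDb(k) = 20 * log10(sqrt(mean(f.^2)) + eps);  % frame level, dBFS
    end

    noiseFloor = median(rmsDb);                 % robust ambient-level estimate
    if noiseFloor < -45                         % illustrative thresholds
        scene = 'quiet (indoor?)';
    elseif noiseFloor < -25
        scene = 'moderate';
    else
        scene = 'loud (street/crowd?)';
    end
    fprintf('ambient level %.1f dBFS -> %s\n', noiseFloor, scene);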
Deliverables: A workflow to extract semantic information from an image using the sound captured when the picture was taken (words spoken by the photographer, and environmental sound). This workflow could then be used to improve the image with an existing method. The ultimate deliverables are a working algorithm and a final report.
Prerequisites: This project requires general knowledge of machine learning techniques and MATLAB skills. Optionally, experience coding for a mobile platform can be very helpful for data collection.
Type of work: 60% research and 40% implementation
Level: MS, semester project
Supervisor: Gokhan Yildirim ([email protected])