As part of the AI Data Platform project, this semester project aims to build a text-embedding similarity search engine to enhance image similarity search. The goal is to search images using text queries. Since images are indexed with meta-tags or metadata such as tweets, we want to rank images by the similarity between a given text query and each image's meta-tags and/or metadata. BERT, introduced by Google in 2018, can produce embeddings for both words and sentences. In this project, the student will develop a semantics-oriented search engine that uses BERT embeddings to encode the text query and rank the images' meta-tags/metadata from most to least meaningful with respect to that query.
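The core ranking step can be sketched as follows: embed the query and each image's metadata text, score each image by cosine similarity, and sort from highest to lowest score. This is a minimal sketch under stated assumptions, not the project's implementation. To keep it runnable without downloading a model, BERT is replaced by a hypothetical toy bag-of-words embedder (`make_toy_embedder`); in the actual project, `embed` would be assumed to return a BERT-based sentence embedding. The image IDs and metadata strings are illustrative only.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    query: str,
    image_texts: Dict[str, str],
    embed: Callable[[str], Vector],
) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs sorted from most to least similar to the query."""
    query_vec = embed(query)
    scored = [
        (image_id, cosine_similarity(query_vec, embed(text)))
        for image_id, text in image_texts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_toy_embedder(vocabulary: Sequence[str]) -> Callable[[str], Vector]:
    """Hypothetical stand-in for BERT: word counts over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocabulary)}

    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocabulary)
        for word, count in Counter(text.lower().split()).items():
            if word in index:  # words outside the vocabulary are ignored
                vec[index[word]] = float(count)
        return vec

    return embed


# Illustrative metadata for three images (made-up example data).
images = {
    "img_1": "sunset over the beach",
    "img_2": "cat sleeping on a sofa",
    "img_3": "surfers on the beach",
}
vocab = sorted({word for text in images.values() for word in text.split()})
embed = make_toy_embedder(vocab)
ranking = rank_images("sunset beach", images, embed)
print([image_id for image_id, _ in ranking])  # -> ['img_1', 'img_3', 'img_2']
```

In a real deployment, the metadata embeddings would typically be computed once and stored in an index rather than re-embedded for every query; only the query embedding needs to be computed at search time.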
Deliverables: a documented codebase
PREREQUISITES
- Familiarity with Python
- Creativity, initiative, and a proactive, motivated attitude
- Knowledge of Linux and related tools
PREFERRED, BUT NOT REQUIRED
- Experience in Machine Learning
- Experience in Natural Language Processing
Send me your CV: [email protected].