TILDE: A Temporally Invariant Learned DEtector - CVLAB - EPFL

Abstract

We introduce a learning-based approach to detect repeatable keypoints under drastic imaging changes of weather and lighting conditions to which state-of-the-art keypoint detectors are surprisingly sensitive. We first identify good keypoint candidates in multiple training images taken from the same viewpoint. We then train a regressor to predict a score map whose maxima are those points so that they can be found by simple non-maximum suppression. As there are no standard datasets to test the influence of these kinds of changes, we created our own, which we will make publicly available. We will show that our method significantly outperforms the state-of-the-art methods in such challenging conditions, while still achieving state-of-the-art performance on the untrained standard Oxford dataset.
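The detection step described above, finding keypoints as maxima of a predicted score map by non-maximum suppression, can be sketched as follows. This is only an illustrative sketch, not the released TILDE code: the function name `non_max_suppression`, its parameters `radius`, `threshold`, and `top_k`, and the plain nested-list score map are all assumptions made for the example. The regressor that produces the score map is not shown.

```python
def non_max_suppression(score_map, radius=1, threshold=0.0, top_k=None):
    """Return (row, col, score) for each strict local maximum of score_map.

    score_map : list of equal-length lists of numbers (hypothetical
                regressor output, one score per pixel).
    radius    : half-size of the square neighbourhood used for comparison.
    threshold : a pixel must score strictly above this to be considered.
    top_k     : if given, keep only the top_k highest-scoring maxima.
    """
    height = len(score_map)
    width = len(score_map[0]) if height else 0
    keypoints = []
    for r in range(height):
        for c in range(width):
            s = score_map[r][c]
            if s <= threshold:
                continue
            # A pixel survives only if no neighbour scores as high or higher.
            is_max = True
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    if (rr, cc) != (r, c) and score_map[rr][cc] >= s:
                        is_max = False
                        break
                if not is_max:
                    break
            if is_max:
                keypoints.append((r, c, s))
    # Strongest responses first, so truncation keeps the best keypoints.
    keypoints.sort(key=lambda k: k[2], reverse=True)
    return keypoints if top_k is None else keypoints[:top_k]
```

For instance, calling `non_max_suppression(score_map, radius=1, top_k=100)` would keep the 100 strongest local maxima. A larger `radius` suppresses weaker maxima that lie close to a stronger one.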

References

TILDE: A Temporally Invariant Learned DEtector

Y. Verdie; K. M. Yi; P. Fua; V. Lepetit

2015. Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, p. 5279-5288. DOI : 10.1109/CVPR.2015.7299165.

Teaser Video

This video teaser shows additional results of our method on sequences that were not shown in the paper, such as the Mexico and Chamonix sequences. A cross is displayed when there are not enough matches to compute a homography. To avoid the effect of the descriptors, we use nearest-neighbor matching to create the pairs.
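The pairing and the "not enough matches" check can be illustrated with a small sketch. It is an assumption-laden example rather than the code used for the video: the page does not say which quantities are compared, so the function `nearest_neighbor_pairs` simply pairs each vector in one list with its closest vector in another under squared Euclidean distance, and `enough_for_homography` is a hypothetical helper. The only fixed fact used is that a homography needs at least four point correspondences.

```python
# A homography has 8 degrees of freedom and each point pair gives 2
# constraints, so at least 4 correspondences are required.
MIN_PAIRS_FOR_HOMOGRAPHY = 4


def nearest_neighbor_pairs(features_a, features_b):
    """Pair every vector in features_a with its nearest vector in features_b.

    Both arguments are sequences of equal-length numeric tuples
    (hypothetical inputs; they could be point locations or descriptors).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    pairs = []
    for i, fa in enumerate(features_a):
        best_j, best_d = None, float("inf")
        for j, fb in enumerate(features_b):
            # Squared Euclidean distance; the square root is not needed
            # to decide which candidate is closest.
            d = sum((x - y) ** 2 for x, y in zip(fa, fb))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs


def enough_for_homography(pairs):
    """True if there are enough pairs to attempt a homography estimate."""
    return len(pairs) >= MIN_PAIRS_FOR_HOMOGRAPHY
```

When `enough_for_homography` returns `False`, a viewer built on this sketch would draw the cross instead of attempting the estimate.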

This is another video, showing the actual TILDE keypoints detected on the Chamonix sequence. Note that our points are quite reliable compared to those of SIFT. The best 100 keypoints are shown for both methods.

Example application

This video illustrates a potential application that uses our method directly. A video is captured with a cellphone and matched to a panoramic image generated beforehand. As there is a large temporal gap between the two sequences, it is difficult for traditional methods to find correspondences between them. We use the same settings for both methods, with ‘Opponent SIFT’ as the descriptor.

Supplementary material

The supplementary appendix, with implementation details and mathematical derivations, is available at the following link.

paper_1354_supplementary.pdf

Datasets and Code

Datasets used in the paper.

Code for TILDE and the evaluation framework

Github project page: https://github.com/cvlab-epfl/TILDE

Contacts

Yannick Verdie
Kwang Moo Yi
Pascal Fua
Vincent Lepetit