Abstract
We introduce a learning-based approach to detect repeatable keypoints under drastic imaging changes of weather and lighting conditions to which state-of-the-art keypoint detectors are surprisingly sensitive. We first identify good keypoint candidates in multiple training images taken from the same viewpoint. We then train a regressor to predict a score map whose maxima are those points so that they can be found by simple non-maximum suppression. As there are no standard datasets to test the influence of these kinds of changes, we created our own, which we will make publicly available. We will show that our method significantly outperforms the state-of-the-art methods in such challenging conditions, while still achieving state-of-the-art performance on the untrained standard Oxford dataset.
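The last step of the pipeline described above, picking keypoints as maxima of a score map by non-maximum suppression, can be sketched as follows. This is only an illustrative sketch, not the released TILDE code: the function name `non_max_suppression`, the `radius` and `threshold` parameters, and the toy score map are assumptions made for the example, and the score map here is hand-written rather than predicted by a trained regressor.

```python
def non_max_suppression(score_map, radius=1, threshold=0.0):
    """Return (row, col, score) for each strict local maximum above threshold.

    score_map is a list of equal-length rows of floats (hypothetical input;
    in the paper it would come from the trained regressor).
    """
    rows, cols = len(score_map), len(score_map[0])
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            s = score_map[r][c]
            if s <= threshold:
                continue
            # Compare against every neighbour in a (2*radius+1)^2 window,
            # clipped at the image border.
            is_max = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and score_map[rr][cc] >= s:
                        is_max = False
                        break
                if not is_max:
                    break
            if is_max:
                keypoints.append((r, c, s))
    # Strongest responses first, so the caller can keep the best N.
    keypoints.sort(key=lambda k: k[2], reverse=True)
    return keypoints


# Toy 4x4 score map with two clear peaks.
score_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.2],
]
keypoints = non_max_suppression(score_map, radius=1, threshold=0.5)
```

Sorting by score makes it straightforward to keep only the strongest detections, for instance the best 100 keypoints per image.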
References
TILDE: A Temporally Invariant Learned DEtector
2015. Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, p. 5279-5288. DOI: 10.1109/CVPR.2015.7299165.
Teaser Video
This teaser video shows additional results of our method on sequences that were not shown in the paper, such as the Mexico and Chamonix sequences. A cross is displayed when there are not enough matches to compute a homography. To avoid the effect of the descriptors, we use nearest-neighbor matching to create the pairs.
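The matching step mentioned above can be illustrated with a small sketch. The page does not say which quantity the nearest-neighbor search runs over, so the example below assumes generic feature vectors (they could be keypoint coordinates or descriptors). The function names and the sample data are hypothetical. The minimum of four matches is the standard requirement for estimating a homography, which has eight degrees of freedom while each point correspondence supplies two constraints.

```python
import math

# A homography has 8 degrees of freedom and each match gives 2 constraints,
# so at least 4 matches are needed.
MIN_MATCHES_FOR_HOMOGRAPHY = 4


def nearest_neighbor_matches(features_a, features_b):
    """Pair each vector in features_a with its closest vector in features_b.

    Both inputs are lists of equal-length numeric sequences (assumed format).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, fa in enumerate(features_a):
        best_j = min(
            range(len(features_b)),
            key=lambda j: math.dist(fa, features_b[j]),
        )
        matches.append((i, best_j))
    return matches


def can_estimate_homography(matches):
    """True when there are enough pairs to attempt a homography fit."""
    return len(matches) >= MIN_MATCHES_FOR_HOMOGRAPHY


# Hypothetical 2-D feature vectors for two images.
features_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
features_b = [[5.0, 4.0], [0.1, 0.0], [1.0, 1.2]]

matches = nearest_neighbor_matches(features_a, features_b)
enough = can_estimate_homography(matches)
```

With only three pairs, `enough` is false, which corresponds to the case where the video displays a cross.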
This second video shows the actual TILDE keypoints detected on the Chamonix sequence. Note that our keypoints are quite reliable compared to those of SIFT. The best 100 keypoints are shown for both methods.
Example application
This video illustrates a potential application that uses our method directly. A video is captured with a cellphone and matched to a panoramic image generated beforehand. Because there is a large temporal gap between the two sequences, it is difficult for traditional methods to find correspondences between them. We use the same settings for both methods, with ‘Opponent SIFT’ as the descriptor.
Supplementary material
Click the following link for the supplementary appendix for implementation details and mathematical derivations.
Datasets and Code
Datasets used in the paper.
Code for TILDE and the evaluation framework.
Contacts
Yannick Verdie | [e-mail] |
Kwang Moo Yi | [e-mail] |
Pascal Fua | [e-mail] |
Vincent Lepetit | [e-mail] |