When standard RANSAC is not enough: Cross-media visual matching with hypothesis relevancy

Tal Hassner, Liav Assif, Lior Wolf

Research output: Contribution to journal › Article › peer-review

Abstract

The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure for the quality of a match and present a more general hypothesis score that attempts to reflect how likely each hypothesized transformation is to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features, we are able to deal much more robustly with the challenges posed by cross-media analysis, leading to correct matches where conventional methods fail.
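The hypothesize-and-score loop the abstract describes can be sketched in a few lines. The example below is a minimal illustration, not the paper's method: it estimates a 2D translation between two point sets with standard RANSAC, where each hypothesis is scored by its consensus-set size. The `score_fn` parameter is a hypothetical hook showing where a richer, learned relevancy score (as the paper proposes) could replace the plain inlier count; all names are illustrative.

```python
import random


def consensus_score(transform, src, dst, tol=1.0):
    """Standard RANSAC score: the number of correspondences (inliers)
    that agree with the hypothesized translation within a tolerance."""
    tx, ty = transform
    return sum(
        1
        for (sx, sy), (dx, dy) in zip(src, dst)
        if abs(sx + tx - dx) <= tol and abs(sy + ty - dy) <= tol
    )


def ransac_translation(src, dst, iters=200, tol=1.0, score_fn=None, seed=0):
    """Minimal RANSAC: each iteration hypothesizes a translation from one
    randomly chosen correspondence and keeps the best-scoring hypothesis.
    A custom score_fn could fold in contextual cues beyond inlier count."""
    rng = random.Random(seed)
    if score_fn is None:
        score_fn = lambda t: consensus_score(t, src, dst, tol)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        i = rng.randrange(len(src))  # minimal sample: one correspondence
        t = (dst[i][0] - src[i][0], dst[i][1] - src[i][1])
        s = score_fn(t)
        if s > best_score:
            best, best_score = t, s
    return best, best_score
```

For example, with five correspondences shifted by (3, 1) and one gross outlier, the loop recovers (3, 1) because that hypothesis draws the largest consensus; swapping `score_fn` for a learned scorer changes only the ranking criterion, not the sampling loop.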

Original language: English
Pages (from-to): 971-983
Number of pages: 13
Journal: Machine Vision and Applications
Volume: 25
Issue number: 4
DOIs
State: Published - May 2014

Bibliographical note

Funding Information:
TH was partially funded by General Motors (GM).

Keywords

  • 3D viewpoint estimation
  • Image registration
  • Object detection
  • Shape matching

