TY - JOUR
T1 - When standard RANSAC is not enough
T2 - Cross-media visual matching with hypothesis relevancy
AU - Hassner, Tal
AU - Assif, Liav
AU - Wolf, Lior
N1 - Funding Information:
TH was partially funded by General Motors (GM).
PY - 2014/5
Y1 - 2014/5
N2 - The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure for the quality of a match and present a more general hypothesis score that attempts to reflect how likely is each hypothesized transformation to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features we are able to deal much more robustly with the challenges required to allow cross-media analysis, leading to correct matches where conventional methods fail.
AB - The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure for the quality of a match and present a more general hypothesis score that attempts to reflect how likely is each hypothesized transformation to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features we are able to deal much more robustly with the challenges required to allow cross-media analysis, leading to correct matches where conventional methods fail.
KW - 3D viewpoint estimation
KW - Image registration
KW - Object detection
KW - Shape matching
UR - http://www.scopus.com/inward/record.url?scp=84899955793&partnerID=8YFLogxK
U2 - 10.1007/s00138-013-0571-4
DO - 10.1007/s00138-013-0571-4
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:84899955793
SN - 0932-8092
VL - 25
SP - 971
EP - 983
JO - Machine Vision and Applications
JF - Machine Vision and Applications
IS - 4
ER -