TY - GEN

T1 - Learning Distance Functions using Equivalence Relations

AU - Bar-Hillel, Aharon

AU - Hertz, Tomer

AU - Shental, Noam

AU - Weinshall, Daphna

N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.

PY - 2003

Y1 - 2003

N2 - We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, a simple and efficient algorithm for learning a full-rank Mahalanobis metric (Shental et al., 2002). We first show that RCA obtains the solution to an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality-reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent metric-learning algorithm (Xing et al., 2002), which uses the same kind of side information. We empirically demonstrate that learning a distance metric with the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost-effective than the alternative, relying only on closed-form expressions of the data, it appears to be the preferable choice for learning full-rank Mahalanobis distances.

AB - We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, a simple and efficient algorithm for learning a full-rank Mahalanobis metric (Shental et al., 2002). We first show that RCA obtains the solution to an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality-reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent metric-learning algorithm (Xing et al., 2002), which uses the same kind of side information. We empirically demonstrate that learning a distance metric with the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost-effective than the alternative, relying only on closed-form expressions of the data, it appears to be the preferable choice for learning full-rank Mahalanobis distances.

KW - Clustering

KW - Feature selection

KW - Learning from partial knowledge

KW - Semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=1942517347&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:1942517347

SN - 1577351894

VL - 1

T3 - Proceedings, Twentieth International Conference on Machine Learning

SP - 11

EP - 18

BT - Proceedings, Twentieth International Conference on Machine Learning

A2 - Fawcett, T.

A2 - Mishra, N.

T2 - Proceedings, Twentieth International Conference on Machine Learning

Y2 - 21 August 2003 through 24 August 2003

ER -