Abstract

Gaussian Mixture Models (GMMs) have been widely used to cluster data in an unsupervised manner via the Expectation Maximization (EM) algorithm. In this chapter we suggest a semi-supervised EM algorithm that incorporates equivalence constraints into a GMM. Equivalence constraints provide information about pairs of data points, indicating whether the points arise from the same source (a must-link constraint) or from different sources (a cannot-link constraint). These constraints allow the EM algorithm to converge to solutions that better reflect the class structure of the data. Moreover, in some learning scenarios equivalence constraints can be gathered automatically, while in others they are a natural form of supervision. We present a closed-form EM algorithm for handling must-link constraints, and a generalized EM algorithm using a Markov network for incorporating cannot-link constraints. Using publicly available data sets, we demonstrate that incorporating equivalence constraints leads to a considerable improvement in clustering performance. Our GMM-based clustering algorithm significantly outperforms two other available clustering methods that use equivalence constraints.

Mixture models are a powerful tool for probabilistic modelling of data, which have been widely used in various research areas such as pattern recognition, machine learning, computer vision, and signal processing [13, 14, 18]. Such models provide a principled probabilistic approach to clustering data in an unsupervised manner [24, 25, 30, 31]. In addition, their ability to represent complex density functions has made them an excellent choice in density estimation problems [20, 23].
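To make the must-link case concrete, the following is a minimal sketch (not the chapter's implementation; all function and variable names are my own) of EM for a GMM in which groups of must-linked points, often called chunklets, are constrained to come from a single Gaussian source. Under the assumption that the points of a chunklet are sampled i.i.d. from one component, the E-step assigns a single responsibility vector to each whole chunklet, computed from the product of its points' likelihoods, and the M-step then proceeds as in standard EM with those shared responsibilities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_must_link(X, chunklets, K, n_iter=50, seed=0):
    """EM for a GMM where each chunklet (a group of must-linked points)
    is constrained to arise from a single Gaussian source.

    X:         (n, d) data matrix
    chunklets: list of index arrays partitioning range(n); unconstrained
               points appear as singleton chunklets
    K:         number of mixture components
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: random means drawn from the data, shared data
    # covariance, uniform mixing weights.
    mu = X[rng.choice(n, K, replace=False)].copy()
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: one responsibility vector per chunklet, shared by all of
        # its points; the chunklet log-likelihood under component k is the
        # sum of its points' log-densities (i.i.d. sampling assumption).
        log_r = np.zeros((len(chunklets), K))
        for c, idx in enumerate(chunklets):
            for k in range(K):
                log_r[c, k] = np.log(pi[k]) + np.sum(
                    multivariate_normal.logpdf(X[idx], mu[k], cov[k]))
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: expand chunklet responsibilities back to the points,
        # then update weights, means, and covariances as in standard EM.
        R = np.zeros((n, K))
        for c, idx in enumerate(chunklets):
            R[idx] = r[c]
        Nk = R.sum(axis=0) + 1e-12
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (R[:, k, None] * diff).T @ diff / Nk[k] \
                     + 1e-6 * np.eye(d)
    return pi, mu, cov, R
```

The cannot-link case does not admit such a closed-form E-step, since cannot-link constraints couple the hidden assignment variables of different points; that is why the chapter resorts to a Markov network and a generalized EM procedure there.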
Title of host publication: Constrained Clustering
Subtitle of host publication: Advances in Algorithms, Theory, and Applications
Number of pages: 26
State: Published - 1 Jan 2008