TY - JOUR

T1 - Clustering lines in high-dimensional space

T2 - Classification of incomplete data

AU - Gao, Jie

AU - Langberg, Michael

AU - Schulman, Leonard J.

N1 - Copyright 2011 Elsevier B.V., All rights reserved.

PY - 2010/11

Y1 - 2010/11

N2 - A set of κ balls B_1, ..., B_κ in a Euclidean space is said to cover a collection of lines if every line intersects some ball. We consider the κ-center problem for lines in high-dimensional space: given a set of n lines l = {l_1, ..., l_n} in ℝ^d, find κ balls of minimum radius which cover l. We present a 2-approximation algorithm for the cases κ = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel "crosses" in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.

AB - A set of κ balls B_1, ..., B_κ in a Euclidean space is said to cover a collection of lines if every line intersects some ball. We consider the κ-center problem for lines in high-dimensional space: given a set of n lines l = {l_1, ..., l_n} in ℝ^d, find κ balls of minimum radius which cover l. We present a 2-approximation algorithm for the cases κ = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel "crosses" in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.

KW - Clustering

KW - Helly theorem

KW - High dimension

KW - Lines

KW - κ-center

UR - http://www.scopus.com/inward/record.url?scp=78650647964&partnerID=8YFLogxK

U2 - 10.1145/1868237.1868246

DO - 10.1145/1868237.1868246

M3 - Article

AN - SCOPUS:78650647964

SN - 1549-6325

VL - 7

JO - ACM Transactions on Algorithms

JF - ACM Transactions on Algorithms

IS - 1

M1 - 8

ER -