Clustering lines in high-dimensional space: Classification of incomplete data

Jie Gao, Michael Langberg, Leonard J. Schulman

פרסום מחקרי: פרסום בכתב עתמאמרביקורת עמיתים


A set of κ balls B1,. . ., Bκ in a Euclidean space is said to cover a collection of lines if every line intersects some ball.We consider the κ-center problem for lines in high-dimensional space: Given a set of n lines l = {l1,. . ., ln} in ℝd, find κ balls of minimum radius which cover l. We present a 2-approximation algorithm for the cases κ = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel "crosses" in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.

שפה מקוריתאנגלית
מספר המאמר8
כתב עתACM Transactions on Algorithms
מספר גיליון1
מזהי עצם דיגיטלי (DOIs)
סטטוס פרסוםפורסם - נוב׳ 2010

טביעת אצבע

להלן מוצגים תחומי המחקר של הפרסום 'Clustering lines in high-dimensional space: Classification of incomplete data'. יחד הם יוצרים טביעת אצבע ייחודית.

פורמט ציטוט ביבליוגרפי