Clustering lines in high-dimensional space: Classification of incomplete data

Jie Gao, Michael Langberg, Leonard J. Schulman

نتاج البحث: نشر في مجلةمقالةمراجعة النظراء


A set of κ balls B1,. . ., Bκ in a Euclidean space is said to cover a collection of lines if every line intersects some ball.We consider the κ-center problem for lines in high-dimensional space: Given a set of n lines l = {l1,. . ., ln} in ℝd, find κ balls of minimum radius which cover l. We present a 2-approximation algorithm for the cases κ = 2, 3 of this problem, having running time quasi-linear in the number of lines and the dimension of the ambient space. Our result for 3-clustering is strongly based on a new result in discrete geometry that may be of independent interest: a Helly-type theorem for collections of axis-parallel "crosses" in the plane. The family of crosses does not have finite Helly number in the usual sense. Our Helly theorem is of a new type: it depends on ε-contracting the sets. In statistical practice, data is often incompletely specified; we consider lines as the most elementary case of incompletely specified data points. Clustering of data is a key primitive in nonparametric statistics. Our results provide a way of performing this primitive on incomplete data, as well as imputing the missing values.

اللغة الأصليةالإنجليزيّة
رقم المقال8
دوريةACM Transactions on Algorithms
مستوى الصوت7
رقم الإصدار1
المعرِّفات الرقمية للأشياء
حالة النشرنُشِر - نوفمبر 2010


أدرس بدقة موضوعات البحث “Clustering lines in high-dimensional space: Classification of incomplete data'. فهما يشكلان معًا بصمة فريدة.

قم بذكر هذا