Testing of Clustering

Noga Alon, Seannie Dar, Michal Parnas, Dana Ron

نتاج البحث: نشر في مجلةمقالةمراجعة النظراء


A set X of points in Re^d is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is epsilon-far from being (k,b')-clusterable for any given 0 < epsilon and for b' geq b. By epsilon-far from being (k,b')-clusterable we mean that more than epsilonX| points should be removed from X so that it becomes (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/epsilon.Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an epsilon-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.
اللغة الأصليةإنجليزيّة أمريكيّة
الصفحات (من إلى)393-417
عدد الصفحات25
دوريةSIAM Journal on Discrete Mathematics
مستوى الصوت16
رقم الإصدار3
المعرِّفات الرقمية للأشياء
حالة النشرنُشِر - 2003


أدرس بدقة موضوعات البحث “Testing of Clustering'. فهما يشكلان معًا بصمة فريدة.

قم بذكر هذا