TY - JOUR
T1 - Testing of Clustering
AU - Alon, Noga
AU - Dar, Seannie
AU - Parnas, Michal
AU - Ron, Dana
PY - 2003
Y1 - 2003
N2 - A set X of points in Re^d is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is epsilon-far from being (k,b')-clusterable for any given 0 < epsilon and for b' geq b. By epsilon-far from being (k,b')-clusterable we mean that more than epsilonX| points should be removed from X so that it becomes (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/epsilon.Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an epsilon-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.
AB - A set X of points in Re^d is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is epsilon-far from being (k,b')-clusterable for any given 0 < epsilon and for b' geq b. By epsilon-far from being (k,b')-clusterable we mean that more than epsilonX| points should be removed from X so that it becomes (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/epsilon.Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an epsilon-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.
KW - Approximation algorithms
KW - Clustering
KW - Property testing
KW - Randomized algorithms
UR - http://www.scopus.com/inward/record.url?scp=0041384098&partnerID=8YFLogxK
U2 - 10.1137/S0895480102410973
DO - 10.1137/S0895480102410973
M3 - Article
SN - 0895-4801
VL - 16
SP - 393
EP - 417
JO - SIAM Journal on Discrete Mathematics
JF - SIAM Journal on Discrete Mathematics
IS - 3
ER -