Abstract
A set X of points in ℝ^d is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is ε-far from being (k,b')-clusterable, for any given 0 < ε ≤ 1 and for b' ≥ b. By ε-far from being (k,b')-clusterable we mean that more than ε·|X| points must be removed from X for it to become (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings, namely clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries about which cluster any given point belongs to.
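The abstract does not spell out the algorithms or their sample-size bounds, so the following is only a minimal sketch of the general idea for the diameter cost: draw a uniform sample and check exactly whether the sample itself is (k,b)-clusterable. The function names, the `sample_size` parameter, and the brute-force backtracking check are all assumptions made for illustration and are not taken from the paper. The check is exponential in the sample size in the worst case, so it only suits small samples.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def is_diameter_clusterable(points: List[Point], k: int, b: float) -> bool:
    """Exact check by backtracking (illustrative, not the paper's method):
    try to assign every point to one of k clusters so that each pair of
    points inside a cluster is at Euclidean distance at most b."""
    clusters: List[List[Point]] = [[] for _ in range(k)]

    def place(i: int) -> bool:
        if i == len(points):
            return True
        tried_empty = False
        for cluster in clusters:
            if not cluster:
                if tried_empty:
                    continue  # empty clusters are interchangeable
                tried_empty = True
            if all(math.dist(points[i], q) <= b for q in cluster):
                cluster.append(points[i])
                if place(i + 1):
                    return True
                cluster.pop()  # undo and try the next cluster
        return False

    return place(0)


def sample_test(
    X: List[Point],
    k: int,
    b: float,
    sample_size: int,  # hypothetical parameter, not the paper's bound
    rng: Optional[random.Random] = None,
) -> bool:
    """Accept iff a uniform random sample of X is (k,b)-clusterable."""
    rng = rng or random.Random()
    m = min(sample_size, len(X))
    sample = rng.sample(X, m)
    return is_diameter_clusterable(sample, k, b)
```

Because every subset of a (k,b)-clusterable set is itself (k,b)-clusterable, this sketch never rejects a clusterable input. How large the sample must be for it to reject ε-far inputs with high probability is exactly what the paper analyzes, and the sketch makes no claim about that.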
| Original language | English |
|---|---|
| Title of host publication | Proceedings 41st Annual Symposium on Foundations of Computer Science |
| Place of Publication | Los Alamitos, CA, USA |
| Publisher | IEEE Computer Society |
| Pages | 240 |
| Number of pages | 1 |
| State | Published - 1 Nov 2000 |
| Event | 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, United States. Duration: 12 Nov 2000 → 14 Nov 2000 |
Conference
| Conference | 41st Annual Symposium on Foundations of Computer Science |
|---|---|
| Country/Territory | United States |
| City | Redondo Beach, CA |
| Period | 12/11/00 → 14/11/00 |
Keywords
- pattern clustering
- statistical analysis
- computational complexity
- clustering testing
- sampling
- cost measures
- optimal cost
- lower bounds