Analysis of incomplete data and an intrinsic-dimension Helly Theorem

Jie Gao, Michael Langberg, Leonard J. Schulman

Research output: Contribution to journal › Article › peer-review

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in ℝ^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(L) of a set of lines or Δ-flats L: the least r such that there is a ball of radius r intersecting every flat in L. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher-dimensional flats, primarily because "distances" between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly's theorem. This "intrinsic-dimension" Helly theorem states: for any family L of Δ-dimensional convex sets in a Hilbert space, there exist Δ+2 sets L′ ⊆ L such that r(L) ≤ 2r(L′). Based upon this we present an algorithm that computes a (1+ε)-core set L′ ⊆ L, |L′| = O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(L′) intersects every element of L. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ=1), the (expected) running time of the algorithm can be improved to O(n d poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
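
To make the definition of the intersection radius concrete, the following minimal Python sketch evaluates, for one fixed candidate center c, the smallest radius of a ball around c that touches every line in a set. The quantity r(L) from the abstract is the minimum of this value over all centers c. The function names `dist_point_to_line` and `radius_at_center` and the line representation (a point plus a direction vector) are assumptions made for illustration. This is not the algorithm of the paper and does not search for the optimal center.

```python
import math

def dist_point_to_line(c, p, u):
    """Euclidean distance from point c to the line {p + t*u : t real}."""
    # Vector from the line's base point p to c.
    w = [ci - pi for ci, pi in zip(c, p)]
    # Coefficient of the orthogonal projection of w onto direction u.
    t = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    # Residual of w after removing its component along u.
    residual = [wi - t * ui for wi, ui in zip(w, u)]
    return math.sqrt(sum(ri * ri for ri in residual))

def radius_at_center(c, lines):
    """Smallest radius of a ball centered at c that intersects every line."""
    return max(dist_point_to_line(c, p, u) for p, u in lines)

# Two skew lines in R^3, each given as (point, direction).
lines = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # the x-axis
    ((0.0, 0.0, 2.0), (0.0, 1.0, 0.0)),  # parallel to the y-axis at height z = 2
]

print(radius_at_center((0.0, 0.0, 0.0), lines))  # 2.0
print(radius_at_center((0.0, 0.0, 1.0), lines))  # 1.0
```

The two lines are at distance 2 from each other, so no ball of radius less than 1 can meet both. The center (0, 0, 1) attains radius 1, so r(L) = 1 for this small example.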

Original language: English
Pages (from-to): 537-560
Number of pages: 24
Journal: Discrete and Computational Geometry
Volume: 40
Issue number: 4
State: Published - Dec 2008

Bibliographical note

Funding Information:
Research of L.J. Schulman supported in part by an NSF ITR and the Okawa Foundation.

Funding Information:
Work was done when M. Langberg was a postdoctoral scholar at the California Institute of Technology. Research supported in part by NSF grant CCF-0346991.

Keywords

  • Approximation
  • Clustering
  • Core set
  • Helly theorem
  • Incomplete data
  • Inference
  • k-center
