TY - JOUR
T1 - Do less, get more
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
AU - Feldman, Moran
AU - Karbasi, Amin
AU - Kazemi, Ehsan
N1 - Publisher Copyright:
© 2018 Curran Associates Inc..All rights reserved.
Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a p-matchoid constraint, our randomized algorithm achieves a 4p approximation ratio (in expectation) with O(k) memory and O(km/p) queries per element (k is the size of the largest feasible solution and m is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to 4p + 2 o(1). To the best or our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick up locations.
AB - In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. More specifically, for a monotone submodular function and a p-matchoid constraint, our randomized algorithm achieves a 4p approximation ratio (in expectation) with O(k) memory and O(km/p) queries per element (k is the size of the largest feasible solution and m is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to 4p + 2 o(1). To the best or our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick up locations.
UR - http://www.scopus.com/inward/record.url?scp=85064824551&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.conferencearticle???
AN - SCOPUS:85064824551
SN - 1049-5258
VL - 2018-December
SP - 732
EP - 742
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 2 December 2018 through 8 December 2018
ER -