More constraints, smaller coresets: Constrained matrix approximation of sparse big data

Dan Feldman, Tamir Tassa

نتاج البحث: فصل من :كتاب / تقرير / مؤتمرمنشور من مؤتمرمراجعة النظراء

ملخص

We suggest a generic data reduction technique with provable guarantees for computing the low rank approximation of a matrix under some ℓz error, and constrained factorizations, such as the Non-negative Matrix Factorization (NMF). Our main algorithm reduces a given n × d matrix into a small, ε-dependent, weighted subset C of its rows (known as a coreset), whose size is independent of both n and d. We then prove that applying existing algorithms on the resulting coreset can be turned into (1 + ε)-approximations for the original (large) input matrix. In particular, we provide the first linear time approximation scheme (LTAS) for the rank-one NMF. The coreset C can be computed in parallel and using only one pass over a possibly unbounded stream of row vectors. In this sense we improve the result in [4] (Best paper of STOC 2013). Moreover, since C is a subset of these rows, its construction time, as well as its sparsity (number of non-zeroes entries) and the sparsity of the resulting low rank approximation depend on the maximum sparsity of an input row, and not on the actual dimension d. In this sense, we improve the result of Libery [21] (Best paper of KDD 2013) and answer affirmably, and in a more general setting, his open question of computing such a coreset. We implemented our coreset and demonstrate it by turning Matlab's NMF off-line function that gets a matrix in the memory of a single machine, into a streaming algorithm that runs in parallel on 64 machines on Amazon's cloud and returns sparse NMF factorization. Source code is provided for reproducing the experiments and integration with existing and future algorithms.

اللغة الأصليةالإنجليزيّة
عنوان منشور المضيفKDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
ناشرAssociation for Computing Machinery
الصفحات249-258
عدد الصفحات10
رقم المعيار الدولي للكتب (الإلكتروني)9781450336642
المعرِّفات الرقمية للأشياء
حالة النشرنُشِر - 10 أغسطس 2015
الحدث21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, أستراليا
المدة: ١٠ أغسطس ٢٠١٥١٣ أغسطس ٢٠١٥

سلسلة المنشورات

الاسمProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
مستوى الصوت2015-August

!!Conference

!!Conference21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
الدولة/الإقليمأستراليا
المدينةSydney
المدة١٠/٠٨/١٥١٣/٠٨/١٥

ملاحظة ببليوغرافية

Publisher Copyright:
© 2015 ACM.

بصمة

أدرس بدقة موضوعات البحث “More constraints, smaller coresets: Constrained matrix approximation of sparse big data'. فهما يشكلان معًا بصمة فريدة.

قم بذكر هذا