METRICBERT: TEXT REPRESENTATION LEARNING VIA SELF-SUPERVISED TRIPLET TRAINING

Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the “traditional” masked-language task. We focus on downstream tasks of learning similarities for recommendations, where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we publish a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.
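
The abstract does not spell out the training objective, so the following is only a minimal sketch of how a triplet term might be combined with a masked-language loss, assuming the standard triplet margin loss over Euclidean distance. The function names, the `margin` and `weight` parameters, and the additive combination are illustrative assumptions, not the authors' formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumption, not the paper's exact loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def combined_loss(mlm_loss, anchor, positive, negative, margin=1.0, weight=1.0):
    """Hypothetical joint objective: masked-language loss plus a weighted triplet term."""
    return mlm_loss + weight * triplet_margin_loss(anchor, positive, negative, margin)
```

In this sketch, `mlm_loss` stands in for whatever scalar the masked-language head produces for the batch; `weight` is a hypothetical knob for balancing the two terms.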

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8142-8146
Number of pages: 5
ISBN (Electronic): 9781665405409
Publication status: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 23 May 2022 to 27 May 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 23/05/22 to 27/05/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE
