Representations and architectures in neural sentiment analysis for morphologically rich languages: A case study from modern Hebrew

Adam Amram, Anat Ben David, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choices: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesize that for MRLs, languages where multiple meaning-bearing elements may be carried by a single space-delimited token, these choices will have measurable effects on task performance, and that these effects may vary for different architectural designs: fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in terms of their generalization capacity and task accuracy, due to their better OOV coverage. To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances thereof: token-based and morpheme-based. Our experiments show that the effect of representational choices varies with architectural type. While fully-connected and convolutional networks slightly prefer token-based settings, RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalize. Our endeavor also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89% accuracy, alongside an established benchmark to further study the effects of linguistic representation choices on neural networks’ task performance.
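The two representational dimensions named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the transliterated forms, the hand-written `TOY_SEGMENTS` lexicon, and the function names `to_units`, `encode`, and `oov_rate` are all hypothetical and exist only to show why morpheme-level units may reduce out-of-vocabulary (OOV) items.

```python
from typing import Dict, List, Set

# Hypothetical toy segmentation lexicon (transliterated, illustrative only).
# A real system would use a morphological analyzer instead of a lookup table.
TOY_SEGMENTS: Dict[str, List[str]] = {
    "wbbit": ["w", "b", "bit"],  # roughly "and" + "in" + "house"
    "bbit": ["b", "bit"],        # roughly "in" + "house"
}


def to_units(sentence: str, granularity: str) -> List[str]:
    """Split a sentence into input units: whole tokens or morphemes."""
    tokens = sentence.split()
    if granularity == "token":
        return tokens
    if granularity == "morpheme":
        units: List[str] = []
        for tok in tokens:
            # Unknown tokens are kept whole in this toy setting.
            units.extend(TOY_SEGMENTS.get(tok, [tok]))
        return units
    raise ValueError(f"unknown granularity: {granularity}")


def encode(units: List[str], encoding: str) -> List[List[str]]:
    """Encode each unit as one string symbol or as a character sequence."""
    if encoding == "string":
        return [[u] for u in units]
    if encoding == "char":
        return [list(u) for u in units]
    raise ValueError(f"unknown encoding: {encoding}")


def oov_rate(units: List[str], vocab: Set[str]) -> float:
    """Fraction of units that are missing from a training vocabulary."""
    if not units:
        return 0.0
    return sum(u not in vocab for u in units) / len(units)


if __name__ == "__main__":
    vocab = {"w", "b", "bit", "gdwl"}  # assumed training vocabulary
    sentence = "wbbit gdwl"
    for granularity in ("token", "morpheme"):
        units = to_units(sentence, granularity)
        print(granularity, units, oov_rate(units, vocab))
```

In this toy setting the unseen token `wbbit` is OOV at the token level, while all of its morphemes are in the vocabulary. This is the kind of coverage effect the hypothesis refers to, although the example says nothing about how large the effect is on real data.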

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 2242-2252
Number of pages: 11
ISBN (Electronic): 9781948087506
Publication status: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 20 Aug 2018 to 26 Aug 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 20/08/18 to 26/08/18

Bibliographical note

Funding Information:
We thank Tzipy Lazar-Shoef for research assistance, and are thankful to three anonymous reviewers for their insightful comments. This research is funded by the Israel Science Foundation, ISF grant 1739/26, for which we are grateful.

Publisher Copyright:
© 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.
