Representations and architectures in neural sentiment analysis for morphologically rich languages: A case study from modern Hebrew

Adam Amram, Anat Ben David, Reut Tsarfaty

Research output: Chapter in book / report / conference proceedings › Conference contribution › Peer-reviewed

Abstract

This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choice: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesize that for MRLs, languages in which a single space-delimited token may carry multiple meaning-bearing elements, these choices will have measurable effects on task performance, and that these effects may vary across architectural designs: fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in generalization capacity and task accuracy, owing to their better coverage of out-of-vocabulary (OOV) items. To study these effects empirically, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of it: token-based and morpheme-based. Our experiments show that the effect of representational choices varies with architecture type. Fully-connected and convolutional networks slightly prefer token-based settings, whereas RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalization. Our endeavor also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89% accuracy, alongside an established benchmark for further study of the effects of linguistic representation choices on neural networks' task performance.
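The following minimal Python sketch makes the two representational dimensions concrete under toy assumptions. The hand-written `SEGMENTATION` table, the helper names (`to_morphemes`, `oov_rate`, `string_ids`, `char_ids`) and the five-word vocabulary are hypothetical illustrations. They are not the authors' benchmark, segmenter, or code. Real Hebrew segmentation is ambiguous and needs a morphological analyzer, which this lookup table ignores. The sketch shows the intuition behind the OOV hypothesis: tokens unseen in training may still decompose into morphemes that were seen.

```python
from typing import Dict, List

# Hypothetical toy segmentation (illustration only, not the paper's segmenter):
# the prefixes ו ("and") and ה ("the") are split off the host word.
SEGMENTATION: Dict[str, List[str]] = {
    "הילד": ["ה", "ילד"],          # "the boy"
    "ספר": ["ספר"],                # "book"
    "והבית": ["ו", "ה", "בית"],    # "and the house"
    "והילד": ["ו", "ה", "ילד"],    # "and the boy"
    "הבית": ["ה", "בית"],          # "the house"
}


def to_morphemes(tokens: List[str]) -> List[str]:
    """Morpheme-based input: replace every token by its segments."""
    return [m for t in tokens for m in SEGMENTATION[t]]


def oov_rate(train: List[str], test: List[str]) -> float:
    """Fraction of test items that never occur in the training items."""
    vocab = set(train)
    return sum(1 for x in test if x not in vocab) / len(test)


def string_ids(items: List[str], vocab: Dict[str, int]) -> List[int]:
    """String-based encoding: each item is one atomic ID; 0 marks unknown."""
    return [vocab.get(x, 0) for x in items]


def char_ids(item: str, alphabet: Dict[str, int]) -> List[int]:
    """Character-based encoding: an item becomes a sequence of char IDs."""
    return [alphabet.get(c, 0) for c in item]


train_tokens = ["הילד", "ספר", "והבית"]
test_tokens = ["והילד", "הבית"]

# Input granularity: token-based vs. morpheme-based OOV rates.
token_oov = oov_rate(train_tokens, test_tokens)
morph_oov = oov_rate(to_morphemes(train_tokens), to_morphemes(test_tokens))
print(token_oov, morph_oov)  # 1.0 0.0

# Encoding level: string IDs vs. character IDs (vocabularies built from training data).
vocab = {t: i + 1 for i, t in enumerate(sorted(set(train_tokens)))}
alphabet = {c: i + 1 for i, c in enumerate(sorted(set("".join(train_tokens))))}
print(string_ids(test_tokens, vocab))                     # [0, 0]
print([0 in char_ids(t, alphabet) for t in test_tokens])  # [False, False]
```

On this toy split, every test token is OOV at the token level, but every test morpheme was seen in training. The character-based encoding likewise maps the unseen strings to known character IDs. Whether such coverage gains improve accuracy depends on the architecture, and that is the effect the study measures.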

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 2242-2252
Number of pages: 11
ISBN (Electronic): 9781948087506
Publication status: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 20 Aug 2018 to 26 Aug 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 20/08/18 to 26/08/18

Bibliographical note

Publisher Copyright:
© 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.
