Data-driven morphological analysis and disambiguation for morphologically rich languages and universal dependencies

Amir More, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier. MA&D is particularly challenging in morphologically rich languages (MRLs), where the ambiguous space-delimited tokens ought to be disambiguated with respect to their constituent morphemes. Here we present a novel, language-agnostic framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. Our experiments on a Modern Hebrew case study outperform the state of the art, and we show that the morpheme-based MD consistently outperforms our word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.

Original language: English
Title of host publication: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 337-348
Number of pages: 12
ISBN (Print): 9784879747020
Publication status: Published - 2016
Event: 26th International Conference on Computational Linguistics, COLING 2016 - Osaka, Japan
Duration: 11 December 2016 to 16 December 2016

Publication series

Name: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

Conference

Conference: 26th International Conference on Computational Linguistics, COLING 2016
Country/Territory: Japan
City: Osaka
Period: 11/12/16 to 16/12/16

Bibliographical note

Publisher Copyright:
© 1963-2018 ACL.
