A pointer network architecture for joint morphological segmentation and tagging

Amit Seker, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Morphologically Rich Languages (MRLs) such as Arabic, Hebrew and Turkish often require Morphological Disambiguation (MD), i.e., the prediction of the correct morphological decomposition of tokens into morphemes, early in the pipeline. Neural MD may be addressed as a simple pipeline, where segmentation is followed by sequence tagging, or as an end-to-end model, predicting morphemes from raw tokens. Both approaches are suboptimal; the former is heavily prone to error propagation, and the latter does not enjoy explicit access to the basic processing units called morphemes. This paper offers an MD architecture that combines the symbolic knowledge of morphemes with the learning capacity of neural end-to-end modeling. We propose a new, general and easy-to-implement Pointer Network model where the input is a morphological lattice and the output is a sequence of indices pointing at a single disambiguated path of morphemes. We demonstrate the efficacy of the model on segmentation and tagging, for Hebrew and Turkish texts, based on their respective Universal Dependencies (UD) treebanks. Our experiments show that with complete lattices, our model outperforms all shared-task results on segmenting and tagging these languages. On the SPMRL treebank, our model outperforms all previously reported results for Hebrew MD in realistic scenarios.
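The input/output contract described in the abstract can be illustrated with a small data-structure sketch. This is not the authors' implementation and contains no neural model; it only shows one plausible reading of "indices pointing at a single disambiguated path", under the assumption that the lattice is linearized into a flat list of candidate analyses and that the decoder emits one index per token. The transliterated tokens, the tags, and the names `flatten`, `decode` and `is_valid` are all invented for illustration.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]   # (form, POS tag)
Analysis = List[Morpheme]    # one candidate decomposition of a token

# Toy lattice (invented, not taken from the paper or any treebank):
# for each input token, the list of candidate analyses.
lattice: List[List[Analysis]] = [
    [   # token 0: two competing segmentations
        [("b", "ADP"), ("bit", "NOUN")],
        [("b", "ADP"), ("h", "DET"), ("bit", "NOUN")],
    ],
    [   # token 1: same segmentation, competing tags
        [("gdwl", "ADJ")],
        [("gdwl", "NOUN")],
    ],
]


def flatten(lat: List[List[Analysis]]) -> List[Tuple[int, Analysis]]:
    """Linearize the lattice into (token_index, analysis) entries."""
    return [(t, a) for t, analyses in enumerate(lat) for a in analyses]


def is_valid(entries: List[Tuple[int, Analysis]],
             pointers: List[int], n_tokens: int) -> bool:
    """Valid if the pointers select exactly one analysis per token, in order."""
    return [entries[i][0] for i in pointers] == list(range(n_tokens))


def decode(entries: List[Tuple[int, Analysis]],
           pointers: List[int]) -> List[Morpheme]:
    """Follow the pointer indices to recover a single morpheme path."""
    path: List[Morpheme] = []
    for i in pointers:
        path.extend(entries[i][1])
    return path


entries = flatten(lattice)
pointers = [1, 2]  # stand-in for what a trained pointer decoder would emit
path = decode(entries, pointers)
print(path)
```

Here the hypothetical pointer sequence `[1, 2]` selects the second analysis of the first token and the first analysis of the second token, so segmentation and tagging are resolved jointly by a single choice per token.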

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics Findings of ACL
Subtitle of host publication: EMNLP 2020
Publisher: Association for Computational Linguistics (ACL)
Pages: 4368-4378
Number of pages: 11
ISBN (electronic): 9781952148903
Publication status: Published - 2020
Externally published: Yes
Event: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 - Virtual, Online
Duration: 16 November 2020 to 20 November 2020

Publication series

Name: Findings of the Association for Computational Linguistics Findings of ACL: EMNLP 2020

Conference

Conference: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020
City: Virtual, Online
Period: 16/11/20 to 20/11/20

Bibliographical note

Publisher Copyright:
© 2020 Association for Computational Linguistics
