Abstract
Morphological tasks use large multi-lingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals pro-found cross-linguistic inconsistencies, which arise from the lack of a clear linguistic and operational definition of what is a word, and which severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon, rather than word-level. It is an-chored in a fixed yet inclusive set of features, that encapsulates all functions realized in a saturated clause. We deliver MIGHTYMORPH, a novel dataset for clause-level morphology covering 4 typologically different languages: English, German, Turkish, and Hebrew. We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis. Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages. Furthermore, redefining morphology to the clause-level provides a neat interface with contextualized language models (LMs) and allows assessing the morphological knowledge encoded in these models and their usabil-ity for morphological tasks. Taken together, this work opens up new horizons in the study of computational morphology, leaving ample space for studying neural morphology cross-linguistically.
Original language | English |
---|---|
Pages (from-to) | 1455-1472 |
Number of pages | 18 |
Journal | Transactions of the Association for Computational Linguistics |
Volume | 10 |
DOIs | |
State | Published - 2022 |
Externally published | Yes |
Bibliographical note
Publisher Copyright:© 2022 Association for Computational Linguistics. All rights reserved.