Morphology Without Borders: Clause-Level Morphology

Omer Goldman, Reut Tsarfaty

Research output: Contribution to journal › Article › peer-review

Abstract

Morphological tasks use large multi-lingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies, which arise from the lack of a clear linguistic and operational definition of what is a word, and which severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon, rather than word-level. It is anchored in a fixed yet inclusive set of features that encapsulates all functions realized in a saturated clause. We deliver MIGHTYMORPH, a novel dataset for clause-level morphology covering 4 typologically different languages: English, German, Turkish, and Hebrew. We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis. Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages. Furthermore, redefining morphology to the clause-level provides a neat interface with contextualized language models (LMs) and allows assessing the morphological knowledge encoded in these models and their usability for morphological tasks. Taken together, this work opens up new horizons in the study of computational morphology, leaving ample space for studying neural morphology cross-linguistically.

Original language: English
Pages (from-to): 1455-1472
Number of pages: 18
Journal: Transactions of the Association for Computational Linguistics
Volume: 10
State: Published - 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics. All rights reserved.
