Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding. However, only a fraction of such relations is covered by today's standard NLP tasks and benchmarks. In this work, we propose a novel task termed text-based NP enrichment (TNE), in which we aim to enrich each NP in a text with all the preposition-mediated relations, either explicit or implicit, that hold between it and other NPs in the text. The relations are represented as triplets, each consisting of two NPs linked by a preposition. Humans recover such relations seamlessly, whereas current state-of-the-art models struggle with them due to the implicit nature of the problem. We build the first large-scale dataset for the problem, provide the formal framing and scope of annotation, analyze the data, and report the results of fine-tuned language models on the task, demonstrating the challenge it poses to current technology. A webpage with a data-exploration UI, a demo, and links to the code, models, and leaderboard, intended to foster further research into this challenging problem, can be found at yanaiela.github.io/TNE/.
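The triplet representation described in the abstract can be sketched as a small data structure. This is an illustrative sketch only; the class and field names (`NPLink`, `anchor`, `preposition`, `complement`) are assumptions for exposition and are not taken from the released dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NPLink:
    """A preposition-mediated relation between two NPs in a text.

    Hypothetical sketch of a TNE triplet; field names are illustrative,
    not the schema of the actual TNE dataset.
    """
    anchor: str        # the NP being enriched
    preposition: str   # the mediating preposition (explicit or implicit in the text)
    complement: str    # the other NP the anchor is related to

# For a sentence such as "The CEO of the company resigned",
# one recoverable triplet would be:
link = NPLink(anchor="the CEO", preposition="of", complement="the company")
print(link.anchor, link.preposition, link.complement)
```

Note that, as the abstract stresses, many such triplets are implicit: the preposition never appears between the two NPs in the surface text, which is what makes the task hard for current models.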
Number of pages: 21
Journal: Transactions of the Association for Computational Linguistics
State: Published - 27 Jul 2022
Bibliographical note (Funding Information):
We would like to thank the NLP-BIU lab, Nathan Schneider, and Yufang Hou for helpful discussions and comments on this paper. We also thank the anonymous reviewers and the action editors, Marie-Catherine de Marneffe and Mark Steedman, for their valuable suggestions. Yanai Elazar is grateful to be supported by the PBC fellowship for outstanding PhD candidates in Data Science and the Google PhD fellowship. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement no. 802774 (iEXTRACT) and grant agreement no. 677352 (NLPRO).
© MIT Press Journals. All rights reserved.