A single generative model for joint morphological segmentation and syntactic parsing

Yoav Goldberg, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Morphological processes in Semitic languages deliver space-delimited words which introduce multiple, distinct, syntactic units into the structure of the input sentence. These words are in turn highly ambiguous, breaking the assumption underlying most parsers that the yield of a tree for a given sentence is known in advance. Here we propose a single joint model for performing both morphological segmentation and syntactic disambiguation which bypasses the associated circularity. Using a treebank grammar, a data-driven lexicon, and a linguistically motivated unknown-tokens handling technique our model outperforms previous pipelined, integrated or factorized systems for Hebrew morphological and syntactic processing, yielding an error reduction of 12% over the best published results so far.

Original languageEnglish
Title of host publicationACL-08
Subtitle of host publicationHLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Pages371-379
Number of pages9
StatePublished - 2008
Externally publishedYes
Event46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT - Columbus, OH, United States
Duration: 15 Jun 200820 Jun 2008

Publication series

NameACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Country/TerritoryUnited States
CityColumbus, OH
Period15/06/0820/06/08

Fingerprint

Dive into the research topics of 'A single generative model for joint morphological segmentation and syntactic parsing'. Together they form a unique fingerprint.

Cite this