Three-dimensional parametrization for parsing morphologically rich languages

Reut Tsarfaty, Khalil Sima’an

Research output: Contribution to conference › Paper › Peer-review


Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase structures are often shallow, additional morphological factors govern the generation process. Here we propose that agreement features percolated up the parse tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere “state-splits” because it applies to a whole set of categories rather than to individual ones, and it encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew, in which tuning the parameters in all three dimensions gradually leads to improved performance. Our best result introduces a new, stronger lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.
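The three dimensions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the agreement codes (such as "MS" for masculine singular), the given head indices, and the helper names `percolate_agreement`, `annotate`, and `rewrite_events` are all hypothetical. Here, vertical history is approximated by first-order parent annotation, horizontal history by head-outward generation with an order-`h` sibling window, and the depth dimension by agreement features copied up from head children.

```python
# Hypothetical sketch of three-dimensional PCFG parametrization; not the paper's code.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (annotated parent label, direction, sibling history, generated label)
Event = Tuple[str, str, Tuple[str, ...], str]


@dataclass
class Node:
    label: str                                   # syntactic category, e.g. "NP"
    children: List["Node"] = field(default_factory=list)
    agr: Optional[str] = None                    # hypothetical agreement code, e.g. "MS"
    head: int = 0                                # index of the head child (assumed given)


def percolate_agreement(node: Node) -> Optional[str]:
    """Depth dimension: copy agreement features bottom-up from each head child."""
    if node.children:
        for child in node.children:
            percolate_agreement(child)
        node.agr = node.children[node.head].agr
    return node.agr


def annotate(node: Node, parent: Optional[Node], vertical: bool, depth: bool) -> str:
    """Vertical dimension appends the parent category; depth appends agreement."""
    label = node.label
    if vertical and parent is not None:
        label += "^" + parent.label
    if depth and node.agr:
        label += "[" + node.agr + "]"
    return label


def rewrite_events(node: Node, parent: Optional[Node], h: int,
                   vertical: bool = True, depth: bool = True) -> List[Event]:
    """Horizontal dimension: head-outward generation with an order-h sibling history."""
    lhs = annotate(node, parent, vertical, depth)
    kids = [annotate(c, node, vertical, depth) for c in node.children]
    head = kids[node.head]
    events: List[Event] = [(lhs, "H", (), head)]
    left = kids[:node.head][::-1]                # generated outward, right to left
    right = kids[node.head + 1:]
    for direction, side in (("L", left), ("R", right)):
        history = [head]
        for sibling in side + ["STOP"]:
            # keep only the last h generated labels (empty when h == 0)
            context = tuple(history[max(0, len(history) - h):])
            events.append((lhs, direction, context, sibling))
            history.append(sibling)
    return events
```

On a toy clause S → NP VP in which the head VP and the subject NP both percolate "MS", `rewrite_events(s, None, h=1)` yields four events: the head `VP^S[MS]`, the `NP^S[MS]` to its left, and a `STOP` on each side. Setting `h=0` drops the sibling history and `depth=False` removes the agreement annotation, so each dimension can be tuned independently, which is the kind of grid the abstract describes.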

Original language: English
Number of pages: 12
Publication status: Published - 2007
Externally published: Yes
Event: 10th International Conference on Parsing Technologies, IWPT 2007 - Prague, Czech Republic
Duration: 23 June 2007 to 24 June 2007



Bibliographical note

Publisher Copyright:
© 2007 Association for Computational Linguistics.

