Abstract
Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere “state-splits” as it applies to a whole set of categories rather than to individual ones and encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger, lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.
Original language | English |
---|---|
Pages | 156-167 |
Number of pages | 12 |
Publication status | Published - 2007 |
Externally published | Yes |
Event | 10th International Conference on Parsing Technologies, IWPT 2007 - Prague, Czech Republic. Duration: 23 Jun 2007 → 24 Jun 2007 |
Conference
Conference | 10th International Conference on Parsing Technologies, IWPT 2007 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 23/06/07 → 24/06/07 |
Bibliographical note
Publisher Copyright: © 2007 Association for Computational Linguistics.