Three-dimensional parametrization for parsing morphologically rich languages

Reut Tsarfaty, Khalil Sima’an

Research output: Contribution to conference › Paper › peer-review

Abstract

Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere “state-splits” as it applies to a whole set of categories rather than to individual ones and encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger, lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.
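The three dimensions described in the abstract can be pictured as three independent switches on the conditioning context of each rewrite event. The following is a minimal sketch only, not the authors' implementation: the `Node` type, the `events` function, the `v`, `h` and `use_agr` parameters, the feature string `"F.S"` (feminine singular) and the toy tree are all hypothetical names introduced for illustration. It also generates children left to right, a simplification of the head-outward process the abstract refers to.

```python
from collections import Counter
from typing import NamedTuple


class Node(NamedTuple):
    label: str            # syntactic category, e.g. "NP"
    agr: str = ""         # hypothetical agreement bundle, e.g. "F.S"
    children: tuple = ()  # empty for preterminals in this sketch


def sym(node, use_agr):
    """Return the category, decorated with agreement features if requested."""
    return f"{node.label}.{node.agr}" if use_agr and node.agr else node.label


def events(node, parent="ROOT", v=1, h=1, use_agr=False):
    """Yield (context, child_symbol) pairs, one per child-generation event."""
    if not node.children:
        return
    # Vertical dimension: with v >= 2, also condition on the parent category.
    vert = (node.label,) if v < 2 else (node.label, parent)
    # Third dimension: condition on the agreement features of the mother node.
    agr = (node.agr,) if use_agr else ()
    symbols = [sym(c, use_agr) for c in node.children] + ["STOP"]
    for i, child in enumerate(symbols):
        # Horizontal dimension: remember at most h previously generated siblings.
        hist = tuple(symbols[max(0, i - h):i])
        yield (vert, hist, agr), child
    for c in node.children:
        yield from events(c, node.label, v, h, use_agr)


# Toy tree (assumed for illustration): S -> NP VP, NP -> NN, VP -> VB.
tree = Node("S", "", (
    Node("NP", "F.S", (Node("NN", "F.S"),)),
    Node("VP", "F.S", (Node("VB", "F.S"),)),
))

# Event counts under one setting of the three switches.
counts = Counter(events(tree, v=2, h=1, use_agr=True))
```

Normalizing such counts per context over a treebank would give relative-frequency estimates for one point in the parameter space. Because the agreement switch decorates every category at once, it acts across the whole set of categories rather than splitting a single one, which is the contrast with state-splits that the abstract draws.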

Original language: English
Pages: 156-167
Number of pages: 12
Publication status: Published - 2007
Externally published: Yes
Event: 10th International Conference on Parsing Technologies, IWPT 2007 - Prague, Czech Republic
Duration: 23 June 2007 to 24 June 2007

Conference

Conference: 10th International Conference on Parsing Technologies, IWPT 2007
Country/Territory: Czech Republic
City: Prague
Period: 23/06/07 to 24/06/07

Bibliographical note

Publisher Copyright:
© 2007 Association for Computational Linguistics.
