
Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization

  • Itai Mondshine
  • Tzuf Paz-Argaman
  • Reut Tsarfaty

Research output: Chapter in book / report / conference proceeding › Conference contribution › Peer-reviewed

Abstract

Automatic n-gram-based metrics such as ROUGE are widely used for evaluating generative tasks such as summarization. While these metrics are considered indicative (even if imperfect) of human evaluation for English, their suitability for other languages remains unclear. To address this, we systematically assess evaluation metrics for generation, both n-gram-based and neural-based, to determine their effectiveness across languages and tasks. Specifically, we design a large-scale evaluation suite across eight languages from four typological families (agglutinative, isolating, low-fusional, and high-fusional), covering both low- and high-resource languages, and analyze the metrics' correlations with human judgments. Our findings highlight the sensitivity of evaluation metrics to the language type at hand. For example, for fusional languages, n-gram-based metrics show a lower correlation with human assessments than for isolating and agglutinative languages. We also demonstrate that tokenization choices can significantly mitigate this for fusional languages with rich morphology, to the point of reversing negative correlations. Additionally, we show that neural-based metrics specifically trained for evaluation, such as COMET, consistently outperform other neural metrics and correlate better than n-gram metrics with human judgments in low-resource languages. Overall, our analysis highlights the limitations of n-gram metrics for fusional languages and advocates for investment in neural-based metrics trained for evaluation tasks.
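The kind of analysis the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' code: the ROUGE-1-style unigram F1, the character-trigram tokenizer, the choice of Spearman rank correlation, and every function name below are assumptions made for illustration, and the scores in the usage section are invented toy values, not data from the paper.

```python
from collections import Counter


def unigram_f1(candidate, reference, tokenize=str.split):
    """ROUGE-1-style F1: clipped token overlap between candidate and reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def char_trigrams(text):
    """Hypothetical alternative tokenizer: overlapping character trigrams."""
    s = text.replace(" ", "_")
    return [s[i:i + 3] for i in range(len(s) - 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Invented toy data for illustration only (NOT from the paper).
    reference = "the houses were sold"
    candidates = ["the house was sold", "houses sold", "a dog barked"]
    human_scores = [5, 3, 1]  # hypothetical human ratings
    word_scores = [unigram_f1(c, reference) for c in candidates]
    char_scores = [unigram_f1(c, reference, char_trigrams) for c in candidates]
    print("word-level correlation:", spearman(word_scores, human_scores))
    print("char-level correlation:", spearman(char_scores, human_scores))
```

Whitespace tokens treat an inflected form such as "houses" as unrelated to "house", whereas sub-word units give partial credit. This is one simple way a tokenization choice can change how an n-gram metric behaves on morphologically rich text. The abstract does not state which tokenizers or correlation coefficient the authors used.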

Original language: English
Title of host publication: Long Papers
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 19019-19035
Number of pages: 17
ISBN (electronic): 9798891762510
Digital Object Identifiers (DOIs)
Publication status: Published - 2025
Externally published: Yes
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.
