Breaking the Language Barrier: Can Direct Inference Outperform Pre-Translation in Multilingual LLM Applications?

  • Yotam Intrator
  • Matan Halfon
  • Roman Goldenberg
  • Reut Tsarfaty
  • Matan Eyal
  • Ehud Rivlin
  • Yossi Matias
  • Natalia Aizenberg

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Large language models hold significant promise in multilingual applications. However, inherent biases stemming from predominantly English-centric pre-training have led to the widespread practice of pre-translation, i.e., translating non-English inputs to English before inference, leading to complexity and information loss. This study re-evaluates the need for pre-translation in the context of PaLM2 models (Anil et al., 2023), which have been established as highly performant in multilingual tasks. We offer a comprehensive investigation across 108 languages and 6 diverse benchmarks, including open-end generative tasks, which were excluded from previous similar studies. Our findings challenge the pre-translation paradigm established in prior research, highlighting the advantages of direct inference in PaLM2. Specifically, PaLM2-L consistently outperforms pre-translation in 94 out of 108 languages. These findings pave the way for more efficient and effective multilingual applications, alleviating the limitations associated with pre-translation and unlocking linguistic authenticity.

Original language: English
Title of host publication: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL 2024
Publisher: Association for Computational Linguistics (ACL)
Pages: 829-844
Number of pages: 16
ISBN (Electronic): 9798891761155
Publication status: Published - 2024
Externally published: Yes
Event: 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 16 June 2024 to 21 June 2024

Publication series

Name: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Volume: 2

Conference

Conference: 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/Territory: Mexico
City: Hybrid, Mexico City
Period: 16/06/24 to 21/06/24

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
