
The Truth, The Whole Truth, and Nothing but the Truth: A New Benchmark Dataset for Hebrew Text Credibility Assessment

  • Ben Hagag
  • Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In the age of information overload, it is more important than ever to discern fact from fiction. From the internet to traditional media, we are constantly confronted with a deluge of information, much of which comes from politicians and other public figures who wield significant influence. In this paper, we introduce HeTrue: a new, publicly available dataset for evaluating the credibility of statements made by Israeli public figures and politicians. This dataset consists of 1021 statements, manually annotated by Israeli professional journalists, for their credibility status. Using this corpus, we set out to assess whether the credibility of statements can be predicted based on the text alone. To establish a baseline, we compare text-only methods with others using additional data like metadata, context, and evidence. Furthermore, we develop several credibility assessment models, including a feature-based model that utilizes linguistic features, and state-of-the-art transformer-based models with contextualized embeddings from a pre-trained encoder. Empirical results demonstrate improved performance when models integrate statement and context, outperforming those relying on the statement text alone. Our best model, which also integrates evidence, achieves a 48.3 F1 Score, suggesting that HeTrue is a challenging benchmark, calling for further work on this task.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 3850-3865
Number of pages: 16
ISBN (Electronic): 9798891760615
DOIs:
Publication status: Published - 2023
Externally published: Yes
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Hybrid, Singapore
Duration: 6 December 2023 to 10 December 2023

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory: Singapore
City: Hybrid
Period: 6 December 2023 to 10 December 2023

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.
