Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)

Avshalom Manevich, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities. However, LVLMs struggle with object hallucinations due to their reliance on text cues and learned object co-occurrence biases. While most research quantifies these hallucinations, mitigation strategies are still lacking. Our study introduces a Language Contrastive Decoding (LCD) algorithm that adjusts LVLM outputs based on LLM distribution confidence levels, effectively reducing object hallucinations. We demonstrate the advantages of LCD in leading LVLMs, showing up to 4% improvement in POPE F1 scores and up to 36% reduction in CHAIR scores on the COCO validation set, while also improving captioning quality scores. Our method effectively improves LVLMs without needing complex post-processing or retraining, and is easily applicable to different models. Our findings highlight the potential of further exploration of LVLM-specific decoding algorithms for improved multimodal performance.
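As a rough illustration of the idea described in the abstract (not the authors' exact formulation), a language-contrastive decoding step can be sketched as follows: at each generation step, the LVLM's next-token logits are contrasted against the logits of a text-only LLM conditioned on the same text prefix, with the contrast strength scaled by how confident the text-only distribution is. The confidence measure (normalized entropy), the `alpha` hyperparameter, and all names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def lcd_step(lvlm_logits: torch.Tensor,
             llm_logits: torch.Tensor,
             alpha: float = 1.0) -> torch.Tensor:
    """One language-contrastive decoding step (illustrative sketch only).

    lvlm_logits: next-token logits from the vision-language model, shape (vocab,)
    llm_logits:  next-token logits from a text-only LLM on the same text prefix
    alpha:       maximum contrast strength (hypothetical hyperparameter)
    """
    # Confidence of the text-only language prior, measured here (as an
    # assumption) by how peaked its distribution is: low entropy -> high confidence.
    llm_probs = F.softmax(llm_logits, dim=-1)
    entropy = -(llm_probs * torch.log(llm_probs + 1e-12)).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(llm_logits.shape[-1])))
    confidence = 1.0 - entropy / max_entropy  # in [0, 1]

    # Downweight tokens that the language prior alone is confident about,
    # so generation leans on visual evidence rather than text co-occurrence biases.
    gamma = alpha * confidence
    adjusted = (1 + gamma) * lvlm_logits - gamma * llm_logits
    return adjusted

# Usage sketch: sample or argmax from the adjusted distribution.
# next_token = torch.argmax(lcd_step(lvlm_logits, llm_logits), dim=-1)
```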

Original language: English
Title of host publication: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics (ACL)
Pages: 6008-6022
Number of pages: 15
ISBN (electronic): 9798891760998
Publication status: Published - 2024
Published externally: Yes
Event: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 2024 - 16 Aug 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Hybrid, Bangkok
Period: 11/08/24 - 16/08/24

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)'. Together they form a unique fingerprint.

Cite this