A Learning-based Approach for Explaining Language Models

Oren Barkan, Yonatan Toib, Yehonatan Elisha, Noam Koenigstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present Learning Attributions (LA), a novel method for explaining language models. The core idea behind LA is to train a dedicated attribution model that functions as a surrogate explainer for the language model. This attribution model is designed to identify which tokens are most influential in driving the model's predictions. By optimizing the attribution model to mask the minimal amount of information necessary to induce substantial changes in the language model's output, LA provides a mechanism for understanding which input tokens are critical to the model's decisions. We demonstrate the effectiveness of LA on several language models, showing that it outperforms multiple state-of-the-art explanation methods across various datasets and evaluation metrics.
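The abstract describes the objective only at a high level; the following is a minimal PyTorch sketch of what such a surrogate masking objective could look like. Everything here (the `AttributionModel` architecture, the `la_style_loss` function, the soft-masking scheme, the cosine-similarity change term, and all hyperparameters) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttributionModel(nn.Module):
    """Illustrative surrogate explainer: assigns each token a probability
    of being masked out. The architecture is an assumption, not the paper's."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        # returns mask probabilities in (0, 1), shape (batch, seq_len)
        return torch.sigmoid(self.scorer(token_embeddings)).squeeze(-1)


def la_style_loss(lm, attribution_model, token_embeddings,
                  baseline_embedding, sparsity_weight: float = 1.0):
    """Hypothetical loss following the abstract's description: mask as few
    tokens as possible (sparsity term) while changing the frozen language
    model's output as much as possible (agreement term)."""
    with torch.no_grad():
        original_logits = lm(token_embeddings)  # reference prediction

    mask = attribution_model(token_embeddings)  # (batch, seq_len)
    # Soft-mask: blend each token embedding toward a baseline
    # (e.g. a [MASK] or zero embedding) in proportion to its mask score.
    m = mask.unsqueeze(-1)
    mixed = (1.0 - m) * token_embeddings + m * baseline_embedding
    masked_logits = lm(mixed)

    # Term 1: mask the minimal amount of information.
    sparsity = mask.mean()
    # Term 2: minimizing similarity pushes the masked prediction
    # away from the original one, i.e. induces a substantial change.
    agreement = nn.functional.cosine_similarity(
        masked_logits, original_logits, dim=-1).mean()
    return sparsity_weight * sparsity + agreement


if __name__ == "__main__":
    # Toy demo with a stand-in "LM" that maps embeddings to 2 logits.
    torch.manual_seed(0)
    hidden, seq = 16, 10
    lm = nn.Sequential(nn.Flatten(1), nn.Linear(seq * hidden, 2))
    for p in lm.parameters():
        p.requires_grad_(False)  # the explained model stays frozen

    explainer = AttributionModel(hidden)
    embeddings = torch.randn(4, seq, hidden)
    baseline = torch.zeros(hidden)

    loss = la_style_loss(lm, explainer, embeddings, baseline)
    loss.backward()  # gradients reach only the attribution model
```

The sketch only fixes the shape of the bi-objective (sparse masking vs. prediction change); the paper may well optimize a different divergence, masking parameterization, or baseline, and those choices would replace the cosine term and zero baseline used here.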

Original language: English
Title of host publication: CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 98-108
Number of pages: 11
ISBN (Electronic): 979-8-4007-0436-9
State: Published - 21 Oct 2024
Event: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 2024 - 25 Oct 2024

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/Territory: United States
City: Boise
Period: 21/10/24 - 25/10/24

Bibliographical note

Publisher Copyright: © 2024 ACM.

Keywords

  • deep learning
  • explainable ai
  • natural language processing
