LLM Explainability via Attributive Masking Learning

Oren Barkan, Yonatan Toib, Yehonatan Elisha, Jonathan Weill, Noam Koenigstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we introduce Attributive Masking Learning (AML), a method designed for explaining language model predictions by learning input masks. AML trains an attribution model to identify influential tokens in the input for a given language model's prediction. The central concept of AML is to train an auxiliary attribution model to simultaneously 1) mask as much input data as possible while ensuring that the language model's prediction closely aligns with its prediction on the original input, and 2) ensure a significant change in the model's prediction when applying the inverse (complement) of the same mask to the input. This dual-masking approach further enables the optimization of the explanation w.r.t. the metric of interest. We demonstrate the effectiveness of AML on both encoder-based and decoder-based language models, showcasing its superiority over a variety of state-of-the-art explanation methods on multiple benchmarks. Our code is available at: https://github.com/amlconf/aml.
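The dual-masking objective described above can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's implementation: `toy_model` is a stand-in linear classifier for the frozen language model, the masks are fixed vectors rather than the output of a trained attribution model, and the loss weights (`lam`) and KL-based fidelity/contrast terms are illustrative assumptions. It only shows the shape of the objective: a mask should preserve the prediction, its complement should destroy it, and the mask should keep as few tokens as possible.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def toy_model(x, W):
    # Stand-in for a frozen language model: a linear layer + softmax.
    return softmax([sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W])

def kl(p, q):
    # KL divergence between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def aml_style_loss(x, mask, W, lam=0.1):
    """Hypothetical simplification of the dual-masking objective:
    1) fidelity: the prediction on the masked input should match the
       prediction on the original input (small KL),
    2) contrast: the prediction on the complement-masked input should
       diverge from the original prediction (large KL),
    3) sparsity: the mask should keep as few tokens as possible."""
    p_orig = toy_model(x, W)
    p_keep = toy_model([xi * mi for xi, mi in zip(x, mask)], W)
    p_comp = toy_model([xi * (1 - mi) for xi, mi in zip(x, mask)], W)
    fidelity = kl(p_orig, p_keep)      # small => masked input preserves prediction
    contrast = kl(p_orig, p_comp)      # large => inverse mask changes prediction
    sparsity = sum(mask) / len(mask)   # fraction of tokens kept
    return fidelity - contrast + lam * sparsity
```

With an input whose first token dominates the prediction, a mask that keeps only that token scores a lower loss than one that keeps the other tokens, since its complement removes the influential evidence. In AML itself, this kind of objective is minimized by a trained attribution model producing the masks, rather than by comparing hand-picked ones as here.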

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 9522-9537
Number of pages: 16
ISBN (Electronic): 9798891761681
State: Published - 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 - 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 - 16/11/24

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.

