Abstract
We present Learning Attributions (LA), a novel method for explaining language models. The core idea behind LA is to train a dedicated attribution model that functions as a surrogate explainer for the language model. This attribution model is designed to identify which tokens are most influential in driving the model's predictions. By optimizing the attribution model to mask the minimal amount of information necessary to induce substantial changes in the language model's output, LA provides a mechanism to understand which tokens in the input are critical for the model's decisions. We demonstrate the effectiveness of LA across several language models, highlighting its superiority over multiple state-of-the-art explanation methods across various datasets and evaluation metrics.
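To make the masking objective in the abstract concrete, below is a minimal sketch of what such a surrogate-explainer loss could look like, assuming a frozen HuggingFace-style classifier `lm` and a hypothetical attribution network `attributor` that emits one mask logit per token. The baseline-embedding substitution, the KL-divergence change term, and the `sparsity_weight` trade-off are illustrative assumptions, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def la_style_loss(lm, attributor, input_ids, attention_mask, sparsity_weight=1.0):
    """Illustrative surrogate-explainer objective: maximize the change in the
    frozen language model's output while masking as few tokens as possible.
    `attributor` is a hypothetical model returning per-token mask logits."""
    # Original (unmasked) prediction of the frozen language model.
    with torch.no_grad():
        orig_logits = lm(input_ids=input_ids, attention_mask=attention_mask).logits
        orig_probs = F.softmax(orig_logits, dim=-1)

    # Attribution model predicts a soft mask per token (1 = mask, 0 = keep).
    mask_logits = attributor(input_ids=input_ids, attention_mask=attention_mask)
    mask = torch.sigmoid(mask_logits)

    # Relax token masking by interpolating each token embedding toward a
    # baseline embedding (here the pad token; assumes pad_token_id is set).
    embeds = lm.get_input_embeddings()(input_ids)
    baseline_ids = torch.full_like(input_ids, lm.config.pad_token_id)
    baseline = lm.get_input_embeddings()(baseline_ids)
    masked_embeds = (1 - mask.unsqueeze(-1)) * embeds + mask.unsqueeze(-1) * baseline

    # Re-run the frozen model on the masked input.
    masked_logits = lm(inputs_embeds=masked_embeds, attention_mask=attention_mask).logits
    masked_log_probs = F.log_softmax(masked_logits, dim=-1)

    # Reward a large output change, penalize the fraction of masked tokens.
    change = F.kl_div(masked_log_probs, orig_probs, reduction="batchmean")
    sparsity = mask.mean()
    return -change + sparsity_weight * sparsity
```

Minimizing this loss pushes the attribution model to concentrate the mask on the few tokens whose removal most perturbs the prediction, which is then read off as a token-importance attribution.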
| Original language | English |
| --- | --- |
| Title of host publication | CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management |
| Publisher | Association for Computing Machinery |
| Pages | 98-108 |
| Number of pages | 11 |
| ISBN (electronic) | 9798400704369 |
| DOIs | |
| Publication status | Published - 21 Oct 2024 |
| Event | 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States. Duration: 21 Oct 2024 → 25 Oct 2024 |
Publication series
| Name | International Conference on Information and Knowledge Management, Proceedings |
| --- | --- |
| ISSN (Print) | 2155-0751 |
Conference
| Conference | 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 |
| --- | --- |
| Country/Territory | United States |
| City | Boise |
| Period | 21/10/24 → 25/10/24 |
Bibliographical note
Publisher Copyright: © 2024 ACM.