Abstract
Political language is deeply intertwined with social identities. While social identities are often shaped by specific cultural contexts, existing NLP datasets are predominantly English-centric and focus on coarse-grained identity categories. We introduce HEBID, the first multilabel Hebrew corpus for social identity detection. The corpus contains 5,536 sentences from Israeli politicians’ Facebook posts (Dec 2018–Apr 2021), with each sentence manually annotated for twelve nuanced social identities (e.g., Rightist, Ultra-Orthodox, Socially-oriented) selected based on their salience in national survey data. We benchmark multilabel and single-label encoders alongside 2B–9B-parameter decoder LLMs, finding that Hebrew-tuned LLMs provide the best results (macro-F1 = 0.74). We apply our classifier to politicians’ Facebook posts and parliamentary speeches, evaluating differences in popularity, temporal trends, clustering patterns, and gender-related variations in identity expression. We utilize identity choices from a national public survey, comparing the identities portrayed in elite discourse with those prioritized by the public. HEBID provides a comprehensive foundation for studying social identities in Hebrew and can serve as a model for similar research in other non-English political contexts.
| Original language | English |
|---|---|
| Title of host publication | EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025 |
| Editors | Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 9850-9870 |
| Number of pages | 21 |
| ISBN (Electronic) | 9798891763357 |
| State | Published - 2025 |
| Event | 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China; Duration: 4 Nov 2025 → 9 Nov 2025 |
Publication series
| Name | EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025 |
|---|---|
Conference
| Conference | 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 |
|---|---|
| Country/Territory | China |
| City | Suzhou |
| Period | 4 Nov 2025 → 9 Nov 2025 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computational Linguistics.
Title
HEBID: Detecting Social Identities in Hebrew-language Political Text