Abdelrahman Sadallah and Junior Cedric Tonga and Khalid Almubarak and Saeed Almheiri and Farah Atif and Cahtrine Qwaider and Karima Kadaoui and Sara Shatnawi and Yaser Alesh and Fajri Koto
- [2025-02.18] The preprint of our paper is available on arXiv.
- [2025-05.16] ArabCulture has been accepted to the ACL 2025 main conference. See you in Vienna!
- [2025-05.22] The ArabCulture dataset is available on HuggingFace.
- [2025-05.26] The ArabCulture benchmark is available in lm-eval-harness under the tasks `arab_culture` and `arab_culture_completion`.
ArabCulture is a culturally grounded commonsense reasoning dataset in Modern Standard Arabic (MSA), covering 13 Arab countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset contains 3,482 multiple-choice instances that test cultural commonsense reasoning in real-world daily life situations.
Despite the rise of Arabic LLMs, evaluation of culturally relevant reasoning has been limited. ArabCulture fills this gap with questions authored and validated by native speakers, reflecting social norms, traditions, and everyday knowledge in Arab societies. Each instance presents a short scenario followed by three plausible sentence completions, only one of which is culturally accurate.
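To make the instance format concrete, here is a minimal Python sketch of one scenario with three candidate completions, plus a simple accuracy score. The class name, field names, prompt layout, and placeholder strings are all assumptions made for illustration; the released dataset may use a different schema.

```python
from dataclasses import dataclass
from typing import Sequence

# Labels for the three candidate completions (an assumed convention).
OPTION_LABELS = ("A", "B", "C")


@dataclass(frozen=True)
class Instance:
    """Hypothetical container; field names are NOT taken from the dataset."""
    scenario: str            # short description of the situation
    options: Sequence[str]   # exactly three candidate completions
    answer: int              # index (0-2) of the culturally accurate option


def format_mcq_prompt(instance: Instance) -> str:
    """Render one instance as a lettered multiple-choice prompt."""
    lines = [instance.scenario]
    for label, option in zip(OPTION_LABELS, instance.options):
        lines.append(f"{label}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(instances: Sequence[Instance], predicted_labels: Sequence[str]) -> float:
    """Fraction of instances whose predicted letter matches the gold letter."""
    if len(instances) != len(predicted_labels):
        raise ValueError("instances and predictions must have the same length")
    if not instances:
        raise ValueError("cannot score an empty set of instances")
    correct = sum(
        OPTION_LABELS[inst.answer] == pred.strip().upper()
        for inst, pred in zip(instances, predicted_labels)
    )
    return correct / len(instances)


if __name__ == "__main__":
    # Placeholder text only; this is not a real ArabCulture item.
    demo = Instance("<scenario>", ("<option 1>", "<option 2>", "<option 3>"), answer=1)
    print(format_mcq_prompt(demo))
    print(accuracy([demo], ["B"]))
```

With three options per instance, a model that guesses uniformly at random scores about one third, which is a useful floor when reading results.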
- Language: Modern Standard Arabic (MSA)
- Countries Covered: 13 (KSA, UAE, Yemen, Jordan, Lebanon, Syria, Palestine, Egypt, Sudan, Morocco, Algeria, Tunisia, Libya)
- Domains: 12 (e.g., food, holidays, parenting)
- Subtopics: 54
- Instances: 3,482
- Task Type: Multiple Choice (MCQ) and Sentence Completion
- Cultural Context: Region and country-specific
- Commonsense Reasoning
- Cultural Knowledge Assessment
- Zero-shot Evaluation of LLMs
This dataset can be used to benchmark how well Arabic or multilingual models reason within culturally specific contexts.
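For a zero-shot run through lm-eval-harness, a small helper along the following lines could assemble the command for the two ArabCulture tasks. This is only a sketch: it assumes the `lm_eval` command-line tool is installed and that your version accepts the `--model`, `--model_args`, `--tasks`, and `--num_fewshot` flags. The helper name and the model identifier are hypothetical placeholders.

```python
import shlex

# Task names as registered in lm-eval-harness for this benchmark.
ARABCULTURE_TASKS = ("arab_culture", "arab_culture_completion")


def build_lm_eval_command(pretrained, tasks=ARABCULTURE_TASKS, num_fewshot=0):
    """Return an argv list for a zero-shot lm_eval run (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "hf",                            # Hugging Face model backend
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),          # 0 = zero-shot
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real checkpoint.
    command = build_lm_eval_command("your-org/your-model")
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` to launch the evaluation, or printed and pasted into a shell.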
@misc{sadallah2025arabculture,
title={Commonsense Reasoning in Arab Culture},
author={Abdelrahman Sadallah and Junior Cedric Tonga and Khalid Almubarak and Saeed Almheiri and Farah Atif and Cahtrine Qwaider and Karima Kadaoui and Sara Shatnawi and Yaser Alesh and Fajri Koto},
year={2025},
eprint={2502.12788},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
The ArabCulture dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
For questions or contributions, contact:
- Abdelrahman Sadallah ([email protected])
- Fajri Koto ([email protected])