mbzuai-nlp/arab_culture


ArabCulture 🇦🇪🇵🇸🇪🇬🇸🇦🇾🇪🇯🇴🇱🇧🇸🇾🇸🇩🇲🇦🇩🇿🇹🇳🇱🇾

Abdelrahman Sadallah and Junior Cedric Tonga and Khalid Almubarak and Saeed Almheiri and Farah Atif and Chatrine Qwaider and Karima Kadaoui and Sara Shatnawi and Yaser Alesh and Fajri Koto

MBZUAI, SDAIA, Al-Balqa Applied University, Khalifa University


🔥 News

  • [2025-02-18] The preprint of our paper is available on arXiv.
  • [2025-05-16] ArabCulture has been accepted to the ACL 2025 main conference. See you in Vienna!
  • [2025-05-22] The ArabCulture dataset is available on Hugging Face.
  • [2025-05-26] The ArabCulture benchmark is available in lm-eval-harness under the tasks arab_culture and arab_culture_completion.
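
As a rough sketch of how the two harness tasks might be run, the snippet below assembles an `lm_eval` command line. The task names are the ones listed in the news item above. The model identifier is a placeholder, the helper name `build_lm_eval_command` is made up for illustration, and the sketch assumes a recent lm-evaluation-harness release that installs the `lm_eval` entry point.

```python
import shlex


def build_lm_eval_command(
    pretrained,
    tasks=("arab_culture", "arab_culture_completion"),
    num_fewshot=0,
    batch_size=8,
):
    """Assemble an lm-evaluation-harness CLI call as a list of arguments."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


# "your-org/your-model" is a placeholder, not a real checkpoint.
command = build_lm_eval_command("your-org/your-model")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the harness is installed and the placeholder is replaced with a real model.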

ArabCulture is a culturally grounded commonsense reasoning dataset in Modern Standard Arabic (MSA), covering 13 Arab countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset contains 3,482 multiple-choice instances that test cultural commonsense reasoning in real-world daily life situations.

Dataset Summary

Despite the rise of Arabic LLMs, evaluation of culturally relevant reasoning has been limited. ArabCulture fills this gap with questions authored and validated by native speakers, reflecting social norms, traditions, and everyday knowledge in Arab societies. Each instance presents a short scenario followed by three plausible sentence completions, only one of which is culturally accurate.
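
To make the instance format concrete, here is a minimal sketch of one way to represent an instance and render it as a three-option multiple-choice prompt. The class, field names, and prompt layout are assumptions for illustration; they are not the dataset's actual column names or the prompt template used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Field names are illustrative, not the dataset's actual columns.
    country: str
    scenario: str
    completions: tuple  # exactly three candidate endings
    answer_index: int   # position of the culturally accurate ending


def to_mcq_prompt(instance, labels=("A", "B", "C")):
    """Render one instance as a three-option multiple-choice prompt."""
    if len(instance.completions) != len(labels):
        raise ValueError("expected one completion per label")
    lines = [instance.scenario]
    lines += [
        f"{label}. {text}"
        for label, text in zip(labels, instance.completions)
    ]
    lines.append("Answer:")
    return "\n".join(lines)


# Placeholder text only; real instances are written in MSA.
example = Instance(
    country="<country>",
    scenario="<scenario>",
    completions=("<completion 1>", "<completion 2>", "<completion 3>"),
    answer_index=1,
)
prompt = to_mcq_prompt(example)
print(prompt)
```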

  • Language: Modern Standard Arabic (MSA)
  • Countries Covered: 13 (KSA, UAE, Yemen, Jordan, Lebanon, Syria, Palestine, Egypt, Sudan, Morocco, Algeria, Tunisia, Libya)
  • Domains: 12 (e.g., food, holidays, parenting)
  • Subtopics: 54
  • Instances: 3,482
  • Task Type: Multiple Choice (MCQ) and Sentence Completion
  • Cultural Context: Region- and country-specific

Supported Tasks and Leaderboards

  • Commonsense Reasoning
  • Cultural Knowledge Assessment
  • Zero-shot Evaluation of LLMs

This dataset can be used to benchmark how well Arabic or multilingual models reason within culturally specific contexts.
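
Because the benchmark is organized by country, results are naturally broken down the same way. The sketch below computes per-country accuracy from a list of predictions. The function name and record layout are hypothetical, and the toy records are invented for illustration; they are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_country(records):
    """Compute accuracy per country.

    Each record is a (country, gold_index, predicted_index) triple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for country, gold, predicted in records:
        total[country] += 1
        correct[country] += int(gold == predicted)
    return {country: correct[country] / total[country] for country in total}


# Toy records for illustration only; not results from the paper.
toy_records = [
    ("Egypt", 0, 0),
    ("Egypt", 2, 1),
    ("Jordan", 1, 1),
    ("Jordan", 0, 0),
]
scores = accuracy_by_country(toy_records)
print(scores)  # {'Egypt': 0.5, 'Jordan': 1.0}
```

With three completions per instance, uniform random guessing would score about 33%, which is a useful floor when reading per-country numbers.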

Citation

@misc{sadallah2025arabculture,
  title={Commonsense Reasoning in Arab Culture},
  author={Abdelrahman Sadallah and Junior Cedric Tonga and Khalid Almubarak and Saeed Almheiri and Farah Atif and Chatrine Qwaider and Karima Kadaoui and Sara Shatnawi and Yaser Alesh and Fajri Koto},
  year={2025},
  eprint={2502.12788},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

License

The ArabCulture dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

For questions or contributions, contact:
