This dataset is a multilingual parallel corpus of 715 climate-related public messages in English, Hausa, Yoruba, and Igbo. It was created to support inclusive climate communication and low-resource NLP research for African languages. The English messages were generated using AI-assisted, template-based prompts that reflect real-world scenarios such as flooding, heatwaves, and air pollution. Native speakers then translated each message into Hausa, Yoruba, and Igbo with attention to cultural and linguistic accuracy, and external language professionals reviewed the translations for quality assurance.
This dataset was developed as a project by the Computational Linguistics and Language fellows of the 2025 Lune Two AI.Humanities.Social Sciences Research Fellowship for graduate students in Nigeria, organised by Research Round.
Languages: English, Hausa, Yoruba, Igbo
Total Entries: 715
Format: CSV
Domains Covered: Floods, Heatwaves, Waste Disposal, Air Pollution, General Climate Awareness
Intended Use: NLP research (translation, clustering, multilingual embeddings), climate communication, dataset augmentation
Data Collection: No scraping or personal data was involved. All messages are fictional or generalised examples of public communication.
Ethical Notes: The dataset contains no personally identifiable information, and no data was scraped or collected from private sources. Translations were produced with careful attention to linguistic and cultural sensitivity.
License: CC BY 4.0 (Creative Commons Attribution 4.0 International)
Country of Origin: Nigeria
Authors: Olubusolami Sogunle, Habib Sani Yahaya, Chukwuma Oluebube Peace
Affiliation: Research Round
The dataset was built to demonstrate how critical climate information can be made more accessible to non-English speakers in Nigeria. It also supports language technology development for low-resource African languages, serving both practical and academic purposes. Through this dataset, we hope to set a precedent for the creation of similar datasets in other low-resource Nigerian and African languages, promoting broader linguistic inclusivity in climate adaptation.
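Since the corpus is distributed as a CSV of parallel messages, a minimal loading sketch in Python might look like the following. The column headers (`English`, `Hausa`, `Yoruba`, `Igbo`) and the helper names are assumptions made for illustration, not part of the published schema; check the actual header row and adjust `LANGUAGE_COLUMNS` to match.

```python
import csv
from pathlib import Path

# Assumed column headers (hypothetical); adjust to the real CSV header.
LANGUAGE_COLUMNS = ("English", "Hausa", "Yoruba", "Igbo")


def load_parallel_messages(csv_path, columns=LANGUAGE_COLUMNS):
    """Read the CSV and return a list of dicts, one per parallel message."""
    # utf-8-sig reads plain UTF-8 and also strips a leading BOM if present.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"Columns not found in CSV header: {missing}")
        # Keep only the language columns and trim surrounding whitespace.
        return [{c: (row[c] or "").strip() for c in columns} for row in reader]


def find_incomplete_rows(rows, columns=LANGUAGE_COLUMNS):
    """Return indices of rows where any language cell is empty."""
    return [i for i, row in enumerate(rows) if any(not row[c] for c in columns)]
```

With a local copy of the file (the name `climate_messages.csv` is a placeholder), `rows = load_parallel_messages("climate_messages.csv")` should give `len(rows) == 715` if the file matches this description, and `find_incomplete_rows(rows)` offers a quick check that every message has all four language versions before the data is used for translation or embedding experiments.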