Multilingual Climate Messaging Dataset (English, Hausa, Yoruba, Igbo)

Description

This dataset is a multilingual parallel corpus of 715 climate-related public messages in English, Hausa, Yoruba, and Igbo. It was created to support inclusive climate communication and low-resource NLP research for African languages. The English messages were generated using AI-assisted, template-based prompts reflecting real-world scenarios such as flooding, heatwaves, and air pollution. Native speakers then translated each message into Hausa, Yoruba, and Igbo, ensuring cultural and linguistic accuracy. These translations were also reviewed by external language professionals for quality assurance.

This dataset was developed as a project by the Computational Linguistics and Language fellows of the 2025 Lune Two AI.Humanities.Social Sciences Research Fellowship for graduate students in Nigeria, organised by Research Round.

Poster for Indaba

Key Information

Languages: English, Hausa, Yoruba, Igbo

Total Entries: 715

Format: CSV

Domains Covered: Floods, Heatwaves, Waste Disposal, Air Pollution, General Climate Awareness

Intended Use: NLP research (translation, clustering, multilingual embeddings), climate communication, dataset augmentation

Data Collection: No scraping or personal data involved. All messages are fictional or generalised public communication types.

Ethical Notes: The dataset contains no personally identifiable information, and no data was scraped or collected from private sources. Translators paid close attention to linguistic and cultural sensitivity.

License: CC BY 4.0 (Creative Commons Attribution 4.0 International)

Country of Origin: Nigeria

Authors: Olubusolami Sogunle, Habib Sani Yahaya, Chukwuma Oluebube Peace

Affiliation: Research Round
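
Since the dataset ships as a single CSV, a quick sanity check after download might look like the sketch below. It uses only the Python standard library. The column headers are not documented in this README, so the code reads whatever header row the file has instead of assuming names. The helper names `load_messages` and `count_filled_cells` are illustrative and not part of the dataset. The file name in the usage section is the one listed in the repository.

```python
import csv
from collections import Counter


def load_messages(path):
    """Read the dataset CSV into a list of dicts, one dict per row."""
    # utf-8-sig tolerates an optional byte-order mark at the start of the file;
    # newline="" is the mode the csv module expects for file objects.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return list(csv.DictReader(handle))


def count_filled_cells(rows):
    """Count non-empty cells per column, to spot missing translations."""
    filled = Counter()
    for row in rows:
        for column, value in row.items():
            # Skip overflow cells (key None) and blank or missing values.
            if column is not None and value and value.strip():
                filled[column] += 1
    return dict(filled)


if __name__ == "__main__":
    # Assumption: the CSV sits in the current working directory.
    rows = load_messages("Combined Climate Messaging Dataset - for public - Dataset.csv")
    print(len(rows))  # compare against the 715 entries reported above
    print(count_filled_cells(rows))
```

If every language column reports the same count as the number of rows, the corpus is fully parallel. A lower count in one column points to rows that are missing a translation.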

Motivation

The dataset was built to demonstrate how critical climate information can be made more accessible to non-English speakers in Nigeria. It also supports language technology development for low-resource African languages, offering a resource for both practical and academic work. Through this dataset, we hope to set a precedent for the creation of similar datasets in other low-resource Nigerian and African languages, promoting broader linguistic inclusivity in climate adaptation.
