
A multilingual Climate Messaging Dataset for Low-Resource Nigerian Languages which I co-created. Accepted for presentation at Deep Learning Indaba 2025.


Olubusolami-R/multilingual-climate-messaging-dataset


Multilingual Climate Messaging Dataset (English, Hausa, Yoruba, Igbo)

Description

This dataset is a multilingual parallel corpus of 715 climate-related public messages in English, Hausa, Yoruba, and Igbo. It was created to support inclusive climate communication and low-resource NLP research for African languages. The English messages were generated using AI-assisted, template-based prompts reflecting real-world scenarios such as flooding, heatwaves, and air pollution. Native speakers then translated each message into Hausa, Yoruba, and Igbo, ensuring cultural and linguistic accuracy. These translations were also reviewed by external language professionals for quality assurance.
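
Each English message is aligned with its Hausa, Yoruba, and Igbo translations, so every row can be expanded into source-target pairs for machine-translation experiments. The sketch below is a minimal illustration under stated assumptions, not part of the released dataset. The column names `english`, `hausa`, `yoruba`, and `igbo` are hypothetical and should be checked against the actual CSV header.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: these column names are hypothetical; check the real CSV header.
SOURCE_COLUMN = "english"
TARGET_COLUMNS = ("hausa", "yoruba", "igbo")


def to_translation_pairs(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Expand each parallel row into (target_language, english_text, target_text) triples.

    Rows with an empty English cell are skipped, as are empty target cells,
    so a partially translated row still contributes whatever pairs it has.
    """
    pairs: List[Tuple[str, str, str]] = []
    for row in rows:
        source_text = (row.get(SOURCE_COLUMN) or "").strip()
        if not source_text:
            continue
        for lang in TARGET_COLUMNS:
            target_text = (row.get(lang) or "").strip()
            if target_text:
                pairs.append((lang, source_text, target_text))
    return pairs
```

Grouping the triples by their first element then gives separate English-Hausa, English-Yoruba, and English-Igbo sets.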

This dataset was developed as a project by the Computational Linguistics and Language fellows of the 2025 Lune Two AI.Humanities.Social Sciences Research Fellowship for graduate students in Nigeria, organised by Research Round.

Poster for Indaba

[Image: Indaba poster]

Key Information

Languages: English, Hausa, Yoruba, Igbo

Total Entries: 715

Format: CSV

Domains Covered: Floods, Heatwaves, Waste Disposal, Air Pollution, General Climate Awareness

Intended Use: NLP research (translation, clustering, multilingual embeddings), climate communication, dataset augmentation

Data Collection: No scraping or personal data was involved. All messages are fictional or generalised examples of public communication.

Ethical Notes: The dataset does not contain personally identifiable information. No data was scraped or collected from private sources. Translations were carried out with careful attention to linguistic and cultural sensitivity.

License: CC BY 4.0 (Creative Commons Attribution 4.0 International)

Country of Origin: Nigeria

Authors: Olubusolami Sogunle, Habib Sani Yahaya, Chukwuma Oluebube Peace

Affiliation: Research Round
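
Because the data is distributed as a single CSV, a quick sanity check after download might confirm the expected columns, flag rows with a missing translation, and count entries per domain. This is a minimal sketch using only Python's standard `csv` module. The header names, including a `domain` column, are hypothetical assumptions and may not match the released file.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

# ASSUMPTION: hypothetical header; replace with the columns in the released CSV.
EXPECTED_COLUMNS = ["domain", "english", "hausa", "yoruba", "igbo"]


def load_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines (e.g. an open file) and fail early if expected columns are absent."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return list(reader)


def summarise(rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Report the row count, indices of rows with any empty cell, and entries per domain."""
    incomplete = [
        i
        for i, row in enumerate(rows)
        # DictReader fills short rows with None, so treat None like an empty string.
        if any(not (row.get(col) or "").strip() for col in EXPECTED_COLUMNS)
    ]
    per_domain = Counter(row.get("domain") or "" for row in rows)
    return {"total": len(rows), "incomplete_rows": incomplete, "per_domain": dict(per_domain)}
```

To run it on the real file, pass in a handle from `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. The UTF-8 encoding is an assumption, though it is the usual choice for text containing Yoruba and Igbo diacritics. You can then compare `total` against the 715 entries listed above.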

Motivation

The dataset was built to demonstrate how critical climate information can be made more accessible to non-English speakers in Nigeria. It also supports language technology development for low-resource African languages, providing material for both applied climate communication and academic research. Through this dataset, we hope to set a precedent for creating similar datasets in other low-resource Nigerian and African languages, promoting broader linguistic inclusivity in climate adaptation.
