Exploring Language Patterns in Multilingual Climate Messages

Overview

During the LUNE-TWO Fellowship, I co-created a multilingual dataset of climate messages in English, Yoruba, Hausa, and Igbo with two linguists. I wanted to see what more could be learned from it, so I ran an experiment: what happens when these messages are clustered using embeddings? This project is my exploration of that question.

What I Did

Cleaned and tokenised the climate messages.
Used XLM-RoBERTa to generate sentence embeddings.
Applied K-Means clustering and measured cluster quality with silhouette scores.
Visualised patterns with PCA and t-SNE.
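
The clustering step above can be sketched without any dependencies. This is an illustration only, not the notebook's code: the three-dimensional `toy_embeddings` are made-up stand-ins for real XLM-RoBERTa sentence vectors, and the `farthest_first` seeding is a simplification chosen to keep the run deterministic. In practice `sklearn.cluster.KMeans`, which seeds with k-means++ by default, would be the usual choice.

```python
import math


def farthest_first(vectors, k):
    """Deterministic seeding: start at vectors[0], then repeatedly add the
    vector that lies farthest from every centroid picked so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns (labels, centroids)."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical stand-in embeddings: three small, well-separated groups.
toy_embeddings = [
    [0.9, 0.1, 0.0], [1.0, 0.2, 0.1],  # group A
    [0.1, 0.9, 0.0], [0.2, 1.0, 0.1],  # group B
    [0.0, 0.1, 0.9], [0.1, 0.2, 1.0],  # group C
]

labels, centroids = kmeans(toy_embeddings, k=3)
print(labels)  # vectors from the same group share a cluster label
```

Real sentence embeddings have hundreds of dimensions, but the assignment and update steps are the same.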

What I Found

Yoruba and Igbo messages often clustered together, which makes sense given their shared Niger–Congo roots.
Hausa, from a different language family (Afro-Asiatic), formed its own group.
Clustering quality was solid: silhouette scores were 0.605 for Yoruba, 0.579 for Hausa, and 0.618 for Igbo.
The PCA and t-SNE plots gave a clear picture of these language relationships.
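
For readers unfamiliar with the silhouette score reported above, here is a minimal sketch of how it is defined. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to the members of any other cluster; the point's score is `(b - a) / max(a, b)`, and the overall score is the mean over all points. Values range from -1 to 1, with higher values meaning tighter, better-separated clusters. The points and labels below are invented for illustration, and this does not reproduce how the per-language figures were computed; `sklearn.metrics.silhouette_score` is the usual tool for real data.

```python
import math


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for v, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for v, lab in zip(vectors, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(v, u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(v, u) for u in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points: two tight pairs that sit far apart.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]

good = silhouette_score(points, [0, 0, 1, 1])  # labels match the pairs
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the pairs
print(round(good, 3), round(bad, 3))
```

The first labelling scores close to 1 and the second scores below 0, which is the contrast the metric is designed to capture.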

Why It Matters

This exercise shows how AI/ML techniques can reveal structure in underrepresented languages and hints at applications like:

Better multilingual climate communication.
Tools for translation, retrieval, or summarisation in African contexts.
Making crucial information more accessible across language barriers.

Next Steps

Try other models like AfriBERTa and mBERT.
Expand to more African languages.
Test downstream tasks such as classification or topic modelling.

Olubusolami-R/Exploring-Language-Patterns-in-Multilingual-Climate-Messages

Folders and files

Latest commit

History

Repository files navigation

Exploring Language Patterns in Multilingual Climate Messages

Overview

What I Did

What I Found

Why It Matters

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages