Sangīn

The files in this repository are used to train and evaluate a machine learning model that detects the meter of a hemistich of classical Persian poetry. A recent evaluation run yielded the following results:

Accuracy: 0.9801
F1: 0.9792
Precision: 0.9789
Recall: 0.9801
Loss: 0.0996
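
Recall matching accuracy to four decimal places is what support-weighted averaging across meter classes would produce, since weighted recall is algebraically equal to accuracy. The sketch below assumes that averaging scheme; the repository's own evaluation code may compute the figures differently. The function name and the toy labels are hypothetical, and the sketch uses only the Python standard library.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1.

    Assumption: support-weighted averaging. The real evaluation
    script may use a different scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = len(y_true)
    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total  # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; these are not results from the model.
toy_true = ["ramal", "ramal", "hazaj", "hazaj", "mutaqarib"]
toy_pred = ["ramal", "hazaj", "hazaj", "hazaj", "mutaqarib"]
toy_scores = weighted_metrics(toy_true, toy_pred)
```

On the toy labels, the weighted recall and the accuracy come out identical, which mirrors the pattern in the reported numbers.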

Base model

I chose XLM-RoBERTa from Facebook AI. (Is there a better, more recently developed option? If so, I would be happy to switch to it.)

Training data

The data used here comes from Ganjoor. So far, I have added to the dataset the following works, representing a total of 277,248 unique hemistichs:

The complete ghazals of Ṣāʾib Tabrīzī
The complete ghazals of Ḥāfiẓ
The complete ghazals of Saʿdī
All the ghazals in the Dīvān-i Shams of Rūmī (excepting some with obscure meters)
The first daftar of the Maṡnavī of Rūmī
A few thousand lines of the Shāhnāma of Firdawsī
Four of the poems in the Khamsa of Niẓāmī: Laylī u Majnūn, Khusraw va Shīrīn, the Haft paykar, and the Makhzan al-asrār

More should still be added, but this is a start. The model is already quite good at detecting any of the common meters. Some meters are rare enough that they almost never appear in classical Persian poetry, let alone in this training set.
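
One way to see which meters are underrepresented is to count labels in the training set. This is a minimal sketch, assuming the meter labels have already been read into a list. The function name, the threshold, and the toy labels are hypothetical, and nothing here depends on the dataset's actual file layout.

```python
from collections import Counter


def rare_meters(labels, min_count):
    """Return {meter: count} for meters with fewer than min_count examples."""
    counts = Counter(labels)
    return {meter: n for meter, n in counts.items() if n < min_count}


# Toy labels for illustration only.
toy_labels = ["ramal"] * 5 + ["hazaj"] * 4 + ["munsarih"] * 1
underrepresented = rare_meters(toy_labels, min_count=3)
```

Meters flagged this way are candidates for targeted additions to the training data.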

Plans

Get the model properly versioned and published
Add to the training data and adjust training parameters to improve performance
Present this work at a conference or workshop or similar (if anyone has suggestions...)
Deploy an inference server and a web front end: a web app where a user could paste one or more hemistichs from a given poem and have the meter detected (see consensus.py for an idea of how this would work, by looking for a consensus among high-confidence meter predictions)
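
The consensus idea in the last item could look roughly like the following. Because every hemistich in a classical Persian poem shares one meter, agreement across several predictions is more reliable than any single one. This is a sketch of one plausible approach and not the contents of consensus.py: the function name, the 0.9 confidence threshold, and the simple majority rule are all assumptions, and the input is taken to be one (meter, confidence) pair per hemistich.

```python
from collections import Counter


def consensus_meter(predictions, min_confidence=0.9, min_share=0.5):
    """Pick the meter that most high-confidence predictions agree on.

    predictions: iterable of (meter, confidence) pairs, one per hemistich.
    Returns the winning meter, or None when no meter is backed by more
    than min_share of the high-confidence predictions.
    """
    # Keep only predictions the model is confident about.
    confident = [meter for meter, conf in predictions if conf >= min_confidence]
    if not confident:
        return None
    meter, votes = Counter(confident).most_common(1)[0]
    return meter if votes / len(confident) > min_share else None


# Hypothetical predictions for four hemistichs of one poem.
toy_predictions = [
    ("ramal", 0.99),
    ("ramal", 0.97),
    ("hazaj", 0.55),  # below the threshold, so ignored
    ("ramal", 0.93),
]
poem_meter = consensus_meter(toy_predictions)
```

Returning None on a tie or when nothing clears the threshold lets a front end ask the user for more hemistichs instead of guessing.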

