These datasets were released for NSURL-2019 Task 7: Named Entity Recognition (NER) in Farsi.
The training data has two parts: the first is the PEYMA corpus, which contains 300K tokens; the second contains 600K tokens.
The test data contains 150K tokens and was prepared for NSURL-2019.
For more information about the shared task and the results, please refer to the following paper:
@InProceedings{Task7:NERFarsi,
Author = {Nasrin Taghizadeh and Zeinab Borhani-fard and Melika Golestani Pour and Mojgan Farhoodi and Maryam Mahmoudi and Masoumeh Azimzadeh and Heshaam Faili},
Title = {{NSURL}-2019 Task 7: Named Entity Recognition (NER) in Farsi},
Booktitle = {Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages},
series = {NSURL '19},
month = {September},
year = {2019},
address = {Trento, Italy}
}