A collection of raw text from various Greek dialects. Contains data from the following dialects:
- Cypriot Greek
- Cretan Greek
- Pontic Greek
- Northern Greek
- Part of the Modern Greek Wikipedia

The repository contains data collected from the web and other textual resources (blogs, websites, and theatrical plays, among others). The folder SMG_CG contains Twitter data in Standard Modern Greek and Cypriot Greek, originally collected by Hanna Sababa for her project "A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek". We thank Mr Sfakianakis for providing his Cretan translations of a number of Ancient Greek tragedies and comedies. The folder all_dialects contains a zip file with the collected data for each dialect, with minimal pre-processing and annotation.
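
As a minimal sketch of how the archive in all_dialects might be read, the snippet below loads every file in a zip archive into memory as text. The function name `read_texts_from_zip`, the UTF-8 encoding, and the example path in the comment are assumptions made for illustration; the actual file name and internal layout of the archive are not described here, so adjust them to what you find in the folder.

```python
import zipfile


def read_texts_from_zip(zip_path, encoding="utf-8"):
    """Return a dict mapping each file name in the archive to its decoded text.

    Assumes the files are plain text in the given encoding (UTF-8 is a guess).
    """
    texts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            raw = archive.read(info.filename)
            # errors="replace" avoids a crash if a file is not valid in this encoding
            texts[info.filename] = raw.decode(encoding, errors="replace")
    return texts


# Hypothetical usage; replace the path with the real archive name:
# texts = read_texts_from_zip("all_dialects/<archive>.zip")
# for name, text in texts.items():
#     print(name, len(text))
```

Reading everything into a dictionary is convenient for a first look at the data; for a large archive, processing one member at a time inside the loop would use less memory.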