Classify Emails

Panagiotis Antoniadis edited this page Jun 23, 2019 · 1 revision

One way to build separate language models for different categories of emails is to first classify the emails with a Greek topic classifier. Such a classifier was implemented in a 2018 Google Summer of Code project as part of integrating the Greek language into spaCy. Its output categories are Sports, Greece, Science, World News, Economy, Environment, Politics, Art, and Health.

The classification.py tool classifies fetched emails into these categories using the classifier's API.

Usage:

$ python classification.py -h
usage: classification.py [-h] --input INPUT --output OUTPUT

Classify emails in predefined categories. More info on the classifier here:
https://github.com/eellak/nlpbuddy/wiki/Category-prediction

optional arguments:
  -h, --help       show this help message and exit

required arguments:
  --input INPUT    Input directory
  --output OUTPUT  Output directory
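
For readers who want to see how such a command-line interface and classification loop might fit together, here is a minimal sketch. It is not the actual source of classification.py: the function names `build_parser` and `classify_directory`, the `predict` callable (a stand-in for the call to the classifier's API), and the layout of one subdirectory per category in the output directory are all assumptions made for illustration.

```python
import argparse
import shutil
from pathlib import Path
from typing import Callable, Dict


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the help text shown above."""
    parser = argparse.ArgumentParser(
        prog="classification.py",
        description="Classify emails in predefined categories.",
    )
    # Grouping under "required arguments" is what makes --input and
    # --output appear under their own heading in the help output.
    required = parser.add_argument_group("required arguments")
    required.add_argument("--input", required=True, help="Input directory")
    required.add_argument("--output", required=True, help="Output directory")
    return parser


def classify_directory(
    input_dir: str,
    output_dir: str,
    predict: Callable[[str], str],
) -> Dict[str, int]:
    """Copy each email file into a subdirectory named after its category.

    `predict` is hypothetical: it takes the email text and returns a
    category name. In the real tool this step is the classifier API call.
    Returns the number of emails assigned to each category.
    """
    counts: Dict[str, int] = {}
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        category = predict(text)
        # Assumed output layout: <output_dir>/<category>/<email file>
        target = Path(output_dir) / category
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, target / path.name)
        counts[category] = counts.get(category, 0) + 1
    return counts
```

A script built this way would parse its arguments with `args = build_parser().parse_args()` and then call `classify_directory(args.input, args.output, predict)`. Passing the classifier as a callable keeps the loop independent of how the API is actually reached.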

The classification results were not good enough, because the categories are not representative of the emails that people usually send. Therefore, clustering methods, which are described here, will be used instead.