Text Emoji Classification

This repo solves the following task:

  • Text Classification: I used GloVe pre-trained embeddings to convert the text into numerical feature vectors (see the sketch below).
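
As a rough illustration of that idea, the sketch below loads GloVe vectors from a text file and averages the word vectors of a sentence into a single feature vector. This is a minimal sketch only: the file name glove.6B.100d.txt (from the download step below) and the averaging strategy are assumptions, not necessarily what train.py does.

```python
# Minimal sketch (assumptions: glove.6B.100d.txt is available locally and
# sentence vectors are built by averaging word vectors; train.py may differ).
import numpy as np

def load_glove(path="glove.6B.100d.txt"):
    """Load GloVe vectors into a {word: np.ndarray} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_vector(sentence, vectors, dim=100):
    """Average the GloVe vectors of the words found in a sentence."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim, dtype=np.float32)
    return np.mean([vectors[w] for w in words], axis=0)

glove = load_glove()
print(sentence_vector("I love this movie", glove).shape)  # (100,)
```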

How to install

Run this command:

pip install -r requirements.txt

You can also download the GloVe pre-trained embeddings:

wget http://nlp.stanford.edu/data/glove.6B.zip
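
The archive contains one vectors file per dimension (glove.6B.50d.txt, glove.6B.100d.txt, glove.6B.200d.txt and glove.6B.300d.txt), so extract it before training:

unzip glove.6B.zip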

How to train

Run this command:

python train.py --train-dataset YOUR_TRAIN_DATASET --test-dataset YOUR_TEST_DATASET \
--dimension DIMENSION_OF_FEATURE_VECTORS --vectors-file YOUR_FEATURE_VECTORS_FILE \
--epochs NUMBER_OF_EPOCHS
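
For instance, a run with the 100-dimensional GloVe vectors could look like this (the dataset file names and epoch count are placeholders, not files shipped with the repo):

python train.py --train-dataset train.csv --test-dataset test.csv \
--dimension 100 --vectors-file glove.6B.100d.txt --epochs 50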

You can also see its other arguments with this command:

python train.py --help

For example:

  • --dropout, --no-dropout: add a dropout layer to the network (default: False)
  • --model-save: change the file name used to save the best model (default: best_emojis_classifier.keras)
  • --save-plots, --no-save-plots: save the training plots (default: True)
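
Putting a few of these together (again with placeholder file names):

python train.py --train-dataset train.csv --test-dataset test.csv \
--dimension 300 --vectors-file glove.6B.300d.txt --epochs 50 \
--dropout --model-save my_emojis_classifier.keras --no-save-plots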

How to test

Run this command:

python test.py --model YOUR_MODEL --vectors-file YOUR_FEATURE_VECTORS_FILE \
--sentence YOUR_SENTENCE
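
For instance, assuming a model saved under the default name and the 100-dimensional vectors file (the sentence is just an example):

python test.py --model best_emojis_classifier.keras --vectors-file glove.6B.100d.txt \
--sentence "I am so happy today"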

You can also see its other arguments with this command:

python test.py --help

For example:

  • --infer, --no-infer: whether to run inference on your sentence (default: True)
  • --n-infer: change the number of inference runs on your sentence (default: 100)
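
The number of inference runs can be changed as well, for example (same placeholder file names as above):

python test.py --model best_emojis_classifier.keras --vectors-file glove.6B.100d.txt \
--sentence "I am so happy today" --n-infer 10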

Benchmark

Without Dropout layer

| Feature Vector Dimension | Train Loss | Train Accuracy | Test Loss | Test Accuracy | Inference Time |
|---|---|---|---|---|---|
| 50d | 0.3673 | 0.9394 | 0.4503 | 0.8571 | 0.0686s |
| 100d | 0.3991 | 0.9470 | 0.4769 | 0.8593 | 0.0993s |
| 200d | 0.2039 | 0.9848 | 0.4449 | 0.8214 | 0.0721s |
| 300d | 0.1319 | 0.9924 | 0.4310 | 0.8683 | 0.0653s |

With Dropout layer

| Feature Vector Dimension | Train Loss | Train Accuracy | Test Loss | Test Accuracy | Inference Time |
|---|---|---|---|---|---|
| 50d | 0.8322 | 0.7273 | 0.8891 | 0.7321 | 0.0671s |
| 100d | 0.6902 | 0.7955 | 0.7373 | 0.7679 | 0.0773s |
| 200d | 0.5082 | 0.9167 | 0.5904 | 0.8393 | 0.0997s |
| 300d | 0.2764 | 0.9697 | 0.4969 | 0.8750 | 0.0639s |

The dataset I used is also available if you want it.