centre-for-humanities-computing/literary_sentiment_benchmarking

What to do:

  • install requirements

  • run the main command like so:

    python -m src.sentiment_benchmarking \
        --dataset-name chcaa/fiction4sentiment \
        --model-names cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual \
        --model-names cardiffnlp/xlm-roberta-base-sentiment-multilingual \
        --model-names MiMe-MeMo/MeMo-BERT-SA \
        --model-names alexandrainst/da-sentiment-base \
        --model-names vesteinn/danish_sentiment \
        --translate

    optional flag for testing: limit the number of rows to run, e.g. --n-rows 10 (see the quick-test example after this list)

    optional flag for Google-translating the texts before scoring: --translate (NB: this is slow)

    In short: the script takes a dataset name, one or more model names, and the optional --n-rows and --translate flags.

  • results (CSVs with the scored data and the Spearman correlations) are written to the results folder
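
    For a quick smoke test, the command can be reduced to a single model and a handful of rows, e.g.:

    python -m src.sentiment_benchmarking \
        --dataset-name chcaa/fiction4sentiment \
        --model-names MiMe-MeMo/MeMo-BERT-SA \
        --n-rows 10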

What it does:

  • uses (transformers-compatible) fine-tuned models to sentiment-score the sentences of a HF dataset (default: the Fiction4 dataset); the dataset should have the columns ["text", "label"], where "label" is the human gold standard
  • cleans up the text (mainly whitespace removal)
  • converts the categorical scores (positive, [neutral,] negative) to a continuous scale using the model's confidence score (see the conv_scores function in utils.py) and saves the results
  • computes the Spearman correlation of the chosen models (plus the precomputed* vader and roberta_xlm_base baselines) with the human gold standard ("label"), both on the raw gold standard and [forthcoming] on the detrended gold standard (see utils.py); a rough sketch of these steps follows after the note below

*Note that the precomputed vader and roberta_xlm_base baselines were run on Google-translated sentences. For details, see our paper here; these are the baselines to beat.
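
For illustration, here is a minimal sketch of the categorical-to-continuous conversion and the Spearman step described above. It is not the repository's code: the actual mapping lives in the conv_scores function in utils.py, and the label handling, the neutral-to-zero mapping, and the toy texts/gold values below are assumptions.

    # Minimal sketch, not the repo's exact code: score sentences with a
    # transformers pipeline, map the categorical output to a continuous
    # value via the model's confidence, and correlate with the gold labels.
    from scipy.stats import spearmanr
    from transformers import pipeline

    def to_continuous(pred: dict) -> float:
        """Assumed mapping (the real one is conv_scores in utils.py):
        positive -> +confidence, negative -> -confidence, neutral -> 0."""
        label = pred["label"].lower()
        if "pos" in label:
            return pred["score"]
        if "neg" in label:
            return -pred["score"]
        return 0.0

    # Toy stand-ins for the dataset's "text" and "label" columns.
    texts = [
        "A quiet, luminous morning on the water.",
        "He had never felt so utterly alone.",
        "The letter brought unexpected good news.",
        "The house smelled of dust and old regrets.",
    ]
    gold = [0.7, 0.1, 0.9, 0.3]  # hypothetical human gold-standard scores

    # Light cleanup (mainly whitespace), as described above.
    texts = [" ".join(t.split()) for t in texts]

    clf = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual",
    )
    continuous = [to_continuous(p) for p in clf(texts)]

    rho, p = spearmanr(continuous, gold)
    print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")

The benchmarking script runs this kind of scoring per model over the whole dataset and writes the scores and correlations to the results folder.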

What it could do [forthcoming]
