
Repository supporting the paper 'Exploiting survey and online patient experience comments in general practice: Validating a fine-tuned language model for automatic sentiment analysis'


ltgoslo/Exploiting-patient-comments


Exploiting-patient-comments

Repository supporting the paper Exploiting survey and online patient experience comments in general practice: Validating a fine-tuned language model for automatic sentiment analysis.

NorBERT3 large

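NorBERT3 large is the pretrained Norwegian language model that is fine-tuned for sentiment analysis in this work. You can read more about NorBERT3 large here.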
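A minimal sketch of setting up NorBERT3 large for sentiment classification with Hugging Face `transformers`. It assumes the public checkpoint `ltg/norbert3-large`, which ships custom model code and therefore needs `trust_remote_code=True`, and it assumes the four labels seen in the dummy examples under Data are the full label set. Neither the checkpoint ID nor the label order is stated in this repository, and `load_norbert3_classifier` is a hypothetical helper.

```python
# Label set assumed from the dummy examples in the Data section (hypothetical order).
LABELS = ["negative", "neutral", "positive", "mixed"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

# Assumed Hugging Face checkpoint ID; not confirmed by this repository.
CHECKPOINT = "ltg/norbert3-large"


def load_norbert3_classifier(checkpoint: str = CHECKPOINT):
    """Load the NorBERT3 tokenizer and a model with a freshly initialised classification head."""
    # Imported lazily so the label mappings above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        trust_remote_code=True,  # NorBERT3 uses custom modeling code on the Hub
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `load_norbert3_classifier()` downloads the weights on first use; the classification head is randomly initialised until fine-tuning.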

Hyperparameters

Evaluated models

These models were trained and evaluated on the associated train/dev/test splits of the same dataset. The following tables list the hyperparameters used for the five runs per model; the runs differ only in their random seed.
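Because each setting is run five times with different seeds, a single number per setting is usually obtained by aggregating over the runs. The sketch below computes the mean and sample standard deviation of one metric; `summarise_runs` is a hypothetical name, and the repository does not state which aggregate the paper reports.

```python
from statistics import mean, stdev


def summarise_runs(scores_by_seed: dict[int, float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation of one metric over seeded runs."""
    if len(scores_by_seed) < 2:
        # stdev() needs at least two data points.
        raise ValueError("need at least two runs to estimate a standard deviation")
    scores = list(scores_by_seed.values())
    return mean(scores), stdev(scores)
```

Keying the scores by seed keeps each value traceable to the run that produced it.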

GP-trained

| Hyperparameter | Value |
|---|---|
| epochs | 10 |
| train batch size | 16 |
| eval batch size | 16 |
| seed | {100, 101, 102, 103, 104} |
| learning rate | 2e-05 |

NorPaC-trained

| Hyperparameter | Value |
|---|---|
| epochs | 10 |
| train batch size | 16 |
| eval batch size | 16 |
| seed | {100, 101, 102, 103, 104} |
| learning rate | 1e-05 |
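The two settings share every value except the learning rate, so the ten evaluation runs can be enumerated from a single grid. A minimal sketch in plain Python; `RunConfig` and `build_run_grid` are hypothetical names, not code from this repository.

```python
from dataclasses import dataclass

SEEDS = (100, 101, 102, 103, 104)


@dataclass(frozen=True)
class RunConfig:
    setting: str  # "GP-trained" or "NorPaC-trained"
    seed: int
    learning_rate: float
    epochs: int = 10
    train_batch_size: int = 16
    eval_batch_size: int = 16


# Learning rates taken from the GP-trained and NorPaC-trained tables.
LEARNING_RATES = {"GP-trained": 2e-5, "NorPaC-trained": 1e-5}


def build_run_grid() -> list[RunConfig]:
    """One config per (setting, seed) pair: 2 settings x 5 seeds = 10 runs."""
    return [
        RunConfig(setting=setting, seed=seed, learning_rate=lr)
        for setting, lr in LEARNING_RATES.items()
        for seed in SEEDS
    ]
```

A frozen dataclass makes each configuration hashable and immutable, so runs can be used as dictionary keys when collecting results.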

Model trained on all annotated data (train+dev+test)

This model, trained on all annotated GP data (train+dev+test), was evaluated on general practitioner reviews from the Norwegian review website legelisten.no.

GP-trained

| Hyperparameter | Value |
|---|---|
| epochs | 10 |
| train batch size | 16 |
| eval batch size | 16 |
| seed | 100 |
| learning rate | 2e-05 |

All other hyperparameters were left at their default values.
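For the final GP-trained model, the train, dev, and test splits are pooled and a single run with seed 100 is trained. The sketch below shows that step, assuming the Hugging Face `Trainer` was used on a single device (the repository only says the remaining hyperparameters were kept at their defaults, not which library supplied them); `final_model_setup` is a hypothetical helper.

```python
from typing import Any


def final_model_setup(
    train: list[dict], dev: list[dict], test: list[dict]
) -> tuple[list[dict], dict[str, Any]]:
    """Pool every annotated split and return (training data, trainer keyword arguments).

    The keyword names follow Hugging Face `TrainingArguments`; using that trainer
    is an assumption, not something this repository states.
    """
    all_data = train + dev + test  # no held-out split remains for this model
    training_kwargs = {
        "num_train_epochs": 10,
        "per_device_train_batch_size": 16,  # equals the total batch size on one device
        "per_device_eval_batch_size": 16,
        "seed": 100,
        "learning_rate": 2e-5,
        # Everything else is left at the library defaults.
    }
    return all_data, training_kwargs
```

With `transformers` installed, the returned dictionary could be passed as `TrainingArguments(output_dir="norbert3-gp-final", **training_kwargs)`, where the output directory name is a placeholder.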

Data

The data used is the NorPaC (Norwegian Patient Comment corpus) dataset, which consists of free-text comments written by patients as feedback on (I) general practitioners and (II) special mental healthcare (SMH). In this paper, no experiments are run on the SMH part alone, but it is included in the full dataset for which metrics are reported. More details about the dataset can be found in this paper (note that the statistics given there are aggregated at the sentence level).
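As the setting names suggest, the two evaluated models differ in which part of NorPaC they draw on: the GP-trained model uses the general practitioner comments, while the NorPaC-trained model uses the full corpus, SMH included. A sketch of that selection, assuming a hypothetical per-sample `source` field with the values `"gp"` and `"smh"` (the real data format is not published):

```python
def select_subset(samples: list[dict], setting: str) -> list[dict]:
    """Return the samples used for a training setting.

    Assumes each sample carries a hypothetical "source" field set to "gp" or "smh".
    """
    if setting == "GP-trained":
        return [s for s in samples if s["source"] == "gp"]
    if setting == "NorPaC-trained":
        return list(samples)  # GP and SMH comments together
    raise ValueError(f"unknown setting: {setting!r}")
```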

As the data is considered sensitive, it cannot be published, but we provide a few dummy examples below:

[{"text": "I visited my doctor today.", "label": "neutral"}, {"text": "I am very satisfied with my GP!", "label": "positive"}, {"text": "I feel like my GP has too much work to do", "label": "negative"}, {"text": "I love my GP, but the waiting time is too long.", "label": "mixed"}, ...]
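Assuming the real files use the same list-of-records JSON layout as the dummy examples (the repository does not say so explicitly), they can be loaded and the label distribution inspected with the standard library alone. `load_samples` is a hypothetical helper, and `DUMMY_JSON` reproduces the dummy examples without the trailing ellipsis.

```python
import json
from collections import Counter

# The dummy examples above, without the trailing "...", as a JSON string.
DUMMY_JSON = """[
  {"text": "I visited my doctor today.", "label": "neutral"},
  {"text": "I am very satisfied with my GP!", "label": "positive"},
  {"text": "I feel like my GP has too much work to do", "label": "negative"},
  {"text": "I love my GP, but the waiting time is too long.", "label": "mixed"}
]"""


def load_samples(raw: str) -> list[dict]:
    """Parse a JSON list of {"text": ..., "label": ...} records."""
    samples = json.loads(raw)
    if not isinstance(samples, list):
        raise ValueError("expected a JSON list of samples")
    return samples


samples = load_samples(DUMMY_JSON)
label_counts = Counter(s["label"] for s in samples)
```

Checking `label_counts` early is a cheap way to spot class imbalance, which matters for a four-way label set that includes a `mixed` class.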

Data Preprocessing

To preserve the original characteristics of the data, we did not perform any particular preprocessing or cleaning, apart from removing samples that did not meet the annotation requirements, such as samples with missing annotations. The text was tokenized using the pretrained tokenizer for NorBERT3 large.
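The one filtering step described above can be sketched as follows. `drop_unannotated` is a hypothetical helper, and treating a missing or unrecognised label as a failed annotation is an assumption about what "did not meet annotation requirements" covers.

```python
VALID_LABELS = {"positive", "negative", "neutral", "mixed"}  # assumed label set


def drop_unannotated(samples: list[dict]) -> list[dict]:
    """Keep only samples whose label is one of the expected classes.

    A forgotten annotation (missing key, None, or "") is removed. The text
    itself is left untouched, since no other cleaning is applied.
    """
    return [s for s in samples if s.get("label") in VALID_LABELS]
```

Filtering on membership in a fixed label set, rather than on truthiness, also catches typos in the label field.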
