Commit 9c79834: add labr arabic task
issamYahiaoui committed Mar 26, 2024 (1 parent 1dc5cae)
Showing 3 changed files with 97 additions and 0 deletions.
50 additions & 0 deletions lm_eval/tasks/arabic_tasks/README.md
@@ -0,0 +1,50 @@
# ArSentimentAnalysisLabr

### Paper

Title: `LABR: A Large Scale Arabic Book Reviews Dataset`

Abstract: https://aclanthology.org/P13-2088.pdf

`This dataset contains over 63,000 book reviews in Arabic. It is the largest sentiment analysis dataset for Arabic to date. The book reviews were harvested from the website Goodreads during the month of March 2013. Each book review comes with the Goodreads review id, the user id, the book id, the rating (1 to 5) and the text of the review.`

Homepage: https://aclanthology.org/P13-2088.pdf


### Citation

```
@inproceedings{aly2013labr,
  title={{LABR}: A Large Scale {A}rabic Book Reviews Dataset},
  author={Aly, Mohamed and Atiya, Amir},
  booktitle={Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  pages={494--498},
  year={2013}
}
```

### Groups and Tasks

#### Groups

* Not part of a group.

#### Tasks

* `ar_sentiment_analysis_labr`: Predict the star rating (1 to 5) of an Arabic book review from the LABR dataset.

### Checklist

For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?


If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
16 additions & 0 deletions lm_eval/tasks/arabic_tasks/ar_sentiment_analysis_labr.yaml
@@ -0,0 +1,16 @@
task: ar_sentiment_analysis_labr
dataset_path: labr
output_type: multiple_choice
training_split: train
test_split: test
doc_to_text: !function preprocess_ar_sentiment_analysis_labr.doc_to_text
doc_to_target: !function preprocess_ar_sentiment_analysis_labr.doc_to_target
process_docs: !function preprocess_ar_sentiment_analysis_labr.process_docs
doc_to_choice: ["1", "2", "3", "4", "5"]
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
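The config reports both `acc` and `acc_norm`. In the harness, `acc` picks the choice with the highest raw log-likelihood, while `acc_norm` normalizes each choice's log-likelihood by the byte length of the choice string. A minimal sketch of the difference (the log-likelihood values below are invented for illustration, not harness output):

```python
# Sketch: how `acc` vs `acc_norm` pick a choice from log-likelihood scores.
# The scores are hypothetical; the choices match the task config.

choices = ["1", "2", "3", "4", "5"]
# Hypothetical summed log-likelihoods for each continuation.
loglikelihoods = [-4.1, -3.9, -4.5, -3.8, -4.0]

# `acc`: argmax of the raw log-likelihood.
pred_acc = max(range(len(choices)), key=lambda i: loglikelihoods[i])

# `acc_norm`: argmax of the log-likelihood divided by the byte length
# of the choice string.
pred_norm = max(
    range(len(choices)),
    key=lambda i: loglikelihoods[i] / len(choices[i].encode("utf-8")),
)

print(choices[pred_acc], choices[pred_norm])  # prints "4 4"
```

Since every choice here is a single one-byte digit, length normalization is a no-op and the two metrics will always agree for this task.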
31 additions & 0 deletions lm_eval/tasks/arabic_tasks/preprocess_ar_sentiment_analysis_labr.py
@@ -0,0 +1,31 @@
import datasets


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    choices = ["1", "2", "3", "4", "5"]

    def _helper(doc):
        # Reshape a single document into the fields the task config expects.
        doc["query"] = doc["text"]  # The query prompt.
        doc["choices"] = choices
        doc["gold"] = doc["label"]  # Index of the gold answer in `choices`.
        return doc

    return dataset.map(_helper)  # Returns a datasets.Dataset object.

def doc_to_text(doc) -> str:
    return (
        "You are a highly intelligent Arabic speaker who analyzes the following text and answers with its sentiment rating.\nOnly write the answer down."
        + "\n\n**Text:** " + doc["query"] + "\n\n"
        + ",".join(doc["choices"])
        + "\n\n**Answer:**"
    )


def doc_to_target(doc) -> int:
    # For `multiple_choice` tasks with an explicit `doc_to_choice`, the target
    # is the index of the gold answer in `choices`, not the answer string.
    return doc["gold"]
