Our code provides the ability to evaluate automatic metrics, which includes the ability to calculate automatic metrics. Please follow these steps to calculate automatic QG metrics and evaluate them on our benchmark.
You can use this code in two ways:
- Install the package with `pip install QGEval-metrics`. For usage, please refer to https://pypi.org/project/QGEval-metrics/.
- Download this resource and use it following the instructions below.
Run `pip install -r requirements.txt` to install the required packages.
Prepare data
Use the data we provide at ../data/scores.xlsx, or use your own data, which should provide passages, answers, references, and the generated questions (predictions) to be scored.
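If you bring your own data, it may help to check its shape before running any metric. The sketch below assumes your data is a CSV file (the provided file is .xlsx and is read with pandas) with the column names `passage`, `answer`, `reference`, and `prediction`, taken from the metrics.py snippet. `load_qg_data` is a hypothetical helper, not part of this repository; it returns the four lists in the shapes that snippet builds, where `refs_list` wraps the references in an outer list.

```python
import csv

# Column names assumed from the metrics.py snippet (data['passage'], etc.)
REQUIRED_COLUMNS = ("passage", "answer", "reference", "prediction")


def load_qg_data(lines):
    """Read CSV text (an open file or any iterable of lines) into the lists
    hypos, refs_list, contexts, answers. Hypothetical helper for illustration."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    hypos = [r["prediction"] for r in rows]
    # one inner list per reference set, mirroring [data['reference'].tolist()]
    refs_list = [[r["reference"] for r in rows]]
    contexts = [r["passage"] for r in rows]
    answers = [r["answer"] for r in rows]
    return hypos, refs_list, contexts, answers
```

Usage: `with open("my_data.csv", newline="", encoding="utf-8") as f: hypos, refs_list, contexts, answers = load_qg_data(f)` (the file name is a placeholder).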
Calculate automatic metrics.
Download the models required by each metric. Download links can be found via each metric's link in the QGEval README.
Update the model paths inside the code. The QRelScore snippet below shows an example:

```python
# update the path of mlm_model and clm_model
def corpus_qrel(preds, contexts, device='cuda'):
    assert len(contexts) == len(preds)
    mlm_model = 'model/bert-base-cased'
    clm_model = 'model/gpt2'
    scorer = QRelScore(mlm_model=mlm_model,
                       clm_model=clm_model,
                       batch_size=16,
                       nthreads=4,
                       device=device)
    scores = scorer.compute_score_flatten(contexts, preds)
    return scores
```
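Because each metric loads its models from a local path, a quick pre-flight check can catch a missing download before a long run. This is a minimal sketch, not part of this repository: `missing_model_dirs` is a hypothetical helper, and the two paths are simply the defaults from the QRelScore snippet; add entries for whichever models your chosen metrics use.

```python
from pathlib import Path

# Hypothetical mapping; the paths mirror the defaults in the QRelScore snippet
MODEL_PATHS = {
    "QRelScore (mlm_model)": "model/bert-base-cased",
    "QRelScore (clm_model)": "model/gpt2",
}


def missing_model_dirs(model_paths, root="."):
    """Return the entries whose local model directory does not exist under root."""
    base = Path(root)
    return {name: p for name, p in model_paths.items() if not (base / p).is_dir()}


if __name__ == "__main__":
    for name, path in missing_model_dirs(MODEL_PATHS).items():
        print(f"{name}: {path} not found; download the model or update the path")
```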
Run `python metrics.py` to calculate the metrics listed in `score_names`; change `score_names` to select the metrics you want. (Change `data_path` in each file to the path of your own data.)

```python
import pandas as pd

# Run QRelScore and RQUGE based on our dataset
# load data
data_path = '../data/scores.xlsx'
save_path = './result/metric_result.xlsx'
data = pd.read_excel(data_path)
hypos = data['prediction'].tolist()
refs_list = [data['reference'].tolist()]
contexts = data['passage'].tolist()
answers = data['answer'].tolist()
# scores to use
score_names = ['QRelScore', 'RQUGE']
# run metrics
res = get_metrics(hypos, refs_list, contexts, answers, score_names=score_names)
# handle results: one new column per metric
for k, v in res.items():
    data[k] = v
print(data.columns)
# save results
data.to_excel(save_path, index=False)
```
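The results loop (`for k, v in res.items(): data[k] = v`) implies that `get_metrics` returns a dict mapping each metric name to one score per input row. Assuming that shape, the sketch below does the same merge on plain row dicts (for example, rows read with `csv.DictReader`) and fails loudly if a metric returns the wrong number of scores. `attach_scores` is a hypothetical helper, not part of this repository.

```python
def attach_scores(rows, res):
    """Add each metric's per-example scores to the matching row dict.

    Assumes res maps metric name -> list of scores aligned with rows,
    mirroring the results loop in the metrics.py snippet.
    """
    for name, scores in res.items():
        if len(scores) != len(rows):
            raise ValueError(f"{name}: got {len(scores)} scores for {len(rows)} rows")
        for row, score in zip(rows, scores):
            row[name] = score
    return rows
```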
Alternatively, run the code file for a specific metric. For example, run `python qrel.py` to calculate QRelScore results. The code file for each metric:

```
# metric: code file
BLEU/ROUGE/METEOR/BERTScore: base_evaluator.py
MoverScore: MoverScore.py
BLEURT: test_bleurt.py
Q-Metric: QBLEU/answerability_score.py
QSTS: qsts.py
BARTScore: BARTScore/bart_score.py
GPTScore: gptscore.py
UniEval: UniEval/unieval.py
QRelScore: qrel.py
RQUGE: RQUGE/rquge_score.py
GPT-zeroshot: prompt_gpt.py
G-Eval: geval.py
KDA:
```
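To script several per-metric runs, the list of code files can be turned into a lookup table. The sketch below is a hypothetical helper, not part of this repository; KDA is left out because no code file is listed for it. It launches each script from the current directory, which is an assumption: scripts inside subfolders (for example `BARTScore/bart_score.py`) may expect to be run from their own folder, in which case pass `cwd=` to `subprocess.run`.

```python
import subprocess
import sys

# Metric name -> code file, taken from the metric list.
# KDA is omitted because no code file is listed for it.
METRIC_SCRIPTS = {
    "BLEU": "base_evaluator.py",
    "ROUGE": "base_evaluator.py",
    "METEOR": "base_evaluator.py",
    "BERTScore": "base_evaluator.py",
    "MoverScore": "MoverScore.py",
    "BLEURT": "test_bleurt.py",
    "Q-Metric": "QBLEU/answerability_score.py",
    "QSTS": "qsts.py",
    "BARTScore": "BARTScore/bart_score.py",
    "GPTScore": "gptscore.py",
    "UniEval": "UniEval/unieval.py",
    "QRelScore": "qrel.py",
    "RQUGE": "RQUGE/rquge_score.py",
    "GPT-zeroshot": "prompt_gpt.py",
    "G-Eval": "geval.py",
}


def command_for(metric):
    """Build the command that runs the code file listed for `metric`."""
    if metric not in METRIC_SCRIPTS:
        raise ValueError(f"no code file listed for {metric!r}")
    # sys.executable is the interpreter running this helper
    return [sys.executable, METRIC_SCRIPTS[metric]]


def run_metric(metric):
    """Run the metric's script from the current directory; raise if it fails."""
    subprocess.run(command_for(metric), check=True)
```

For example, `run_metric("QRelScore")` is equivalent to `python qrel.py`.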
Run `python coeff.py` to obtain the Pearson, Spearman, and Kendall correlation coefficients between the calculated metric results and the labeled results. For the detailed process, please refer to the QGEval README.
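The internals of coeff.py are not shown here, so the following is only a standard-library sketch of what the three coefficients measure, not a copy of that script: Pearson on the raw scores, Spearman as Pearson on average ranks (ties share the mean rank), and Kendall's tau-b on pairwise concordance, the variant SciPy's `kendalltau` returns by default. How coeff.py groups scores (for example, per quality dimension) may differ. Constant inputs make the coefficients undefined and raise `ZeroDivisionError` here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: counted in neither denominator term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n = concordant + discordant
    return (concordant - discordant) / math.sqrt((n + ties_x) * (n + ties_y))


if __name__ == "__main__":
    # toy numbers for illustration only, not benchmark data
    metric_scores = [0.62, 0.71, 0.55, 0.80, 0.68]
    human_scores = [3, 4, 2, 5, 4]
    for name, fn in [("Pearson", pearson), ("Spearman", spearman), ("Kendall", kendall_tau_b)]:
        print(name, round(fn(metric_scores, human_scores), 4))
```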