Our code supports both calculating automatic QG metrics and evaluating those metrics on our benchmark. Please follow the steps below.
You can use this code in two ways:
- Install the package with `pip install QGEval-metrics`. For usage, please refer to https://pypi.org/project/QGEval-metrics/.
- Download this repository and use it following the instructions below. Run `pip install -r requirements.txt` to install the required packages.
- Prepare data

  Use the data we provide at `../data/scores.xlsx`, or use your own data, which should provide passages, answers, and references.
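  If you use your own file, a quick layout check may help. This is a sketch that assumes your spreadsheet uses the same column names as `../data/scores.xlsx` (`passage`, `answer`, `reference`, `prediction`):

  ```python
  import pandas as pd

  # sketch: verify the spreadsheet has the columns metrics.py reads
  data = pd.read_excel('../data/scores.xlsx')  # or your own path
  required = ['passage', 'answer', 'reference', 'prediction']
  missing = [c for c in required if c not in data.columns]
  assert not missing, f'missing columns: {missing}'
  print(data[required].head())
  ```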
- Calculate automatic metrics
  - Download the necessary models for each metric. Model download sites can be found via each metric's link in the README of QGEval.
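    For the two models used by the QRelScore example below, one possible route is `huggingface_hub` (an assumption on our part, not part of this repository; any download method that places the models at the paths used in the code works):

    ```python
    from huggingface_hub import snapshot_download

    # sketch: download the models QRelScore needs into model/,
    # matching the paths used in corpus_qrel below
    for repo in ['bert-base-cased', 'gpt2']:
        snapshot_download(repo_id=repo, local_dir=f'model/{repo}')
    ```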
  - Update the model paths inside the code. See QRelScore as an example.

    ```python
    # update the path of mlm_model and clm_model
    def corpus_qrel(preds, contexts, device='cuda'):
        assert len(contexts) == len(preds)
        mlm_model = 'model/bert-base-cased'
        clm_model = 'model/gpt2'
        scorer = QRelScore(mlm_model=mlm_model, clm_model=clm_model,
                           batch_size=16, nthreads=4, device=device)
        scores = scorer.compute_score_flatten(contexts, preds)
        return scores
    ```
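    A hypothetical call to `corpus_qrel` once the paths point at local copies of the models (column names taken from `../data/scores.xlsx`):

    ```python
    import pandas as pd

    # sketch: score predictions against their passages with QRelScore
    data = pd.read_excel('../data/scores.xlsx')
    scores = corpus_qrel(data['prediction'].tolist(),
                         data['passage'].tolist(),
                         device='cuda')
    print(len(scores), 'QRelScore values')
    ```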
  - Run `python metrics.py` to calculate the metrics of your choice by changing `score_names` in `metrics.py`. (`data_path` in each file should be changed to your own data path.)

    ```python
    import pandas as pd

    # Run QRelScore and RQUGE based on our dataset
    # load data
    data_path = '../data/scores.xlsx'
    save_path = './result/metric_result.xlsx'
    data = pd.read_excel(data_path)
    hypos = data['prediction'].tolist()
    refs_list = [data['reference'].tolist()]
    contexts = data['passage'].tolist()
    answers = data['answer'].tolist()
    # scores to use
    score_names = ['QRelScore', 'RQUGE']
    # run metrics
    res = get_metrics(hypos, refs_list, contexts, answers, score_names=score_names)
    # handle results
    for k, v in res.items():
        data[k] = v
    print(data.columns)
    # save results
    data.to_excel(save_path, index=False)
    ```
  - Or run the code file of a specific metric. For example, run `python qrel.py` to calculate QRelScore results. The code file of each metric:

    ```
    # metric: code file
    BLEU/ROUGE/METEOR/BERTScore: base_evaluator.py
    MoverScore: MoverScore.py
    BLEURT: test_bleurt.py
    Q-Metric: QBLEU/answerability_score.py
    QSTS: qsts.py
    BARTScore: BARTScore/bart_score.py
    GPTScore: gptscore.py
    UniEval: UniEval/unieval.py
    QRelScore: qrel.py
    RQUGE: RQUGE/rquge_score.py
    GPT-zeroshot: prompt_gpt.py
    G-Eval: geval.py
    KDA:
    ```
- Run `python coeff.py` to obtain the Pearson, Spearman, and Kendall correlation coefficients between the generated metric results and the labeled results. For the detailed process, please refer to the README of QGEval.
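  For orientation, a minimal sketch of that computation (not `coeff.py` itself; the metric column `QRelScore` comes from the step above, while the human-annotation column name `label` is an assumption):

  ```python
  import pandas as pd
  from scipy.stats import pearsonr, spearmanr, kendalltau

  # sketch: correlate one metric's scores with human labels
  data = pd.read_excel('./result/metric_result.xlsx')
  metric, human = data['QRelScore'], data['label']  # 'label' is hypothetical
  print('Pearson: ', pearsonr(metric, human)[0])
  print('Spearman:', spearmanr(metric, human)[0])
  print('Kendall: ', kendalltau(metric, human)[0])
  ```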