Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
nlp
evaluation
bias
bias-detection
llm
llms
llm-evaluation
llms-benchmarking
llm-as-judge
llm-as-a-judge
llm-as-evaluator
Updated Feb 16, 2024 - Jupyter Notebook