Evaluate your LLM's response with Prometheus and GPT4 💯 (updated Sep 9, 2024, Python)
Deliver safe & effective language models
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
Repository for the survey of Bias and Fairness in Information Retrieval (IR) with LLMs.
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Use Groq for evaluations
Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)
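Most repositories above implement some form of LLM-as-a-judge: a strong model (e.g. GPT-4 or Prometheus) is prompted with a rubric and asked to grade another model's answer. A minimal sketch of that loop, where the prompt template and `Score: <n>` reply format are illustrative assumptions rather than any listed repo's API:

```python
import re

def build_judge_prompt(question: str, answer: str, rubric: str) -> str:
    # Assemble a grading prompt for the judge model (hypothetical template).
    return (
        "You are an impartial evaluator.\n"
        f"Rubric: {rubric}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate the answer from 1 to 5 and reply exactly as 'Score: <n>'."
    )

def parse_score(judge_reply: str):
    # Extract the integer score; return None if the judge's reply is malformed.
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(match.group(1)) if match else None

# Usage: send build_judge_prompt(...) to a judge LLM, then parse its reply.
print(parse_score("Score: 4 - mostly correct but misses one edge case."))  # 4
```

Robust answer/score extraction from free-form judge replies is exactly the problem tools like xFinder above target; a bare regex like this is the naive baseline.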