Evaluate your LLM's response with Prometheus and GPT4 💯 (updated Sep 9, 2024, Python)
Deliver safe & effective language models
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
Repository for the survey of Bias and Fairness in Information Retrieval (IR) with LLMs.
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Use Groq for evaluations
Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)
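Most repositories above implement some form of LLM-as-a-judge: a strong model (e.g. GPT-4 or Prometheus) is prompted with a rubric and asked to grade another model's answer. A minimal sketch of that loop, where the prompt template and `Score: <n>` reply format are illustrative assumptions rather than any listed repo's API:

```python
import re

def build_judge_prompt(question: str, answer: str, rubric: str) -> str:
    # Assemble a grading prompt for the judge model (hypothetical template).
    return (
        "You are an impartial evaluator.\n"
        f"Rubric: {rubric}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate the answer from 1 to 5 and reply exactly as 'Score: <n>'."
    )

def parse_score(judge_reply: str):
    # Extract the integer score; return None if the judge's reply is malformed.
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    return int(match.group(1)) if match else None

# Usage: send build_judge_prompt(...) to a judge LLM, then parse its reply.
print(parse_score("Score: 4 - mostly correct but misses one edge case."))  # 4
```

Robust answer/score extraction from free-form judge replies is exactly the problem tools like xFinder above target; a bare regex like this is the naive baseline.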