Add MT-Bench-X to lm-evaluation-harness #104

Open
lllAlexanderlll opened this issue Dec 12, 2023 · 0 comments
MT-Bench is already translated, but needs to be incorporated into this framework. Some differences from existing tasks (a sketch follows the list):

  • OpenAI API calls are needed for single-score evaluation
  • Translated instruction prompt templates for GPT-4-as-a-judge are to be used
  • Turn end markers should be definable on a per-model basis (this allows early stopping of generation)
  • The categories coding, math, and reasoning contain reference answers; the other categories do not
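
A minimal sketch of how the single-score judging and the reference-answer handling could look. All names here (`TURN_END_MARKERS`, `JUDGE_TEMPLATE`, the prompt placeholders) are illustrative assumptions, not the actual MT-Bench-X templates or harness internals:

```python
import re

from openai import OpenAI

# Hypothetical per-model turn end markers (illustrative, not from the harness);
# these would be passed as stop sequences to the generation call.
TURN_END_MARKERS = {
    "llama-2-chat": ["</s>"],
    "falcon-instruct": ["\nUser:"],
}

# Categories with reference answers, per the list above.
REFERENCE_CATEGORIES = {"coding", "math", "reasoning"}

# Placeholder single-score judge template; the real MT-Bench-X templates are
# translated per language and would be loaded from the benchmark data instead.
JUDGE_TEMPLATE = (
    "[Instruction]\nRate the assistant's answer on a scale of 1 to 10 and "
    "format the verdict as [[rating]].\n\n"
    "{reference_block}"
    "[Question]\n{question}\n\n"
    "[Assistant's Answer]\n{answer}\n"
)


def build_judge_prompt(category, question, answer, reference=None):
    """Insert the reference answer only for categories that provide one."""
    if category in REFERENCE_CATEGORIES and reference is not None:
        reference_block = f"[Reference Answer]\n{reference}\n\n"
    else:
        reference_block = ""
    return JUDGE_TEMPLATE.format(
        reference_block=reference_block, question=question, answer=answer
    )


def judge_single_score(client, category, question, answer, reference=None):
    """Ask GPT-4 for a single score and parse the [[rating]] pattern."""
    prompt = build_judge_prompt(category, question, answer, reference)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = response.choices[0].message.content
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    client = OpenAI()  # requires OPENAI_API_KEY in the environment
    score = judge_single_score(
        client,
        category="math",
        question="What is 2 + 2?",
        answer="4",
        reference="4",
    )
    print(score)
```

On the model side, the per-model markers above could map onto the stop-sequence arguments the harness already threads through its generation requests, so multi-turn answers end cleanly instead of spilling into the next turn.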
lllAlexanderlll self-assigned this Dec 12, 2023