Add MT-Bench-X to lm-evaluation-harness #104

Open
lllAlexanderlll opened this issue Dec 12, 2023 · 0 comments
MT-Bench is already translated, but needs to be incorporated into this framework. Some differences from existing tasks (a sketch follows the list):

  • OpenAI API calls are needed for single-score evaluation
  • Translated instruction prompt templates for GPT-4-as-a-judge are to be used
  • Turn end markers should be definable on a per-model basis (this allows early stopping of generation)
  • The categories coding, math, and reasoning contain reference answers; the other categories do not
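
A minimal sketch of how the single-score judging and the reference-answer handling could look. All names here (`TURN_END_MARKERS`, `JUDGE_TEMPLATE`, the prompt placeholders) are illustrative assumptions, not the actual MT-Bench-X templates or harness internals:

```python
import re

from openai import OpenAI

# Hypothetical per-model turn end markers (illustrative, not from the harness);
# these would be passed as stop sequences to the generation call.
TURN_END_MARKERS = {
    "llama-2-chat": ["</s>"],
    "falcon-instruct": ["\nUser:"],
}

# Categories with reference answers, per the list above.
REFERENCE_CATEGORIES = {"coding", "math", "reasoning"}

# Placeholder single-score judge template; the real MT-Bench-X templates are
# translated per language and would be loaded from the benchmark data instead.
JUDGE_TEMPLATE = (
    "[Instruction]\nRate the assistant's answer on a scale of 1 to 10 and "
    "format the verdict as [[rating]].\n\n"
    "{reference_block}"
    "[Question]\n{question}\n\n"
    "[Assistant's Answer]\n{answer}\n"
)


def build_judge_prompt(category, question, answer, reference=None):
    """Insert the reference answer only for categories that provide one."""
    if category in REFERENCE_CATEGORIES and reference is not None:
        reference_block = f"[Reference Answer]\n{reference}\n\n"
    else:
        reference_block = ""
    return JUDGE_TEMPLATE.format(
        reference_block=reference_block, question=question, answer=answer
    )


def judge_single_score(client, category, question, answer, reference=None):
    """Ask GPT-4 for a single score and parse the [[rating]] pattern."""
    prompt = build_judge_prompt(category, question, answer, reference)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = response.choices[0].message.content
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    client = OpenAI()  # requires OPENAI_API_KEY in the environment
    score = judge_single_score(
        client,
        category="math",
        question="What is 2 + 2?",
        answer="4",
        reference="4",
    )
    print(score)
```

On the model side, the per-model markers above could map onto the stop-sequence arguments the harness already threads through its generation requests, so multi-turn answers end cleanly instead of spilling into the next turn.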
lllAlexanderlll self-assigned this Dec 12, 2023