Roadmap for v0.4.0 #35
Comments
Scoring, categorization, and bar charts split by language.
Check the determinism of models, e.g. execute each plain repository X times and then check whether the results are stable.
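A minimal sketch of such a stability check, assuming the evaluation of one model on one repository can be wrapped in a callable that returns a comparable result (the name `evaluate` and the helper `check_determinism` are hypothetical, not part of the benchmark):

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def check_determinism(evaluate: Callable[[], Hashable], runs: int = 5) -> Tuple[bool, Counter]:
    """Call `evaluate` `runs` times and report whether all results were identical.

    `evaluate` is a hypothetical stand-in for "run one model on one plain
    repository and return its result" (e.g. a score or a hash of the response).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Count how often each distinct result occurred across the runs.
    outcomes = Counter(evaluate() for _ in range(runs))
    # The model counts as stable only if exactly one distinct result was seen.
    return len(outcomes) == 1, outcomes
```

The returned counter shows how the results were distributed, which helps to tell a model that flips occasionally from one that differs on every run.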
Save the descriptions of the models as well (https://openrouter.ai/api/v1/models). The reason is that these can change over time, and we need to know after a while what they were. E.g. right now I would like to know whether mistral-7b-instruct was v0.1 for the last evaluation or not.
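One way this could look, as a rough sketch: fetch the endpoint once per evaluation and store the raw response next to the results with a timestamp. The file naming and the wrapper fields `fetched_at`, `source`, and `payload` are assumptions made for illustration, and the response is stored unmodified so nothing depends on its exact shape.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url: str = MODELS_URL) -> object:
    """Download the model listing and return the decoded JSON as-is."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def save_snapshot(payload: object, directory, now: datetime = None) -> Path:
    """Write `payload` to a timestamped JSON file and return its path.

    The file name pattern and the wrapper fields are hypothetical choices.
    """
    now = now or datetime.now(timezone.utc)
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"models-{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    record = {"fetched_at": now.isoformat(), "source": MODELS_URL, "payload": payload}
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Calling `save_snapshot(fetch_models(), "snapshots")` at the start of an evaluation run would then leave a record that can be checked later for the model version that was actually served.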
Order models by open-weight, allows commercial use, closed, price(!), and size. E.g. https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 is great because it is open-weight and Apache 2.0 licensed, so commercial use is allowed. It should be rated better than GPT-4.
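The ordering could be expressed as a composite sort key, sketched below. The `ModelInfo` fields, their units, and the exact priority of the criteria are assumptions for illustration; the comment above does not fix how price and size should be weighed against each other.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical per-model metadata used only for ordering."""

    name: str
    open_weight: bool
    commercial_use: bool
    price_per_million_tokens: float
    parameters_in_billions: float


def ranking_key(model: ModelInfo) -> Tuple[bool, bool, float, float]:
    """Sort open-weight first, then commercial use allowed, then cheaper, then smaller."""
    return (
        not model.open_weight,  # False sorts before True, so open-weight models lead
        not model.commercial_use,
        model.price_per_million_tokens,
        model.parameters_in_billions,
    )


def order_models(models: Iterable[ModelInfo]) -> List[ModelInfo]:
    """Return the models sorted by `ranking_key`, best-ranked first."""
    return sorted(models, key=ranking_key)
```

Closed models usually do not publish a parameter count, so a real implementation would need a convention for unknown sizes, for example treating them as infinitely large.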
Write down a playbook for evaluations.
Bar charts should have their values on the bars. The axis values do not work that well.
Pick one or several examples per category. The goal is to find interesting results automatically, because it will get harder and harder to go through results manually.
Do test file paths through
Added all follow-ups to #79, so this issue is officially closed for changes. We only do the last tasks and then close it. CC @bauersimon
Finally. Eating that cake!
The v0.4.0 is mainly meant for introducing Java to the benchmark. There are two main goals
Tasks: