Score tasks #2452

Open · wants to merge 20 commits into main

rimashahbazyan

  • Added SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models.
  • Fixed a bug in generate-until tasks: the "until" parameter now defaults to each model's end-of-sequence token.
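
The fallback described in the second bullet can be sketched roughly as follows. This is a minimal illustration only, not the code from this PR: `resolve_until` is a hypothetical helper name, and the idea that the stop sequences live under an "until" key of the generation arguments is an assumption made for the example.

```python
from typing import Any, Dict, List, Optional


def resolve_until(gen_kwargs: Optional[Dict[str, Any]], eos_token: str) -> List[str]:
    """Return the stop sequences for a generate-until request.

    Hypothetical helper (not part of the PR): when the task does not set
    "until", fall back to the model's own end-of-sequence token.
    """
    until = (gen_kwargs or {}).get("until")
    if until is None:
        # No task-level stop sequence: use the model's EOS token.
        return [eos_token]
    if isinstance(until, str):
        # Accept a single string as shorthand for a one-element list.
        return [until]
    return list(until)


# A task without "until" stops at the model's EOS token;
# a task that sets "until" keeps its own stop sequences.
print(resolve_until({}, "</s>"))
print(resolve_until({"until": ["Q:", "\n\n"]}, "</s>"))
```

With a fallback of this shape, tasks that set "until" explicitly behave as before, and only tasks that leave it unset pick up the model-specific token.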

@CLAassistant

CLAassistant commented Nov 4, 2024

CLA assistant check
All committers have signed the CLA.

@baberabb
Contributor

baberabb commented Nov 5, 2024

Hi! Looks great mostly! Could you make the following changes:

  1. Run pre-commit:
     pip install pre-commit
     pre-commit install
     pre-commit run --all-files
  2. Add an entry to lm_eval/tasks/README.md describing the benchmark in one sentence, as is done for the other entries in that table.

@rimashahbazyan
Author

Thanks!
Done with both!

@baberabb
Contributor

baberabb commented Nov 6, 2024

@rimashahbazyan Thanks! A test is failing due to a missing function in utils_agieval:

AttributeError: module 'utils_agieval' has no attribute 'robustness_doc_to_text'

@rimashahbazyan
Author

@baberabb
Sorry, fixed :)

@rimashahbazyan
Author

@baberabb I made some small bug fixes and double-checked everything. This PR is final; I won't commit anything else to the branch. When do you think it could be reviewed?

@baberabb
Contributor

Sorry for the delay. I'll try merging by next week if that's OK. I wanted to verify that no other task is affected by removing the default until.
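
One way to approach that check is to list the generation tasks that do not set their own stop sequences, since those are the ones whose behavior could change. The sketch below is only an illustration under stated assumptions: `tasks_relying_on_default_until` is a hypothetical helper, the configs are assumed to be already loaded into plain dicts, and the key names (`output_type`, `generation_kwargs`, `until`) are assumed to match the task YAML files.

```python
from typing import Any, Dict, List, Mapping


def tasks_relying_on_default_until(configs: Mapping[str, Dict[str, Any]]) -> List[str]:
    """List generate_until tasks that do not define their own "until".

    Hypothetical helper: `configs` maps a task name to its parsed config.
    """
    flagged = []
    for name, cfg in configs.items():
        if cfg.get("output_type") != "generate_until":
            continue  # only generation tasks use stop sequences
        gen_kwargs = cfg.get("generation_kwargs") or {}
        if not gen_kwargs.get("until"):
            # Missing or empty "until": the task depends on the default.
            flagged.append(name)
    return sorted(flagged)


# Made-up configs for illustration only.
example_configs = {
    "task_a": {"output_type": "generate_until", "generation_kwargs": {"until": ["\n\n"]}},
    "task_b": {"output_type": "generate_until", "generation_kwargs": {"max_gen_toks": 256}},
    "task_c": {"output_type": "multiple_choice"},
    "task_d": {"output_type": "generate_until"},
}
print(tasks_relying_on_default_until(example_configs))
```

Here only `task_b` and `task_d` are reported, because `task_a` sets its own stop sequence and `task_c` is not a generation task.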
