[feat] Integrate LightEval, HuggingFace's new LLM evaluation framework #74

https://github.com/huggingface/lighteval

LightEval ships with metrics, tasks, and benchmarks built in. Integrating it would replace our earlier feature request on adding benchmarks and expanding our evaluation capabilities (#60).

The proposed design gives the user two ways to assess their model:

- Evaluate on the test split of the dataset the model was trained on, with user-supplied preprocessing, and compute metrics on the predictions
- Use `lighteval` for tasks and benchmarks
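
To make the two paths concrete, here is a minimal sketch of how the entry point could look. Every name in it (`run_evaluation`, `evaluate_on_test_split`, `benchmark_runner`) is hypothetical and not part of our codebase or of `lighteval`. The benchmark path is left as an injected callable on purpose, since the actual `lighteval` wiring still has to be worked out.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

# Hypothetical types for illustration: an example is (input text, expected label).
Example = Tuple[str, str]
Predict = Callable[[str], str]


def evaluate_on_test_split(
    predict: Predict,
    test_split: Iterable[Example],
    preprocess: Optional[Callable[[str], str]] = None,
) -> Dict[str, Any]:
    """Path 1: score the model on the user's own test split."""
    correct = 0
    total = 0
    for text, label in test_split:
        # Apply the user-supplied preprocessing, if any, before predicting.
        if preprocess is not None:
            text = preprocess(text)
        if predict(text) == label:
            correct += 1
        total += 1
    if total == 0:
        raise ValueError("test split is empty")
    return {"accuracy": correct / total, "num_examples": total}


def run_evaluation(
    mode: str,
    predict: Predict,
    *,
    test_split: Optional[Iterable[Example]] = None,
    preprocess: Optional[Callable[[str], str]] = None,
    benchmark_runner: Optional[Callable[[Predict], Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Dispatch to one of the two evaluation paths described above."""
    if mode == "test_split":
        if test_split is None:
            raise ValueError("mode 'test_split' requires test_split")
        return evaluate_on_test_split(predict, test_split, preprocess)
    if mode == "benchmark":
        # Path 2: delegate to a benchmark backend. In the real integration this
        # callable would wrap lighteval; here it is an assumption, not its API.
        if benchmark_runner is None:
            raise ValueError("mode 'benchmark' requires benchmark_runner")
        return benchmark_runner(predict)
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the benchmark backend behind a plain callable means the test-split path has no dependency on `lighteval`, and the backend can be swapped or mocked in tests.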

I would suggest migrating from `evaluate` to `lighteval`, since `evaluate` does not seem to be as actively maintained.
