
Releases: EleutherAI/lm-evaluation-harness

v0.4.1

31 Jan 15:29
a0a2fec

Release Notes

This release contains all changes made since the release of v0.4.0, and is partially a test of our release automation, provided by @anjor.

At a high level, some of the changes include:

  • Data-parallel inference using vLLM (contributed by @baberabb)
  • A major fix to HuggingFace model generation: in v0.4.0, a bug in stop-sequence handling sometimes cut generations off too early.
  • Miscellaneous documentation updates
  • A number of new tasks, and bugfixes to old tasks!
  • Support for OpenAI-like API models via local-completions or local-chat-completions (thanks to @veekaybee, @mgoin, @anjor and others for this)!
  • Integration with tools for visualizing results, such as Zeno, with WandB support coming soon!
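The stop-sequence fix above concerns where a generation is truncated when the model emits a stop string. The following is a minimal Python sketch of that idea, assuming plain-string stop sequences; `truncate_at_stop` is a hypothetical helper, not the harness's actual code, and it is not a reconstruction of the specific v0.4.0 bug. It shows one general way premature cut-offs can arise: searching for stop strings in the prompt plus the continuation rather than in the newly generated continuation alone.

```python
def truncate_at_stop(generation: str, stop_sequences: list[str]) -> str:
    """Cut a generated continuation at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration only; not lm-evaluation-harness code.
    """
    cut = len(generation)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at index 0 and erase everything
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stop strings
    return generation[:cut]


prompt = "Q: 2+2?\nA:"
continuation = " 4\n\nQ: 3+3?\nA: 6"

# Correct: search only the newly generated text.
print(repr(truncate_at_stop(continuation, ["Q:", "\n\n"])))  # ' 4'

# Pitfall: searching prompt + continuation matches the "Q:" already in the prompt.
print(repr(truncate_at_stop(prompt + continuation, ["Q:"])))  # ''
```

Taking the minimum index over all stop strings (rather than stopping at the first string in the list that matches) ensures the output ends at whichever stop sequence the model produced first.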

More frequent (minor) version releases may be done in the future, to make it easier for PyPI users!

We're very pleased by the uptick in interest in LM Evaluation Harness recently, and we hope to continue to improve the library as time goes on. We're grateful to everyone who's contributed, and are excited by how many new contributors this version brings! If you have feedback for us, or would like to help out developing the library, please let us know.

In the next version release, we hope to include:

  • Chat Templating + System Prompt support, for locally-run models
  • Improved Answer Extraction for many generative tasks, making them more easily run zero-shot and less dependent on model output formatting
  • General speedups and quality-of-life fixes to the non-inference portions of LM Evaluation Harness, including drastically reduced startup times and faster non-inference processing, especially when num_fewshot is large!
  • A new TaskManager object and the deprecation of lm_eval.tasks.initialize_tasks(), making it easier to register many tasks and to configure new groups of tasks

What's Changed

v0.4.0

04 Dec 15:08
c9bbec6

What's Changed

v0.3.0

08 Dec 08:34

HuggingFace Datasets Integration

This release integrates HuggingFace datasets as the core dataset management interface, removing previous custom downloaders.

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.3.0

v0.2.0

07 Mar 02:12

Major changes since 0.1.0:

  • added blimp (#237)
  • added qasper (#264)
  • added asdiv (#244)
  • added truthfulqa (#219)
  • added gsm (#260)
  • implemented description dict and deprecated provide_description (#226)
  • new --check_integrity flag to run integrity unit tests at eval time (#290)
  • positional arguments to evaluate and simple_evaluate are now deprecated
  • _CITATION attribute on task modules (#292)
  • lots of bug fixes and task fixes (always remember to report task versions for comparability!)
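The description dict introduced in #226 maps task names to free-text descriptions that are placed ahead of the few-shot examples in the prompt. The sketch below illustrates that idea under stated assumptions: `build_context` and its parameters are hypothetical names, and the "\n\n" separator between description, examples, and query is an assumption for illustration rather than a guaranteed match for the harness's exact prompt format.

```python
from typing import Optional


def build_context(
    task_name: str,
    fewshot_examples: list[str],
    query: str,
    description_dict: Optional[dict[str, str]] = None,
) -> str:
    """Assemble a prompt: optional task description, few-shot examples, then the query.

    Hypothetical helper for illustration; not the harness's actual API.
    """
    # Look up a description for this task; tasks without an entry get none.
    description = (description_dict or {}).get(task_name, "")
    parts = [description] if description else []
    parts.extend(fewshot_examples)  # already-formatted question + answer pairs
    parts.append(query)             # the unanswered test question comes last
    # Assumed separator: a blank line between each block of the prompt.
    return "\n\n".join(parts)


descriptions = {"arc_easy": "Answer the science question."}
print(build_context("arc_easy", ["Q: a\nA: b"], "Q: c\nA:", descriptions))
```

Keying descriptions by task name lets a single dict configure prompts for a multi-task run, with tasks that have no entry left unchanged.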

v0.0.1

02 Sep 02:28

Rename package