@younesbelkada
Contributor

What does this PR do?

This new evaluation benchmark was submitted to the NeurIPS 2025 E2LM competition, where it reached 3rd place on the general leaderboard.

It is intended for evaluating Small Language Models (SLMs) in the early stages of training. More details are provided in the competition proposal paper.

Example command to get started:

lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks sciknoweval_mcqa \
    --device cuda:0 \
    --batch_size 8
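
Since the benchmark targets early training stages, it may be useful to run the same command over several intermediate checkpoints. The sketch below is only an illustration, not part of this PR: the helper `build_lm_eval_command` is a hypothetical name, and the revisions other than `step100000` are assumed examples of Pythia checkpoint branches. It only builds and prints the commands; it does not launch any evaluation.

```python
import shlex


def build_lm_eval_command(
    revision,
    model="EleutherAI/pythia-160m",
    task="sciknoweval_mcqa",
    device="cuda:0",
    batch_size=8,
):
    """Return the lm_eval argument list for one checkpoint revision.

    Hypothetical helper: it mirrors the flags of the example command above
    and adds nothing beyond them.
    """
    model_args = f"pretrained={model},revision={revision},dtype=float"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--device", device,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Assumed checkpoint names; only step100000 appears in the PR description.
    for revision in ("step1000", "step10000", "step100000"):
        # Print a shell-ready command instead of executing it.
        print(shlex.join(build_lm_eval_command(revision)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run` once `lm_eval` is installed.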

Original authors

@DaGrapix @EricSaikali

@baberabb

Co-authored-by: Anthony Kalaydjian <[email protected]>
Co-authored-by: EricSaikali <[email protected]>
@younesbelkada younesbelkada changed the title Feat: Add team Shaikespear submission Feat: Add team Shaikespear submission from NeurIPS E2LM Competition Nov 29, 2025