Description
Hi, thank you for your nice work!
I'm reproducing the results in Table 2, using the Mistral-7B model on MMLU and TydiQA with 5% of the data selected.

I followed the scripts in your repo for the warmup, data selection, and training stages, and used the evaluation code in your repo for evaluation. I did not change any settings in your scripts; the only difference is that I ran a single random seed, 3 (sketched below).
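For reference, this is roughly what I mean by "a single random seed of 3"; a minimal sketch, assuming the seed is ultimately applied through `transformers.set_seed` (your scripts may pass it differently):

```python
# Minimal sketch of the only change I made: fixing one random seed of 3.
# transformers.set_seed seeds Python's random, NumPy, and PyTorch in one call.
from transformers import set_seed

SEED = 3  # used for warmup, data selection, and training; no other settings changed
set_seed(SEED)
```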
Despite following these settings, my model's performance is worse than the results reported in Table 2.
For MMLU, Random scores 58.3 (60.0 in your paper) and LESS scores 60.8 (61.8 in your paper).
For TydiQA, the F1 of Random is 44.6 and of LESS is 55.1.
My environment: torch 2.4.0, transformers 4.45.2, peft 0.13.1, datasets 3.0.1.
Are these differences within the expected range? Could you please confirm whether the settings in your scripts are fully aligned with those used in the paper?
Thanks.