Question regarding the source of math_10k.json #43
Comments
@HuangOwen We use the MAWPS dataset preprocessed by MWPToolkit (https://github.com/LYH-YF/MWPToolkit), which splits MAWPS into trainset/validset/testset. You can find the trainset here: https://github.com/LYH-YF/MWPToolkit/tree/master/dataset/mawps. You can also find mawps-single here. |
I don't think you follow the split in your evaluation and math_10k.json. For example, the first example in /dataset/MultiArith/test.json (which you used for testing) can also be found in math_10k.json. Could you please elaborate on this? |
Hi, we follow exactly the dataset split in MWPToolkit (https://github.com/LYH-YF/MWPToolkit). The example you provide can be found in the MAWPS and MAWPS-Single training sets, which are used to collect the fine-tuning dataset. I think the reason you can also find the example in /dataset/MultiArith/test.json is the way the authors created the MultiArith dataset. Please let us know if you have further questions! |
Thanks for the reply. I have gone through all subsets of MAWPS (AddSub/MultiArith/SingleEq) and found that all the test samples in these subsets can be found in math_10k.json, which you use for instruction fine-tuning. I think this is not reasonable. If you use the dataset split in MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq). |
I think this data leak issue has nothing to do with the way the authors created the MultiArith dataset: MultiArith was proposed in 2015 and included in MAWPS in 2016, both before MWPToolkit was proposed. |
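For anyone who wants to reproduce the overlap check described in this thread, here is a minimal sketch. It assumes both math_10k.json and each test.json are JSON arrays of objects and that the question text lives under an `instruction` key; that key name, and the helper names `normalize`, `find_leaked`, and `load_json`, are assumptions for illustration, so adjust them to the actual files.

```python
import json
import re
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaked(train_records, test_records, key="instruction"):
    """Return the test records whose question text also appears in the training records.

    `key` is an assumed field name, not confirmed by the repository.
    """
    train_questions = {normalize(r[key]) for r in train_records if key in r}
    return [
        r for r in test_records
        if key in r and normalize(r[key]) in train_questions
    ]


def load_json(path):
    """Read a JSON file that is assumed to contain a list of records."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A possible usage, with paths that depend on where the files sit in your checkout: `leaked = find_leaked(load_json("math_10k.json"), load_json("dataset/MultiArith/test.json"))`, then compare `len(leaked)` with the size of the test set. Exact matching after normalization only catches verbatim duplicates; paraphrased questions would need a fuzzier comparison.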
mark |
Hi, many thanks for your questions! After careful double-checking, we confirm there is a data leak issue in the math reasoning experiments. We tried our best to mitigate the impact of this data leak. We now use the MAWPS test set to evaluate the performance of PEFT methods, and the result table has been updated. The findings in the paper remain consistent. We also made a special announcement for researchers who are using our repository for their experiments. Furthermore, we also upload two variations of [...] We sincerely apologize for any inconvenience caused by our mistake! |
Hi Zhiqiang, Thanks for your reply and your effort in fixing the problem! Glad that the dataset has been updated. |
Hi @HZQ950419, thanks for your announcement! Were the MAWPS test results shown in the table tested at https://github.com/LYH-YF/MWPToolkit/blob/master/dataset/mawps/testset.json (238 samples)? |
Hi @Yuan0320, Correct! We will upload the test set later, or you can also get the test set from MWPToolkit. |
Hi, thanks for the good work!
I have a question regarding math_10k.json, which is used for fine-tuning. You mention in the paper that "To enhance the diversity of our data, we incorporate the training sets from GSM8K, MAWPS, MAWPS-single", but to the best of my knowledge there is no training set for MAWPS. When I checked the samples in math_10k.json, I found some question-answer pairs that are exactly the same as those in the test sets of AddSub/MultiArith/SingleEq. Could you please elaborate on this?