Question regarding the source of math_10k.json #43

HuangOwen · 2023-10-12T17:33:53Z

Hi, thanks for the good work!

I have a question regarding math_10k.json, which is used for fine-tuning. You mentioned in the paper that "To enhance the diversity of our data, we incorporate the training sets from GSM8K, MAWPS, MAWPS-single", but to the best of my knowledge there is no training set for MAWPS. When I checked the samples in math_10k.json, I found that some question-answer pairs are exactly the same as those in the test sets of AddSub/MultiArith/SingleEq. Could you please elaborate further on this?

LYH-YF · 2023-10-24T10:40:31Z

@HuangOwen we use the MAWPS dataset as preprocessed by MWPToolkit (https://github.com/LYH-YF/MWPToolkit), which splits MAWPS into trainset/validset/testset. You can find the trainset here: https://github.com/LYH-YF/MWPToolkit/tree/master/dataset/mawps. You can also find mawps-single here.

HuangOwen · 2023-10-27T03:17:39Z

I don't think you followed that split in your evaluation or in math_10k.json. For example, the first example in /dataset/MultiArith/test.json (which you used for testing)

```json
{
  "instruction": " At the schools book fair Sam bought 13 adventure books and 17 mystery books. If 15 of the books were used, how many new books did he buy? ",
  "input": "",
  "output": "\nA: Sam bought 13 adventure books and 17 mystery books. That means he bought 13 + 17 = 30 books in total. 15 of them were used, so he has 30 - 15 = 15 new books. The answer is 15.",
  "answer": "15.0"
}
```

can also be found in math_10k.json. Could you please elaborate further on this?

HZQ950419 · 2023-10-31T08:34:01Z

Hi, we follow exactly the dataset split in MWPToolkit (https://github.com/LYH-YF/MWPToolkit). The example you provided can be found in the MAWPS and MAWPS-Single training sets, which were used to build the fine-tuning dataset. As for why the same example also appears in /dataset/MultiArith/test.json, I think it comes down to how the authors constructed the MultiArith dataset.

Please let us know if you have further questions!

HuangOwen · 2023-12-04T07:50:57Z

Thanks for the reply. I have gone through all the MAWPS subsets (AddSub/MultiArith/SingleEq) and found that every test sample in these subsets can be found in math_10k.json, which you use for instruction fine-tuning. I don't think this is reasonable. If you use the dataset split from MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq).
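A check like this can be reproduced with a short script. The sketch below is hypothetical: it assumes that math_10k.json and each test file (e.g. /dataset/MultiArith/test.json) are JSON lists of records with an `instruction` field, as in the MultiArith example quoted earlier in this thread, and the function names are illustrative. It only flags exact matches after lowercasing and collapsing whitespace, so paraphrased duplicates would slip through.

```python
import json
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so formatting differences don't hide a match."""
    return " ".join(text.lower().split())


def load_records(path: str) -> list[dict]:
    """Load a JSON file assumed to hold a list of {"instruction": ..., ...} records."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def find_overlap(train: list[dict], test: list[dict], key: str = "instruction") -> list[str]:
    """Return the test questions whose normalized text also appears in the training records."""
    train_questions = {normalize(record[key]) for record in train}
    return [record[key] for record in test if normalize(record[key]) in train_questions]


def overlap_rate(train: list[dict], test: list[dict], key: str = "instruction") -> float:
    """Fraction of test questions found (after normalization) in the training data."""
    if not test:
        return 0.0
    return len(find_overlap(train, test, key)) / len(test)
```

For example, `overlap_rate(load_records("math_10k.json"), load_records("dataset/MultiArith/test.json"))` would report what share of the MultiArith test questions also appear in the fine-tuning file (both paths are illustrative and depend on your checkout).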

HuangOwen · 2023-12-04T07:57:15Z

> Hi, we exactly follow the dataset split in MWPToolkit(https://github.com/LYH-YF/MWPToolkit). The example you provide can be found in the MAWPS and MAWPS-Single training set which is used to collect the fine-tuning dataset. The reason you can find the example in /dataset/MultiArith/test.json, I think it is the way the authors create the MultiArith dataset.

I think this data leak has nothing to do with the way the authors created the MultiArith dataset: MultiArith was proposed in 2015 and included in MAWPS in 2016, both before MWPToolkit was released.

callanwu · 2023-12-04T15:06:34Z

mark

HZQ950419 · 2023-12-08T08:09:14Z

> Thanks for the reply, I have went through all subset of MAWPS (AddSub/MultiArith/SingleEq) and I found that all the test samples in these subset can be found in math_10k.json, while you use math_10k for the instruction fine-tuning. I think this is not reasonable. If you use the dataset split in MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq).

Hi,

Many thanks for your questions!

After careful double-checking, we confirmed that there is a data leak in the math reasoning experiments. We have done our best to mitigate its impact: we now use the MAWPS test set to evaluate the PEFT methods, and the results table has been updated. The findings in the paper remain consistent. We have also made a special announcement for researchers who are using our repository in their experiments, and we have uploaded two variants of math_10k.json from which the MAWPS samples have been removed.

We sincerely apologize for any inconvenience caused by our mistake!
If you have any questions, please let us know! Many thanks!
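The thread does not say how the MAWPS-free variants of math_10k.json were produced. As a hypothetical sketch of how one might apply a similar cleanup to one's own fine-tuning mix, the code below drops any training record that shares a word n-gram with a test question. This is a swapped-in heuristic, not the authors' procedure: it is stricter than exact matching because it also catches a test question embedded in a longer prompt. The `instruction` field, the default `n=8`, and all function names are assumptions.

```python
import json
import re
from pathlib import Path


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (empty if it has fewer than n words)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def decontaminate(train: list[dict], test: list[dict], n: int = 8,
                  key: str = "instruction") -> list[dict]:
    """Keep only training records that share no word n-gram with any test question."""
    test_grams: set[tuple[str, ...]] = set()
    for record in test:
        test_grams |= word_ngrams(record[key], n)
    return [record for record in train if not (word_ngrams(record[key], n) & test_grams)]


def save_records(records: list[dict], path: str) -> None:
    """Write the filtered records back out as a JSON list."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

For instance, `save_records(decontaminate(train, test), "math_10k_clean.json")` would write a filtered copy (the output name is illustrative). Records shorter than `n` words yield no n-grams and are never flagged, so a smaller `n` catches more leaks at the cost of more false positives.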

HuangOwen · 2023-12-08T09:17:56Z

Hi Zhiqiang,

Thanks for your reply and your effort in fixing the problem! Glad that the dataset has been updated.

Yuan0320 · 2023-12-10T05:17:54Z

Hi @HZQ950419, thanks for your announcement! Were the MAWPS results shown in the table evaluated on https://github.com/LYH-YF/MWPToolkit/blob/master/dataset/mawps/testset.json (238 samples)?

HZQ950419 · 2023-12-10T17:18:29Z

Hi @Yuan0320,

Correct! We will upload the test set later; in the meantime, you can also get it from MWPToolkit.
