Hi, I am having trouble reproducing the experimental results on the OpenBookQA dataset. The output format is unexpected: for instance, I get responses like "1 is correct. 2 is incorrect. 3 is incorrect. 4 is incorrect.", whereas the expected format is "answer1". Could you provide the exact commands or instructions for both fine-tuning and evaluation so that the OpenBookQA results can be reproduced accurately?
I have replied to your email in case you haven't seen it.
You can refer to issue #38; we use commonsense_evaluate.py for evaluation. Also, please use a single GPU for training: multi-GPU training may not reproduce the results, and we are still trying to figure out why.
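For reference, here is a minimal single-GPU sketch of the fine-tuning and evaluation commands. The script names come from this thread, but the specific flags, model identifiers, and paths are assumptions; check `finetune.py --help` and `commonsense_evaluate.py --help` for the actual argument names before running.

```bash
# Restrict the run to one GPU, since multi-GPU training may not reproduce the results.
export CUDA_VISIBLE_DEVICES=0

# Fine-tuning sketch (flag names, model, and paths are assumptions; adjust to your setup).
python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'ft-training_set/commonsense_170k.json' \
  --output_dir './trained_models/llama-7b-lora' \
  --adapter_name lora

# Evaluation on OpenBookQA with commonsense_evaluate.py (mentioned above);
# the --model / --adapter / --dataset values here are likewise assumptions.
python commonsense_evaluate.py \
  --model LLaMA-7B \
  --adapter LoRA \
  --dataset openbookqa \
  --base_model 'yahma/llama-7b-hf' \
  --lora_weights './trained_models/llama-7b-lora'
```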
If you have further questions, please let us know!