Open
Description
Could you please take a look at the test results for the Crypto dataset in Table 2? On Qwen-7B, Pass@1 training scores only 1.2/5.7 (Pass@1/Pass@k), while the 'P@k T. + P@1 T.' method reaches nearly 97%. On Qwen-32B, even standard Pass@1 training achieves about 96%. This suggests that the large gap on Qwen-7B may not be due solely to the method itself, but may instead reflect the model reaching a sudden breakthrough, or 'eureka' moment, during training. I believe your method is effective, but I'm concerned that some of these dramatic improvements might be attributable to other factors, such as the Qwen model beginning to engage in longer reasoning processes.
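For reference, here is how I am reading the Pass@1/Pass@k numbers. This is only a minimal sketch that assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples per problem of which c are correct. The function name `pass_at_k` and its parameters are my own illustration, and the evaluation behind Table 2 may be computed differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one problem (assumed estimator, not taken from the paper).

    n: number of sampled responses for the problem
    c: number of those responses that are correct
    k: number of attempts allowed
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k responses are wrong, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct response,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from Table 2.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

With k = 1 this reduces to c / n, so under this reading a Pass@1 of 1.2 means the model is almost never correct on a single attempt.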