I saw this paper a while ago: it uses the "CoT-Pass@K" metric to argue that RLVR is not actually flawed at incentivizing correct implicit reasoning... except in that instance, the model was trained with the stereotypical "Pass@1" objective rather than "Pass@K". So: can "Pass@K" training also deliver good "CoT-Pass@K" results? https://arxiv.org/html/2506.14245v1
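For context, the mechanical difference between the two metrics is small. Here's a minimal sketch, assuming you already have per-sample verdicts for both the final answer and the reasoning chain: `pass_at_k` is the standard unbiased estimator (Chen et al., 2021), and `cot_pass_at_k` just restricts "correct" to samples whose chain of thought also checks out. The function names are mine, not the paper's code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K: probability that at least one of k draws
    (without replacement) from n samples is among the c correct ones."""
    if n - c < k:
        return 1.0  # every k-subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def cot_pass_at_k(answer_ok: list[bool], cot_ok: list[bool], k: int) -> float:
    """CoT-Pass@K: a sample only counts if the final answer AND the
    chain of thought are both judged correct."""
    n = len(answer_ok)
    c = sum(a and r for a, r in zip(answer_ok, cot_ok))
    return pass_at_k(n, c, k)

# Example: 16 samples, 10 correct answers, but only 6 with sound reasoning.
answers = [True] * 10 + [False] * 6
cots    = [True] * 6 + [False] * 10
print(pass_at_k(16, 10, 8))             # optimistic: counts lucky guesses
print(cot_pass_at_k(answers, cots, 8))  # stricter: reasoning must hold up
```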
Bonus questions:
- Can "Pass@K" training be hacked such that it does not need to bounce around with "Pass@1" training at the end?
- Would adaptive consistency be useful as a further enhancement? https://arxiv.org/abs/2305.11860
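On the first bonus question, here's roughly how I picture a Pass@K-style training signal on top of a GRPO-style loop: group rollouts into bundles of k, reward a bundle if ANY rollout in it is correct, and mean-center across bundles. This is a hedged sketch under my own assumptions, not any paper's implementation:

```python
def pass_at_k_advantages(correct: list[bool], k: int) -> list[float]:
    """Assign each rollout the advantage of its bundle's Pass@K reward.

    correct: verifier outcome per rollout (length must be a multiple of k).
    Returns one advantage per rollout, mean-centered across bundles.
    """
    assert len(correct) % k == 0
    bundle_rewards = []
    for i in range(0, len(correct), k):
        bundle = correct[i:i + k]
        bundle_rewards.append(1.0 if any(bundle) else 0.0)  # Pass@K reward
    baseline = sum(bundle_rewards) / len(bundle_rewards)
    # Broadcast each bundle's centered reward to its k rollouts.
    return [r - baseline for r in bundle_rewards for _ in range(k)]

# Example: 8 rollouts, k=4; one bundle solves the problem, one does not.
outcomes = [False, True, False, False,  False, False, False, False]
print(pass_at_k_advantages(outcomes, k=4))  # [0.5]*4 + [-0.5]*4
```

One obvious wrinkle: the any-correct broadcast gives positive advantage even to the incorrect rollouts inside a winning bundle, so a real implementation would presumably need to be more careful about within-bundle credit assignment.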