Add git diff reward #376
rasdani wants to merge 3 commits into PrimeIntellect-ai:sami/add-git-diff-reward from
Example of one of the better outputs. Completion: With the following golden diff this completion scored It cited the correct code, but did not edit. I'm confident that a larger training run will get this right. However, a completion that differs from the golden diff only in the signs (`+`/`-`) of the edited lines still scores ~0.99 while being broken as a patch. I am not sure whether hill-climbing from 0.99 to 1.0 will fix this or if one should try
samsja
left a comment
I added a small comment, but this PR looks great!
fixed 👍 @samsja
nice!! btw we will soon have direct support for
@rasdani it seems that something is off with the uv.lock change, probably because of the OS you used to generate it. Do you mind if I merge the PR into a branch where I can regenerate the uv.lock myself? (ofc the commit will still stay yours)
@samsja fine with me. go ahead :) |
This PR adds a reward that computes the similarity between model-generated git patches and golden patches from real-world GitHub PRs.
This kind of diff-patch training is now common practice in post-training (e.g. see Cohere, Sections 3.3.4.1 and 4.4, "Code Editing").
It helps the model match code conventions, comments, and a more realistic coding style, and it complements training against unit tests.
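A minimal sketch of what such a similarity reward could look like, assuming a character-level `difflib.SequenceMatcher` ratio as the metric (an assumption for illustration; the actual metric in `git_diff.py` may differ). The helper names `patch_similarity` and `flip_signs` and the toy patch are hypothetical. The sketch also reproduces the concern raised in the conversation: flipping the `+`/`-` signs of the edited lines barely lowers the score even though the patch no longer applies correctly.

```python
from difflib import SequenceMatcher


def patch_similarity(generated: str, golden: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical patches.

    Assumption: SequenceMatcher stands in for whatever metric the PR uses.
    """
    return SequenceMatcher(None, generated, golden, autojunk=False).ratio()


def flip_signs(patch: str) -> str:
    """Swap the leading '+'/'-' of edited lines, leaving file headers alone."""
    flipped = []
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            flipped.append(line)  # file header, not an edited line
        elif line.startswith("+"):
            flipped.append("-" + line[1:])
        elif line.startswith("-"):
            flipped.append("+" + line[1:])
        else:
            flipped.append(line)  # context line or hunk header
    return "\n".join(flipped)


# Hypothetical golden patch, for illustration only.
GOLDEN = "\n".join([
    "--- a/utils.py",
    "+++ b/utils.py",
    "@@ -1,4 +1,4 @@",
    " def mean(values):",
    "-    return sum(values) / len(values)",
    "+    return sum(values) / max(len(values), 1)",
    " ",
    " def total(values):",
])

broken = flip_signs(GOLDEN)
score_identical = patch_similarity(GOLDEN, GOLDEN)
score_flipped = patch_similarity(broken, GOLDEN)
print(f"identical: {score_identical:.3f}, sign-flipped: {score_flipped:.3f}")
```

Only two characters differ between `broken` and `GOLDEN`, so any purely textual similarity will rate them as nearly identical. One possible mitigation (not part of this PR as far as the description states) would be to compare added and removed lines separately.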
I ran a small-scale training experiment with a version of this reward using Will's verifiers repo:
https://github.com/willccbb/verifiers/blob/eb70d2cdc0b00439399e887a1e688ff6005f1354/verifiers/examples/patch.py
You can test the reward with this dataset or by running the file with this main.
prime-rl/src/zeroband/inference/genesys/git_diff.py
Line 104 in 55a30ef
For this task I moved away from string search+replace and generate patches directly instead, because I noticed in my draft PR that search+replace can produce inconsistent final file content when edits overlap.
Patches in unified diff format can be applied to files more reliably.
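To illustrate the overlap problem, here is a small sketch with two hypothetical search+replace edits (`EDIT_A`, `EDIT_B`) whose search strings overlap. The final file content depends on the order in which they are applied. The second half builds a unified diff with the standard library's `difflib.unified_diff`, which pins each change to explicit line numbers and context. All names and the toy file are assumptions for illustration, not code from this PR.

```python
import difflib

ORIGINAL = "def area(w, h):\n    return w * h\n"

# Two hypothetical search/replace edits whose targets overlap on "w * h".
EDIT_A = ("return w * h", "return abs(w * h)")
EDIT_B = ("w * h", "w * h / 2")


def apply_edits(text, edits):
    """Apply (search, replacement) pairs in order, first occurrence only."""
    for search, replacement in edits:
        text = text.replace(search, replacement, 1)
    return text


def make_patch(before, after, path):
    """Build a unified diff between two versions of one file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Same edits, different order, different final file.
a_then_b = apply_edits(ORIGINAL, [EDIT_A, EDIT_B])
b_then_a = apply_edits(ORIGINAL, [EDIT_B, EDIT_A])
print(a_then_b)
print(b_then_a)

# A unified diff describes exactly one target state, with line numbers.
patch = make_patch(ORIGINAL, a_then_b, "geometry.py")
print(patch)
```

The resulting patch can be checked against the original file with a tool such as `git apply --check` before it is applied.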
I am currently re-downloading all PRs from the original SWE-Fixer dataset. Once that finishes, we should have clean patches for ~70k GitHub PRs.
https://huggingface.co/datasets/rasdani/github-patches