
Add git diff reward #376


Open: wants to merge 3 commits into main


@rasdani (Contributor) commented Jun 9, 2025

This PR adds a reward that computes the similarity between model-generated git patches and golden patches from real-world GitHub PRs.
This kind of diff-patch training is now common practice in post-training (e.g., see Cohere, Sections 3.3.4.1 and 4.4 "Code Editing").
It helps the model match code conventions and comments, encourages a more realistic coding style, and complements training against unit tests.
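The PR text does not spell out the similarity metric, so the following is only a minimal sketch under stated assumptions: the patch is taken from the last diff-fenced block of the completion (the format the example prompt asks for), and similarity is `difflib.SequenceMatcher.ratio()` on the raw patch text. `extract_patch` and `patch_similarity_reward` are hypothetical names, not the functions added in this PR.

```python
import difflib
import re
from typing import Optional

FENCE = "`" * 3  # three backticks
DIFF_BLOCK = re.compile(FENCE + r"diff\n(.*?)" + FENCE, re.DOTALL)


def extract_patch(completion: str) -> Optional[str]:
    """Return the body of the last diff-fenced block in a completion, or None."""
    blocks = DIFF_BLOCK.findall(completion)
    return blocks[-1] if blocks else None


def patch_similarity_reward(completion: str, golden_patch: str) -> float:
    """Similarity in [0, 1] between the generated patch and the golden patch.

    Assumption: character-level difflib ratio. autojunk=False so that frequent
    characters such as spaces are not discarded on patches of 200+ characters.
    """
    patch = extract_patch(completion)
    if patch is None:
        return 0.0  # no parseable patch earns no reward
    matcher = difflib.SequenceMatcher(
        None, patch.strip(), golden_patch.strip(), autojunk=False
    )
    return matcher.ratio()


# Usage: a completion whose patch matches the golden patch exactly.
golden = "diff --git a/x.py b/x.py\n--- a/x.py\n+++ b/x.py\n@@ -1 +1 @@\n-a = 1\n+a = 2\n"
completion = "<answer>\n" + FENCE + "diff\n" + golden + FENCE + "\n</answer>"
print(patch_similarity_reward(completion, golden))  # 1.0
```

A plain text ratio gives partial credit for near-misses, which is the point of a dense reward, but it has no notion of whether the patch actually applies.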

I ran a small-scale experiment with a version of this reward using Will's verifiers repo:
https://github.com/willccbb/verifiers/blob/eb70d2cdc0b00439399e887a1e688ff6005f1354/verifiers/examples/patch.py

You can test the reward with this dataset, or by running the file with this `main`:

if __name__ == "__main__":

For this task I switched from string search-and-replace to generating patches directly, because I noticed in my draft PR that search-and-replace can produce inconsistent final file content when edits overlap.
Patches in unified diff format can be applied to files more reliably.
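A toy illustration of the overlap problem (a constructed example, not the case from the draft PR): two search-and-replace edits touch the same line, so the final content depends on the order in which they are applied. `apply_search_replace` is a hypothetical helper.

```python
def apply_search_replace(text: str, edits: list[tuple[str, str]]) -> str:
    """Apply (search, replace) edits in order, replacing the first match of each."""
    for search, replace in edits:
        text = text.replace(search, replace, 1)
    return text


original = "a = 1\nb = 2\n"
edit_block = ("a = 1\nb = 2", "a = 10\nb = 20")  # rewrites both lines
edit_line = ("b = 2", "b = 3")                   # overlaps with edit_block

print(repr(apply_search_replace(original, [edit_block, edit_line])))  # 'a = 10\nb = 30\n'
print(repr(apply_search_replace(original, [edit_line, edit_block])))  # 'a = 1\nb = 3\n'
```

In the first order, the second search silently matches inside the first edit's output (`b = 20` contains `b = 2`); in the other order, the block edit no longer matches at all and is dropped. A unified diff hunk pins each change to line numbers and context lines, so `git apply` rejects a hunk whose context does not match instead of silently producing different content.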

I am currently re-downloading all PRs from the original SWE-Fixer dataset. When finished, we should have clean patches for ~70k GitHub PRs.
https://huggingface.co/datasets/rasdani/github-patches

@rasdani (Contributor, Author) commented Jun 9, 2025

Example of one of the better outputs.
Prompt:

We are currently solving the following issue within our repository. Here is the issue text:
--- BEGIN ISSUE ---
`view.flows.add` command does not exist but the examples reference it
#### Problem Description

The `view.flows.add` command does not exist but the example `duplicate-modify-replay.py` shows this command being used.

`replay.client` seems to perform both the "add to view" and "replay" function.

--- END ISSUE ---

Below are some code segments, each from a relevant file. One or more of these files may contain bugs.
--- BEGIN FILES ---
Path: `examples/addons/duplicate-modify-replay.py`
Content:
\```
1 """Take incoming HTTP requests and replay them with modified parameters."""
2 from mitmproxy import ctx
3 
4 
5 def request(flow):
6     # Avoid an infinite loop by not replaying already replayed requests
7     if flow.is_replay == "request":
8         return
9     flow = flow.copy()
10     # Only interactive tools have a view. If we have one, add a duplicate entry
11     # for our flow.
12     if "view" in ctx.master.addons:
13         ctx.master.commands.call("view.flows.add", [flow])
14     flow.request.path = "/changed"
15     ctx.master.commands.call("replay.client", [flow])
16
\```
--- END FILES ---

Please first localize the bug based on the issue statement, and then generate a patch according to the `git diff` format fenced by three backticks.

Here is an example:

\```diff
diff --git a/examples/server_async.py b/examples/server_async.py
--- a/examples/server_async.py
+++ b/examples/server_async.py
@@ -313,4 +313,4 @@
 
 
 if __name__ == "__main__":
-    asyncio.run(run_async_server("."), debug=True)
+    asyncio.run(run_async_server(), debug=True)
diff --git a/examples/server_sync.py b/examples/server_sync.py
--- a/examples/server_sync.py
+++ b/examples/server_sync.py
@@ -313,5 +313,5 @@
 
 
 if __name__ == "__main__":
-    server = run_sync_server(".")
+    server = run_sync_server()
     server.shutdown()

\```

Completion:

<think>
Okay, let me look at the problem here. The issue is that the `view.flows.add` command isn't present when the example uses it. The problem description says that the example is trying to add a flow but that command doesn't exist.

Looking at the code provided in `duplicate-modify-replay.py`, I see that the line where it calls `view.flows.add` is not present. In the code, they check if there's a `view` in the `ctx.master.addons`, and if so, call `view.flows.add`. But in the code snippet provided, that part isn't there. Instead, they're adding a new flow and then calling `replay.client`.

So the bug is that even though the command exists, it's not being called when the flow isn't a replay. The example is correct, but the code isn't implementing that command. So the fix would be to add the `view.flows.add` command where it's needed.

I need to make sure that the patch adds the necessary line. The current code adds the view check and then calls the command. The existing code calls the command as `view.flows.add`, but that line isn't present. Therefore, the patch should include that line in the code.

Wait, in the provided code, after checking the existence of "view" in `ctx.master.addons`, they call `view.flows.add`. But the example uses this command. So the existing code includes this, which is correct. The problem is that in the code, the line where they call `view.flows.add` isn't present. Oh wait, maybe I misread the code. Let me check again.

The code says:

12     if "view" in ctx.master.addons:
13         ctx.master.commands.call("view.flows.add", [flow])

Yes, that's where the command is called. So in the example, they are using that. Therefore, the existing code is correct. The issue is that the command is available, but the example is using it. So maybe the problem is that the command isn't being called, but in this code, the command is.

Wait, maybe there's a typo or a missing line? Because the user says the command does not exist. But the code is using the command as `view.flows.add`. But perhaps the command isn't available.

Alternatively, maybe the command is missing from the supplied code, which in the code, after the check, it does call `view.flows.add`.

Hmm, perhaps the issue is that the code is using `view.flows.add` but not present, which is why the example shows it. But in the code, it's present. Therefore, there's no bug there. But the user says that the command doesn't exist but the examples reference it.

Wait, maybe the command is present, but in the code as written, there's a syntax error. For example, maybe the path is wrong. But in the provided code, it's `view.flows.add`.

Alternatively, maybe the `view.flows.add` isn't implemented correctly, and the check that `ctx.master.addons` has a `view` is incorrect.

But according to the code, `ctx.master.addons` is present, so the check is correct. Therefore, perhaps there's no issue with the command existing. The problem is perhaps that the example isn't correctly using the command, but the code seems to implement it correctly.

Alternatively, maybe the example code is incorrect, but the user says the examples are correct. The user is pointing out that the command does not exist but the examples show it. The code as written does use it, so there's no contradiction.

Wait, maybe the code is correct, but the problem is that the command is not properly called. But since the code calls it in the example, maybe that's acceptable.

I'm a bit confused. Let me check again: in the code, the check is for "view" in the `ctx.master.addons`. If that's true, then call `view.flows.add`. The code is correct.

Therefore, there's no bug in the code as written. Unless there's a missing line, but the user says the command doesn't exist. So perhaps the problem is that the command is missing from the code, leading to the example using it but the command isn't present. But that would imply that the command is not implemented, which is conflicting with the code as written.

Wait, maybe the command is present, but the code doesn't have it. For example, maybe the command was supposed to be called in a particular way, but the code uses `view.flows.add`, which is correct.

Alternatively, maybe the `view.flows.add` command isn't available, but the code is using a different approach. But the example is using it. Therefore, the code is correct, and the issue is elsewhere.

Wait, maybe the problem is that the command isn't properly called. Let me check the code again. The code as written does call the command if there's a view, else add a flow. But the example is using the command. Therefore, the code is correct, and the issue is with the example showing a command that doesn't exist, but the code does implement it. Therefore, perhaps there's no issue here, and the patch is correct.

But the user says that the command doesn't exist, but the example uses it. Therefore, the code as written does call the command. So the problem is that the command is present but not implemented correctly. Or maybe not.

Alternatively, maybe there's a typo in the command name, such as `view.flows.add` and it's supposed to be `view.flows.add`, which would be correct. So the code is correct.

Hmm, perhaps I'm overcomplicating. The user points out that the command does not exist but the examples use it. Therefore, the code is missing that command. In the code, the check is for "view" in `ctx.master.addons`, and if so, call `view.flows.add`. Since that check is present (it's correct), there's no bug in the code as written. Therefore, the patch is correct and there's no issue.

Wait, but in the code as written, after checking the existence of "view", they call `view.flows.add`. So that's correct. Therefore, no bug. The issue is with the example, but the code is correct.

But why does the user say that the command does not exist? Maybe the check for "view" is incorrect. For example, maybe the command is `view.addflows`, not `view.flows.add`. But in the code as written, the command is `view.flows.add`.

Alternatively, maybe the check is correct, and the command exists, but the code is missing it. For example, perhaps the code should call `view.addflows` instead.

In that case, the patch would add `view.addflows` to the code.

But according to the user's issue statement, the command does not exist. Therefore, perhaps the code is missing that command. If the code is correct, the patch is correct, but if the code is incorrect, then the patch needs to add the missing command.

But in the code as written, the command is `view.flows.add`. If that's present, the check is correct. Therefore, there's no issue. The user might have intended that the command is present but was not implemented correctly.

Therefore, the answer would be that there's no bug, and the code is correct. But since the user says there's a bug, I need to address that.

Wait, the user says: "the command does not exist but the examples reference it". So the command is not present, but the examples show it. Therefore, the code is missing that command. In the code, the check is `if "view" in ctx.master.addons`. So if that's a mistake, and the command is actually `view.addflows`, then the fix would be to add that command.

Therefore, the correct patch would be to add `view.addflows` to the code.

So the answer would be to add that command.
</think>

<answer>
\```diff
diff --git a/examples/duplicate-modify-replay.py b/examples/duplicate-modify-replay.py
--- a/examples/duplicate-modify-replay.py
+++ b/examples/duplicate-modify-replay.py
@@ -12,6 +12,6 @@
     if "view" in ctx.master.addons:
         ctx.master.commands.call("view.flows.add", [flow])
     flow.request.path = "/changed"
     ctx.master.commands.call("replay.client", [flow])
\```
</answer>

Against the following golden diff, this completion scored 0.7874.

diff --git a/examples/addons/duplicate-modify-replay.py b/examples/addons/duplicate-modify-replay.py
--- a/examples/addons/duplicate-modify-replay.py
+++ b/examples/addons/duplicate-modify-replay.py
@@ -10,6 +10,6 @@
     # Only interactive tools have a view. If we have one, add a duplicate entry
     # for our flow.
     if "view" in ctx.master.addons:
-        ctx.master.commands.call("view.flows.add", [flow])
+        ctx.master.commands.call("view.flows.duplicate", [flow])
     flow.request.path = "/changed"
     ctx.master.commands.call("replay.client", [flow])

It cited the correct code but did not edit it. I'm confident that a larger training run will get this right.
EDIT: On second read, I don't think the issue description provides enough information to solve the issue. It may be worth filtering the dataset by the solve rate of a strong model.
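One way to implement that filter, sketched under assumptions: sample k completions per issue from a strong model, score them with the patch reward, and keep only issues that the model solves at least some of the time. The `strong_model_scores` field, treating only an exact match (score 1.0) as "solved", and the `min_rate` cutoff are all hypothetical choices.

```python
def solve_rate(scores: list[float], solved_threshold: float = 1.0) -> float:
    """Fraction of a strong model's samples whose patch reward counts as solved."""
    if not scores:
        return 0.0
    return sum(s >= solved_threshold for s in scores) / len(scores)


def filter_by_solve_rate(examples: list[dict], min_rate: float = 0.125) -> list[dict]:
    """Drop examples the strong model (almost) never solves.

    Assumes each example carries a hypothetical "strong_model_scores" list with
    the rewards of k samples from a strong model.
    """
    return [ex for ex in examples if solve_rate(ex["strong_model_scores"]) >= min_rate]


examples = [
    {"id": "solvable", "strong_model_scores": [1.0, 0.79, 1.0, 0.6]},
    {"id": "underspecified", "strong_model_scores": [0.79, 0.78, 0.8, 0.75]},
]
kept = filter_by_solve_rate(examples)
print([ex["id"] for ex in kept])  # ['solvable']
```

Counting only exact matches as solved avoids treating high-similarity but broken patches (see the next paragraph) as evidence that an issue is solvable.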

However, a completion that differs from the golden diff only in the signs of the edited lines still scores ~0.99, even though it is broken as a patch. I am not sure whether hill-climbing from 0.99 to 1.0 will fix this, or whether one should try `score**2` or set the reward to 2.0 for an exact match.
@kalomaze I remember you blogged or tweeted about quadratic/higher-order rewards earlier this year. Are they any good?
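The sign-flip failure mode can be reproduced with the golden diff above. The sketch below flips the signs of its edited lines and scores the result with a character-level `difflib.SequenceMatcher` ratio, an assumed stand-in for the reward's actual similarity metric. `shaped_reward` is a hypothetical helper covering the two options mentioned: raising the score to a power, or paying a fixed bonus for an exact match.

```python
import difflib
from typing import Optional


def similarity(a: str, b: str) -> float:
    # Character-level similarity; autojunk=False so spaces are not ignored.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def flip_signs(patch: str) -> str:
    """Swap '+' and '-' on edited lines, leaving the ---/+++ file headers intact."""
    flipped = []
    for line in patch.splitlines(keepends=True):
        if line.startswith(("---", "+++")):
            flipped.append(line)
        elif line.startswith("+"):
            flipped.append("-" + line[1:])
        elif line.startswith("-"):
            flipped.append("+" + line[1:])
        else:
            flipped.append(line)
    return "".join(flipped)


def shaped_reward(score: float, exact: bool, power: float = 1.0,
                  exact_bonus: Optional[float] = None) -> float:
    """Hypothetical shaping: a fixed bonus for an exact match, else score**power."""
    if exact and exact_bonus is not None:
        return exact_bonus
    return score ** power


golden = (
    "diff --git a/examples/addons/duplicate-modify-replay.py b/examples/addons/duplicate-modify-replay.py\n"
    "--- a/examples/addons/duplicate-modify-replay.py\n"
    "+++ b/examples/addons/duplicate-modify-replay.py\n"
    "@@ -10,6 +10,6 @@\n"
    "     # Only interactive tools have a view. If we have one, add a duplicate entry\n"
    "     # for our flow.\n"
    '     if "view" in ctx.master.addons:\n'
    '-        ctx.master.commands.call("view.flows.add", [flow])\n'
    '+        ctx.master.commands.call("view.flows.duplicate", [flow])\n'
    '     flow.request.path = "/changed"\n'
    '     ctx.master.commands.call("replay.client", [flow])\n'
)

broken = flip_signs(golden)  # would not apply: it removes a line that does not exist
score = similarity(broken, golden)
print(round(score, 3), round(shaped_reward(score, exact=False, power=2.0), 3))
print(shaped_reward(1.0, exact=True, exact_bonus=2.0))  # 2.0
```

Squaring barely separates the broken patch from a perfect one: the gap between 0.99 and 1.0 only grows from 0.01 to 0.0199. An exact-match bonus of 2.0 opens a gap of about 1.0, at the cost of a sparse signal that gives no extra credit to near-misses.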

@samsja (Member) left a comment

I added a small comment, but this PR looks great!

@rasdani (Contributor, Author) commented Jun 9, 2025

fixed 👍 @samsja

@willccbb (Collaborator) commented:

nice!! btw we will soon have direct support for verifiers envs in prime-rl, working on integration now :)
