Skip to content

feat: patch application without git #1676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 23, 2025

Conversation

remimimimimi
Copy link
Contributor

This PR adds a pure Rust patch application using diffy (fork with changes is here) and includes test for sources generated from conda-forge recipes. It also adds a new patch-apply-gen-test-data binary to harvest recipe.yaml and corresponding patch files from conda-forge linux-64 repodata into a local ./recipes directory for testing.

Initially tests data was planned to be generated from converted meta.yaml, but reality showed that conda-recipe-manager can not automatically convert most of the packages. From my trial I saw only <5 packages working, out of 17k available.

Current state is draft, until some clean up is finished.

@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 2 times, most recently from 087283f to 3c8f839 Compare May 19, 2025 20:13
@remimimimimi
Copy link
Contributor Author

I still need to figure out how to fix CI errors, but otherwise PR is ready for review. There is two kinds of errors currently, either patch application on windows, or OpenSSL version errors. I especially concerned about the latter ones as they cause most of the CI failures. Would love to hear to approach solving them.

Should we store all this test data in the repository? I did this because, on my machine, scan of conda-forge takes around an hour (43 minutes) to complete. Maybe there is a faster way to get all necessary data?

@remimimimimi remimimimimi marked this pull request as ready for review May 19, 2025 20:46
const OUTPUT_PATH: &str = "test-data/conda_forge/recipes/";

// Overview:
// 1. Get repodata for conda-forge linux-64.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach! I think I would have checked if Github has some API and tried to extract them from the feedstocks. But this obviously works nicely as well!

Copy link
Contributor Author

@remimimimimi remimimimimi May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this initially, but hit REST API rate limit for free usage. Other option could be to use GraphQL. I haven't investigated it thoroughly yet, but it seems that it will hit rate limit as well.

@wolfv
Copy link
Member

wolfv commented May 20, 2025

Hey @remimimimimi - excellent PR and also impressive work in the diffy crate.

The openssl errors are indeed a bit puzzling because we don't have them on main, and I am not sure how diffy would be part of this.

Although maybe you are building an additional executable in CI? Maybe we need to enable rustls-tls for reqwest?

Regarding the test data ... I think it might be better to start a second repository with the test data that we checkout during a CI run. But we can take care of that once we fixed the tests IMO.

We also did indeed have some breakages with the git apply method which were recently fixed. Unfortunately that means your branch is out of sync with main. Would you be able to rebase your changes to patch.rs?

@tdejager
Copy link
Contributor

@remimimimimi How I mostly diagnose the openssl errors (which can be caused by a cargo feature mismatch) is by using cargo tree -i openssl-sys or the offending library, and look from there. Then you should kind of see what is enabling that library, it can even be another transient dependency causing the issue.

@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch from 3c8f839 to 56afd9d Compare May 20, 2025 21:43
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch from 56afd9d to 17191e3 Compare May 21, 2025 15:43
@remimimimimi remimimimimi requested a review from wolfv May 21, 2025 16:16
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 2 times, most recently from df4c96d to c5cc6be Compare May 22, 2025 17:42
Introduces pure rust option to apply patches.

This commit also adds tests that are automatically generated from
conda-forge package recipes that use recipe.yaml and contain at least
one applicable patch.
@wolfv
Copy link
Member

wolfv commented Jun 1, 2025

I played around with this code today. A few notes:

  • The tests shoudl fail if diffy patch application fails (even if git also fails). Our git application is not perfect, but everything that is on conda-forge already should ideally also pass.
  • I encountered one issue when testing today where diffy does not match context fuzzily. I prepared a PR here: make patch application fuzzy diffy#1 . Thanks to this I also learned that git apparently does not have fuzzy logic when applying patches, so it currently breaks while GNU patch can deal with this well. I would really like to get rid of git. Original issue: Failure to apply patch regression in rattler-build 0.42.1 #1701
  • I encountered another issue (unrelated to diffy) where we parse filenames wrongly and include a timestamp in the filename (Failure of patching #1690)
  • Sometimes recipes are not parsed properly because we are missing some "variants" like python_min. In this case the test code just skips over the recipe. I think we should error in this case and fix up the recipes (remove logic we don't need) until it passes in the test data. I already pushed some changes like that to our new repository.

Surprisingly it is not enough to just strip some prefix from the patch
paths, sometimes you also have to find shortest possible matching
subpath from the working directory.

We fix that, and correspondingly change test to check only if diffy
succeedes as git is broken here and there.

There are still 4 test cases where git and diffy give different
results, but that is due to git incorrect behavior. If you look at the
directory diff output and patch files you'll see that diffy applies
patches correctly and git silently fails. Comment is added with exact
recipes that fail, so look at it if you want to check it out yourself.

There are also couple of packages that still fail and we ignore
them (they include mumps substring in their name) since their format
is looking strange, but it is possible to easily fix this issue in
practice. Final decision is postponed, since further investigation is
needed on why it even works in conda.  Following is the header of one
of the patches in mumps recipe that fails.

```diff
--- mumps/src/mumps_common_orig.h
+++ mumps/src/mumps_common.h
```

In actual directory there is no file with `_orig` postfix, therefore
we fail to find it.

In total there is 6 different test cases that fail and we ignore it
due to the reasons described above.
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch from f5e787e to 971a41b Compare June 4, 2025 20:01
@wolfv
Copy link
Member

wolfv commented Jun 13, 2025

Can we run the big test only when we have a certain github label, for example? They are pretty slow, I assume (downloading lots of test data)

@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 2 times, most recently from 134f243 to 674ff26 Compare June 13, 2025 13:14
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 15 times, most recently from 559fb8e to 55b4f9b Compare June 18, 2025 16:57
fn patches_from_str(input: &str) -> Result<Vec<Patch<'_, str>>, diffy::ParsePatchError> {
patches_from_str_with_config(
fn patches_from_bytes(input: &[u8]) -> Result<Vec<Patch<'_, [u8]>>, diffy::ParsePatchError> {
diffy::patches_from_bytes_with_config(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use the bytes configuration here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some of the files that are patched are not valid utf8 strings, so byte based parsing is required. This shouldn't affect errors messages that user sees, as patch application errors are currently non-existent, and parsing errors give only name of the issue, but not position. After new parser will be implemented I plan to try show string based representation for errors whenever possible.

@@ -622,6 +559,12 @@ mod tests {
#[exclude("mumps")]
// Failed to download source
#[exclude("petsc")]
// GNU patch fails and diffy succeeds, seemingly correctly from the diff output.
#[exclude("(fastjet-cxx)|(fenics-)|(love2d)|(flask-security-too)")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to at least test love2d as it was the one we had trouble with for CLRF.

We don't have to run it against Gnu Patch - just add some kinda snapshot test or assert that the CMakeLists file changed in the way we expect.

# Needed to use GNU patch instead of Strawberry Perl patch
if: ${{ contains(github.event.pull_request.labels.*.name, 'need-patch-apply-tests') && matrix.os == 'windows-latest' }}
run: Add-Content $env:GITHUB_PATH "C:\Program Files\Git\usr\bin"
- name: Set up GNU patch on MacOS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if yes, I would prefer to use pixi to install it, of course :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have that for consistent behavior between target operating systems. Will try pixi for MacOS.

@@ -38,12 +38,26 @@ jobs:
toolchain: "1.86.0"
components: clippy,rustfmt
- uses: Swatinem/rust-cache@9d47c6ad4b02e050fd481d890b2ea34778fd09d6 # v2
# https://github.com/actions/runner-images/issues/5459#issuecomment-1532856844
- name: Set up GNU patch on Windows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have that for consistent behavior between target operating systems.

@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 2 times, most recently from 23cb6a4 to 848148e Compare June 19, 2025 08:51
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch 3 times, most recently from 4d56ade to 988d448 Compare June 20, 2025 12:19
@remimimimimi remimimimimi force-pushed the new-diff-handling-without-git branch from 988d448 to 32d7df6 Compare June 20, 2025 12:42
@remimimimimi remimimimimi requested a review from wolfv June 23, 2025 07:18
@wolfv wolfv merged commit 18ed0b9 into prefix-dev:main Jun 23, 2025
35 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-patch-apply-tests Run slow patch apply tests in ci/cd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants