opam update: load only changed opam files #6614

Open: arozovyk wants to merge 2 commits into master from update_read_opams_incremental


arozovyk (Collaborator) commented Jul 22, 2025

This PR leverages the fact that we already compute a diff/patch during update to determine exactly which files have changed and in what way. Instead of reloading the entire repository, we:

  • Propagate Patch.t info through the update pipeline - from OpamRepositoryBackend.get_diff to OpamRepository.update
  • Add incremental loading via OpamRepositoryState.load_opams_incremental
  • Maintain existing opam files that haven't changed, avoiding unnecessary I/O
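
The idea in the list above can be sketched as a fold over the patch operations that touches only the paths they name. This is a minimal illustration, not the PR's code: the `operation` and `git_ext` types are local stand-ins modelled on the constructors quoted in this review, and `reload`, `apply_operation`, `load_incremental` and the `read` callback are hypothetical names.

```ocaml
(* Hypothetical stand-ins: they mirror the constructors quoted in this PR,
   not the actual Patch or opam types. *)
type git_ext =
  | Rename_only of string * string
  | Delete_only
  | Create_only

type operation =
  | Edit of string * string
  | Delete of string
  | Create of string
  | Git_ext of string * string * git_ext

module SMap = Map.Make (String)

(* [read path] returns the parsed content of [path], or [None] if it cannot
   be read; in that case the entry is dropped from the cache. *)
let reload ~read path cache =
  match read path with
  | Some content -> SMap.add path content cache
  | None -> SMap.remove path cache

(* Apply one operation to the cache, touching only the paths it names. *)
let apply_operation ~read cache = function
  | Create file -> reload ~read file cache
  | Delete file -> SMap.remove file cache
  | Edit (old_file, new_file) ->
    reload ~read new_file (SMap.remove old_file cache)
  | Git_ext (old_file, new_file, Rename_only _) ->
    reload ~read new_file (SMap.remove old_file cache)
  | Git_ext (old_file, _, Delete_only) -> SMap.remove old_file cache
  | Git_ext (_, new_file, Create_only) -> reload ~read new_file cache

(* Entries not mentioned by any operation are kept as they are. *)
let load_incremental ~read operations cache =
  List.fold_left (apply_operation ~read) cache operations
```

With this shape, the cost of an update is proportional to the number of operations in the patch rather than to the size of the repository.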

On Windows, the call to OpamRepositoryState.load_opams_from_dir in OpamUpdate.repository currently takes ~10s. This PR brings it down to ~0.01s (the time it takes to load the opam files of a usual repository change).
On Unix the gain is a bit less dramatic, but it still goes from ~3.5s to ~1ms.

closes #5824

@kit-ty-kate kit-ty-kate added this to the 2.5.0~alpha1 milestone Jul 22, 2025
Comment on lines 177 to 180
| Patch.Edit (file, _) -> file
| Patch.Delete file -> file
| Patch.Create file -> file
| Patch.Git_ext (file, _, _) -> file

Member

i think you want to keep the whole diff.Patch.operation around. Otherwise you won't know when a file has been moved
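
To illustrate the point, here is a small sketch of what is lost when an operation is projected to a single file. The types are local stand-ins modelled on the constructors in the snippet above, and `file_of_operation` and `removed_and_added` are hypothetical names, not functions from the PR.

```ocaml
(* Hypothetical stand-ins for the patch operation type. *)
type git_ext =
  | Rename_only of string * string
  | Delete_only
  | Create_only

type operation =
  | Edit of string * string
  | Delete of string
  | Create of string
  | Git_ext of string * string * git_ext

(* Projection in the style of the reviewed snippet: only the first path
   survives, so the destination of a move is forgotten. *)
let file_of_operation = function
  | Edit (file, _) | Delete file | Create file | Git_ext (file, _, _) -> file

(* Keeping the whole operation lets the caller separate the paths to
   forget from the paths to (re)load. *)
let removed_and_added = function
  | Edit (old_file, new_file) when String.equal old_file new_file ->
    ([], [ new_file ])
  | Edit (old_file, new_file) -> ([ old_file ], [ new_file ])
  | Delete file | Git_ext (file, _, Delete_only) -> ([ file ], [])
  | Create file | Git_ext (_, file, Create_only) -> ([], [ file ])
  | Git_ext (old_file, new_file, Rename_only _) -> ([ old_file ], [ new_file ])
```

For a rename, `file_of_operation` only reports the old path, whereas `removed_and_added` reports both sides of the move.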

Member

i think this also warrants a test. I don't think we have a test for files being moved around, empty files being deleted and empty files being created in a git repository.

Collaborator Author

@arozovyk arozovyk Jul 24, 2025

I kept the diff.Patch.operation instead of a file. It seems to be working fine for HTTP/local backends, as you can see in the reftest I added.

As for the git repository test, I tested it here: 071a04d
There are some things to be addressed: first we have to parse the patch file produced by git diff (OpamGit.VCS.diff). I also ran into the fact that only Git_ext.Rename_only is properly recognized, while git creation and deletion fall back to Patch.Create / Patch.Delete instead of Git_ext.Create_only / Git_ext.Delete_only. That is fine since it's the same action in the end, but still quite confusing. (The reason seems to be that patch ignores the git headers when ---/+++ are present.)

Member

first we have to parse the patch file produced by git diff

this should already be done. The parse is done in OpamRepository.apply_repo_update for all backends, it would be nice to also avoid duplication with data we already have in the case of the HTTP/local backend.

Collaborator Author

empty files being deleted and empty files being created

Not sure I understand correctly. What would be the expected behaviour? New empty opam files will fail on read, and thus will be removed. As for deletion: is there a situation where we can end up with an empty opam file in the repo?

Collaborator Author

The parse is done in OpamRepository.apply_repo_update... avoid duplication in the case of the HTTP/local backend.

In OpamRepositoryBackend the patch file is created after we compute Patch.diff for each file (the diffs that I currently reuse). For VCS it's the other way around (Patch.parse produces the diffs). So I feel like I should reuse the info from OpamRepository.apply_repo_update (by propagating it from OpamSystem.patch) and keep the HTTP/local backends as is.

Collaborator Author

@arozovyk arozovyk Jul 24, 2025

avoid duplication

Hmm, I looked at it a bit more. I think I understand better what you mean for HTTP/local: we currently compute it in fetch_repo_update, then parse it again in apply_repo_update.

@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from 9609f5b to 878ac75 Compare July 22, 2025 14:15
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from 878ac75 to c0d8002 Compare July 23, 2025 14:59
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from c0d8002 to ba9b12a Compare July 23, 2025 15:04
@@ -15,7 +15,7 @@ let slog = OpamConsole.slog

 type update =
   | Update_full of dirname
-  | Update_patch of filename
+  | Update_patch of (filename * Patch.t list)

Member

Patch.t can sometimes get quite large in memory. I think it'd be best to only keep the list of operations, as we're gonna use more memory during the load.

Suggested change
- | Update_patch of (filename * Patch.t list)
+ | Update_patch of (filename * Patch.operation list)

(fun f2 -> Some (Patch.Git_ext (f1, f2, git_ext))))
in
let process_file acc file ~is_removal =
if not (String.ends_with file ~suffix:"opam") then

Member

Suggested change
- if not (String.ends_with file ~suffix:"opam") then
+ if not (String.ends_with file ~suffix:"/opam") then

Member

actually this probably should match against the whole packages/<valid-pkg-name>/<valid-pkg-name>.<valid-version>/opam pattern
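
A rough sketch of such a check, using only the standard library. The character sets for package names and versions are an assumption here and should be checked against the opam manual; the real code would more likely reuse opam's own name and version parsers. `is_package_opam_file` and the helpers are hypothetical names.

```ocaml
let is_letter = function 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false
let is_digit = function '0' .. '9' -> true | _ -> false

(* Assumed character sets; verify against the opam manual. *)
let is_name_char c =
  is_letter c || is_digit c || c = '-' || c = '_' || c = '+'

let is_version_char c = is_name_char c || c = '.' || c = '~'

(* Assumption: a name is non-empty and contains at least one letter. *)
let valid_name s =
  s <> "" && String.for_all is_name_char s && String.exists is_letter s

let valid_version s = s <> "" && String.for_all is_version_char s

(* Accept only packages/<name>/<name>.<version>/opam. *)
let is_package_opam_file path =
  match String.split_on_char '/' path with
  | [ "packages"; name; dir; "opam" ] ->
    (match String.index_opt dir '.' with
     | None -> false
     | Some i ->
       (* Names cannot contain '.', so the first dot ends the name. *)
       let dir_name = String.sub dir 0 i in
       let version = String.sub dir (i + 1) (String.length dir - i - 1) in
       String.equal dir_name name && valid_name name && valid_version version)
  | _ -> false
```

Unlike a suffix test, this rejects files such as packages/foo/foo.1.0/files/opam, which end in "/opam" but are not package definitions.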

Comment on lines +125 to +128
if String.ends_with ~suffix:".new" repo_part then
String.sub repo_part 0 (String.length repo_part - 4)
else
repo_part

Member

Suggested change
- if String.ends_with ~suffix:".new" repo_part then
-   String.sub repo_part 0 (String.length repo_part - 4)
- else
-   repo_part
+ OpamStd.String.remove_suffix ~suffix:".new" repo_part

repo_part
in
if
String.equal (OpamRepositoryName.to_string repo_name) repo_part

Member

i don't think this is true for git repositories. In any case i don't think the whole check is relevant here. Simply removing the part before the first / should be sufficient
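
A minimal sketch of that simplification, assuming every path in the patch carries exactly one leading component to discard (for example the a/ and b/ prefixes that git diff emits by default). `strip_first_component` is a hypothetical name.

```ocaml
(* Drop everything up to and including the first '/'.
   Returns [None] when the path has no '/' at all. *)
let strip_first_component path =
  match String.index_opt path '/' with
  | None -> None
  | Some i -> Some (String.sub path (i + 1) (String.length path - i - 1))
```

This does not need to know the repository name, so the same code would serve both the git and the HTTP/local cases.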

Comment on lines +144 to +146
Option.bind (strip_repo_prefix old_file)
(fun o -> Option.bind (strip_repo_prefix new_file)
(fun n -> Some (Patch.Edit (o, n))))

Member

you won't need it anymore with the above comment but note that OpamStd.Option.Op exists for patterns like that
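
For reference, the nested Option.bind above can be flattened with bind/map operators. The sketch below defines `>>=` and `>>|` locally as stand-ins (to my knowledge OpamStd.Option.Op provides operators with these names, but that is worth double-checking); `strip_repo_prefix` is a simplified hypothetical version and `Edit` a local stand-in for Patch.Edit.

```ocaml
(* Local stand-ins for the operators; defined here so the sketch is
   self-contained. *)
let ( >>= ) o f = Option.bind o f
let ( >>| ) o f = Option.map f o

(* Stand-in for Patch.Edit. *)
type operation = Edit of string * string

(* Simplified, hypothetical prefix stripping: drop the first component. *)
let strip_repo_prefix path =
  match String.index_opt path '/' with
  | None -> None
  | Some i -> Some (String.sub path (i + 1) (String.length path - i - 1))

(* Same logic as the nested Option.bind, read top to bottom. *)
let edit_of_paths old_file new_file =
  strip_repo_prefix old_file >>= fun o ->
  strip_repo_prefix new_file >>| fun n ->
  Edit (o, n)
```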

Successfully merging this pull request may close these issues.

opam update completely reloads the repository
2 participants