Skip to content

Conversation

arozovyk
Copy link
Collaborator

@arozovyk arozovyk commented Jul 22, 2025

This PR leverages the fact that we already compute a diff/patch during update to determine exactly which files have changed and it what way. Instead of reloading the entire repository we:

  • Propagate Patch.operation info from OpamFilename.patch
  • Add incremental loading via OpamRepositoryState.load_opams_incremental
  • Maintain existing opam files that haven't changed, avoiding unnecessary I/O

On Windows the OpamRepositoryState.load_opams_from_dir in OpamUpdate.repository currently takes ~10s. This PR brings it down to ~0.01s (the time it takes to load the opam files of a usual repository change).
On unix, it is a bit less dramatic, but still going from ~3.5s to 1ms

closes #5824

@kit-ty-kate kit-ty-kate added this to the 2.5.0~alpha1 milestone Jul 22, 2025
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch 3 times, most recently from c0d8002 to ba9b12a Compare July 23, 2025 15:04
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch 2 times, most recently from 2cb1720 to 3136926 Compare July 30, 2025 13:12
@arozovyk
Copy link
Collaborator Author

Updated simplfying a bunch of things.
We can't avoid parsing for diffs in case of Http and local since we have to first preprocess it, so I ended up reusing the result of Patch.parse from OpamSystem.patch in general for all backends.

@kit-ty-kate
Copy link
Member

We can't avoid parsing for diffs in case of Http and local since we have to first preprocess it

I'm not sure to follow, pre-processing isn't done for http/local (per apply_repo_update)

@arozovyk
Copy link
Collaborator Author

pre-processing isn't done for http/local

You're right, my bad. So we can reuse the diffs avoiding Patch.parse for http/local see f0a731c. Not sure it's worth it though, we propagte the full diffs (we need them up until applying in OpamSystem.internal_patch) that, as you said, can be quite large. So memory overhead for to-be-determined speed gain?

Copy link
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of review. I haven't looked at everything yet but i think that's enough for now

| Some opams -> opams
| None -> OpamPackage.Map.empty
in
let process_file acc file ~is_removal =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's done elsewhere or something, but this function seems to assume every files that can change are opam files. What about changes from their files directory or other non-opam file related changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function purpose is to load opam files to update their definitions in rt, all other files are ignored as it was already the case with original load_opams_from_dir function. Any other extra files are read by OpamFileTools.read_repo_opam and stored through metadata

@@ -0,0 +1,181 @@
N0REP0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should be added all at once in the first commit of the PR, ran to unsure the initial state is correct, then ran again for each new commits to show new behaviours

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a reftest called update.test i think it should probably go there instead

@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from 0cfb4a3 to 6b19d06 Compare September 2, 2025 13:23
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from 6b19d06 to 74dbeb6 Compare September 2, 2025 13:25
@arozovyk arozovyk requested a review from kit-ty-kate September 2, 2025 13:26
OpamConsole.msg "[%s] synchronised from %s\n"
(OpamConsole.colorise `green
(OpamRepositoryName.to_string repo.repo_name))
(OpamUrl.to_string repo.repo_url);
log "%a: applying patch update at %a"
(slog OpamRepositoryName.to_string) repo.repo_name
(slog OpamFilename.to_string) f;
let preprocess =
match repo.repo_url.OpamUrl.backend with
| `http | `rsync -> false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should start running the the preprocessing for local and http repositories. Could you bring the ?preprocess argument back?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this PR depends on the completion of #6599 first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not, this is naturally done by the fact that OpamSystem.parse_patch_file is only called in OpamVCS (in OpamAction.prepare_package_build as well, but that's a different story), for local and http repositories we reuse the diffs directly.

@@ -0,0 +1,181 @@
N0REP0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a reftest called update.test i think it should probably go there instead

@@ -67,7 +67,7 @@ let repository rt repo =
in
OpamProcess.Job.with_text text @@
OpamRepository.update r repo_root @@+ fun has_changes ->
let has_changes = if redirect then `Changes else has_changes in
let has_changes = if redirect then `Changes [] else has_changes in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems unnecessary. @rjbou do you know why this line was added in #5146 ?

### <REPO/packages/foo/foo.1/opam>
opam-version: "2.0"
synopsis: "Updated package"
### opam update default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if the new incremental behaviour was shown in the test. We could select the right OPAMDEBUGSECTION to show that only a limited number of files are being read

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let internal_patch_error fmt =
Printf.ksprintf (fun str -> raise (Internal_patch_error str)) fmt
in
let patch_info_path =
OpamStd.Option.default ("in directory"^dir) patch_filename
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpamStd.Option.default ("in directory"^dir) patch_filename
OpamStd.Option.default ("in directory "^dir) patch_filename

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be updated on rebase

@arozovyk arozovyk requested a review from kit-ty-kate September 5, 2025 16:23
Copy link
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall, however it's a hard PR to review without its final clean commit structure so i'd rather wait for whenever this is finished.

I also think this PR should be queued on a PR adding a benchmark test for opam update as we currently don't have any. It would ideal to have a test for three kinds of cases:

with both local path and local git repositories

@@ -382,15 +382,14 @@ let diff_patch dir setup =
let apply ~git diff =
let git = if git then "GIT " else "" in
let result =
OpamFilename.patch ~allow_unclean:false diff
(dir / first)
OpamFilename.patch ~allow_unclean:false(`Patch_file diff) (dir / first)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 80 characters + minor missing space

Suggested change
OpamFilename.patch ~allow_unclean:false(`Patch_file diff) (dir / first)
OpamFilename.patch ~allow_unclean:false (`Patch_file diff)
(dir / first)

Comment on lines +55 to +61
| Some patch_file ->
let p = OpamFilename.to_string patch_file in
try
let diffs = OpamSystem.parse_patch_file
~dir:(OpamFilename.Dir.to_string repo_root) p in
OpamRepositoryBackend.Update_patch (patch_file, diffs)
with exn -> Update_err exn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had OpamFilename.parse_patch_file for a more OpamFilename friendly API we could simplify to:

Suggested change
| Some patch_file ->
let p = OpamFilename.to_string patch_file in
try
let diffs = OpamSystem.parse_patch_file
~dir:(OpamFilename.Dir.to_string repo_root) p in
OpamRepositoryBackend.Update_patch (patch_file, diffs)
with exn -> Update_err exn
| Some patch_file ->
match OpamFilename.parse_patch_file ~dir:repo_root patch_file with
| diffs -> OpamRepositoryBackend.Update_patch (patch_file, diffs)
| exception exn -> Update_err exn

| Patch.Create f -> Patch.Create (rm_prefix f)
| Patch.Delete f -> Patch.Delete (rm_prefix f)
| Patch.Edit (f1, f2) -> Patch.Edit (rm_prefix f1, rm_prefix f2)
| Patch.Git_ext (f1, f2, ext) -> Patch.Git_ext (rm_prefix f1, rm_prefix f2, ext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 80 columns

Suggested change
| Patch.Git_ext (f1, f2, ext) -> Patch.Git_ext (rm_prefix f1, rm_prefix f2, ext)
| Patch.Git_ext (f1, f2, ext) ->
Patch.Git_ext (rm_prefix f1, rm_prefix f2, ext)

Now run 'opam upgrade' to apply any package updates.
### <REPO/repo>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### <REPO/repo>
### : Test incrementality
### rm -rf REPO
### <REPO/repo>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opam update completely reloads the repository
2 participants