[WIP] Improve performance of opam update/init by changing the structure of the internal http opam repositories (use the tar.gz as-is) #6625
Hey, thanks for your work on that. Since you asked me directly
This should be fine from the design point of view for conex. Conex will need to interject between step 0 (you downloaded the tarball) and step 2 (remove the old tar.gz). Currently, conex requires a diff file on disk and the old repository as a directory. But we can revise that interface, and conex could just as well work on two tar.gz files (and/or on two directories). I guess you have a clear understanding of the current update process, and since you mention the different kinds (http, git, local), maybe we should rethink how opam and conex interact, to avoid imposing a cost on people who do not use conex and to avoid duplicating computations in both opam and conex. The latter may require including conex as a library in opam. I'm away for the next 10 days (back on August 10th), but am happy to discuss this afterwards, especially since I plan to revive my work on conex thereafter.
To be more precise, given that
Then conex could do what is needed (compute the set of changed opam files, verify signatures; exit 0 on success), and could even report the set of changed files back to opam (I suspect this is what #6614 depends on) using a file, a socket, or, if integrated with opam, shared memory, which would be much simpler. For opam itself, I guess that #6349 and #5553 will already improve updates a lot.
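To make "compute the set of changed opam files" concrete, here is a minimal OCaml sketch. It is not conex's or opam's actual interface: the `snapshot` type, `snapshot_of_list` and `changed_files` are hypothetical names, and a repository state is assumed to be representable as a map from repository-relative path to a digest of the file contents, whether it was read from a directory or from a tar.gz.

```ocaml
(* Hypothetical sketch: these names are illustrative, not conex or opam API. *)
module SMap = Map.Make (String)

(* A snapshot maps a repository-relative path to a digest of its contents. *)
type snapshot = Digest.t SMap.t

(* Build a snapshot from (path, contents) pairs. A real implementation
   would obtain these pairs by walking a directory or reading an archive. *)
let snapshot_of_list (files : (string * string) list) : snapshot =
  List.fold_left
    (fun acc (path, contents) -> SMap.add path (Digest.string contents) acc)
    SMap.empty files

type change = Added | Removed | Modified

(* Compare two snapshots and keep only the paths whose state differs. *)
let changed_files (old_ : snapshot) (new_ : snapshot) : change SMap.t =
  SMap.merge
    (fun _path o n ->
       match o, n with
       | None, Some _ -> Some Added
       | Some _, None -> Some Removed
       | Some a, Some b when not (Digest.equal a b) -> Some Modified
       | _ -> None)
    old_ new_
```

The resulting map is the kind of "set of changed files" that could be handed back to opam, so that only those opam files need to be verified and reloaded.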
Fixes #5741
Fixes #5648
Fixes #5484
Fixes #5346
Fixes #5559
cc @hannesm to check if it works for conex (2.1 worked with tar.gz files already so i'm not too scared about breakages)
Reasoning
#5741 shows that assumptions that hold "most of the time" on some Unix platforms, such as "it is ok to scan a large tree of files and directories", don't hold on other platforms. Windows, network filesystems, busy shared servers whose disk is constantly in use, hard drives, … all suffer from this.
In opam we can have 3 types of repositories:
Out of the three, the most critical for first-time users is the first one. It is also the one that suffers the most from these issues, as currently we:
`opam update`: load only changed opam files #6614)

VCS repositories do not have step 4. Steps 1, 2 and 3 are built in and heavily optimized. Only step 5 is left, which should be improved by #6614 and which we can improve further later by using `git cat-file` or even by parsing PACK files using `ocaml-git`.

Local/ssh repositories are the ones for which there is very little we can do. #5966 should help, but beyond that, we might want to require that people use git even for local repositories.
For HTTP though, the untarring (which takes 1+ minute) is the main issue. Hence this PR.
Design decisions
Instead of untarring, we simply use the tar.gz as-is and use `ocaml-tar` to read it on the fly.

The new update steps are:
`opam update`: load only changed opam files #6614)

Given the ubiquity of the use of `OpamFilename.Dir.t` to mean both any random directory and a repository directory, I chose to first abstract over it in a new `OpamRepositoryRoot` module, and work with the help of the type checker from there. Its interface helps to see which actions opam performs on repositories. While I'd rather keep them, the `Tar` and `Dir` submodules can be removed when everything is done.

The `REPOSITORYTARRING` environment variable is removed by this work, given that the repositories are tarred already.

I had to simplify `opam var pkg:opamfile` for this work. Previously it would point to the file in the repository. However, this isn't what it's supposed to do. Instead it should point to the `<switch>/.opam-switch/packages/` directory, which actually reflects the opam file that was used for the installation. Otherwise the opam file can change between before and after the user has called `opam update`, etc.

TODO
There are a number of `assert false (* TODO *)` in this draft PR. Those are to be fixed before undrafting, but I felt reasonably confident about the rest, enough to open this draft PR in this state to put more eyes on this work and to increase my self-motivation.

`OpamRepositoryState.get_repo_files`: a function which extracts a limited number of files from the tar.gz to a new cache directory.

Some of these changes should probably be extracted to separate PRs, but let's do that at the end when we have something that actually works.
While an early form of this work started a year and a half ago, I believe the cruft left over from that time should be minimal, after 6 different branches. The final rebase and split into smaller PRs shouldn't be too painful.
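As an illustration of the direction described above, here is a minimal OCaml sketch of a repository root that is either a directory or an archive, with a `get_repo_files`-like operation that copies only the requested files into a cache directory. It is not the code of this PR: the `root` type, `find` and the helper functions are hypothetical, and the archive is modelled as an in-memory list of `(path, contents)` entries rather than read through `ocaml-tar`, whose API is deliberately not used here.

```ocaml
(* Hypothetical sketch: none of these names are opam's actual API. *)

(* A repository root is either an unpacked directory or an archive.
   The archive is modelled as an in-memory list of (path, contents)
   entries; a real implementation would read entries from the tar.gz. *)
type root =
  | Dir of string
  | Tar of (string * string) list

let read_file path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

(* Create [dir] and any missing parent directories. *)
let rec mkdir_p dir =
  if not (Sys.file_exists dir) then begin
    mkdir_p (Filename.dirname dir);
    Sys.mkdir dir 0o755
  end

(* Look up one repository-relative path, whatever the backing store. *)
let find root rel =
  match root with
  | Dir d ->
    let p = Filename.concat d rel in
    if Sys.file_exists p && not (Sys.is_directory p)
    then Some (read_file p)
    else None
  | Tar entries -> List.assoc_opt rel entries

(* Copy only the requested files into [cache_dir] and return the
   paths that were actually found in the repository. *)
let get_repo_files root ~cache_dir wanted =
  List.filter
    (fun rel ->
       match find root rel with
       | None -> false
       | Some contents ->
         let dst = Filename.concat cache_dir rel in
         mkdir_p (Filename.dirname dst);
         write_file dst contents;
         true)
    wanted
```

Because callers only go through `find` and `get_repo_files`, the type checker flags every place that still assumes a plain directory, which is the benefit attributed to the `OpamRepositoryRoot` abstraction under "Design decisions".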
Future work
In the future we can use `ocaml-tar`, which we now depend on, to replace some of the uses of the `tar` command. This should allow us to have better behaviour with things like symlinks on Windows, or even to add new features such as excluding some directories (see ocaml/ocaml#14152).

As mentioned above, we can also improve local and git/vcs repositories, with or without `ocaml-git`.