support for installation from renv.lock lockfile? #343

Open
kevinushey opened this issue Dec 8, 2021 · 26 comments
Labels
feature a feature request or enhancement

Comments

@kevinushey

It looks like pak has its own machinery for creating and installing from lockfiles, e.g.

pak::lockfile_create(<pkgs>)
pak::lockfile_install()

Is there a straightforward mechanism whereby lockfiles created by renv (renv.lock) could work here as well?

The main barrier I see is that one cannot yet create lockfiles from versioned R packages; e.g.

> pak::lockfile_create("[email protected]")
x Creating lockfile pkg.lock [66ms]
Error: Cannot install packages:
* [email protected]: Versioned CRAN packages are not implemented yet

If we had that, I think it would be straightforward for renv to create a list of versioned remotes that could be passed into pak.

@gaborcsardi
Member

pak has more information in its lockfile, so you cannot replace it with an renv lockfile.

Nevertheless it is in the plans to install an renv lockfile.

@gaborcsardi gaborcsardi added the feature a feature request or enhancement label Dec 8, 2021
@gaborcsardi
Member

@kevinushey One issue with the renv lockfile is that AFAICT it does not contain the dependencies, so we don't know the right installation order. This means that to start the installation we would need to unpack all packages first, and then look up the dependencies.

This would be much simpler if the lockfile also had the dependencies, just the package names, e.g.

...
      "Package": "callr",
      "Version": "3.7.0.9000",
      "Source": "GitHub",
      "RemoteType": "github",
      "Dependencies": ["processx", "R6"],
...

@kevinushey
Author

What values would be included in Dependencies -- should that be all of Depends + Imports + LinkingTo? (Or should I consider just including those in the lockfile entries?)

Resolving the installation order post-hoc doesn't seem that bad; if I understand correctly you'd have to do this anyway if someone provided (for example) a URL remote or another similar "exotic" remote. I'm not sure if supporting those is something pak plans to do, though.

@gaborcsardi
Member

Just those in the lockfile entries. renv does not add URLs for CRAN-like packages, so you'd need LinkingTo as well, in case a source package is installed.

pak lockfiles have the dependencies for all packages, including exotic ones, for example

pak::lockfile_create("url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz")

will create

...
    {
      "ref": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
      "package": "cli",
      "version": "3.1.0",
      "type": "url",
      "direct": true,
      "binary": false,
      "dependencies": ["glue"],
      "vignettes": false,
      "needscompilation": true,
      "metadata": {
        "RemotePkgRef": "url::https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz",
        "RemoteType": "url",
        "RemoteEtag": "\"75b0e-5cf56b5f6c91b\"",
        "RemotePackaged": "TRUE"
      },
      "sources": ["https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz"],
...
    },
    {
      "ref": "glue",
      "package": "glue",
      "version": "1.5.1",
      "type": "standard",
      "direct": false,
      "binary": true,
      "dependencies": [],
      "vignettes": false,
      "needscompilation": false,
      "metadata": {
        "RemoteType": "standard",
        "RemotePkgRef": "glue",
        "RemoteRef": "glue",
        "RemoteRepos": "https://cloud.r-project.org",
        "RemotePkgPlatform": "aarch64-apple-darwin20",
        "RemoteSha": "1.5.1"
      },
      "sources": ["https://cloud.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz", "https://mac.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz"],
...

No dependency resolution is performed at all when installing from this lockfile. We can start downloading packages right away; we don't even need the CRAN metadata. Then we can start installing them right away, using as many subprocesses as possible.

FWIW other software does the same, e.g. a Cargo.lock or a package-lock.json both have dependencies included.
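
As a rough illustration of the point above, the sketch below shows that both the download list and the set of packages that can be installed immediately can be read straight from lockfile records, with no repository metadata. The `records` list is a hand-written, trimmed-down stand-in for the pak lockfile entries shown above, and the variable names are assumptions for this example; it is not how pak itself is implemented.

```r
# Trimmed-down stand-in for the two pak lockfile entries shown above
# (only the fields needed for this illustration are kept).
records <- list(
  list(
    package = "cli",
    dependencies = "glue",
    sources = "https://cran.rstudio.com/src/contrib/cli_3.1.0.tar.gz"
  ),
  list(
    package = "glue",
    dependencies = character(),
    sources = c(
      "https://cloud.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz",
      "https://mac.r-project.org/bin/macosx/big-sur-arm64/contrib/4.1/glue_1.5.1.tgz"
    )
  )
)

pkgs <- vapply(records, function(r) r$package, character(1))

# First listed source URL per package: everything needed to start downloading.
downloads <- setNames(
  vapply(records, function(r) r$sources[[1]], character(1)),
  pkgs
)

# Packages with no dependencies can be installed as soon as they are downloaded.
no_deps <- pkgs[vapply(records, function(r) length(r$dependencies) == 0, logical(1))]

print(downloads)
print(no_deps)
```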

@kevinushey
Author

kevinushey commented Dec 16, 2021

Would this be sufficient?

{
  "R": {
    "Version": "4.1.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "cli": {
      "Package": "cli",
      "Version": "3.1.0",
      "Source": "CRAN",
      "Repository": "CRAN",
      "RemoteType": "standard",
      "RemotePkgRef": "cli",
      "RemoteRef": "cli",
      "RemoteRepos": "https://cran.rstudio.com/",
      "RemotePkgPlatform": "source",
      "RemoteSha": "3.1.0",
      "Hash": "66a3834e54593c89d8beefb312347e58",
      "Requirements": [
        "glue"
      ]
    },
    "glue": {
      "Package": "glue",
      "Version": "1.5.1",
      "Source": "CRAN",
      "Repository": "CRAN",
      "RemoteType": "standard",
      "RemotePkgRef": "glue",
      "RemoteRef": "glue",
      "RemoteRepos": "https://cran.rstudio.com/",
      "RemotePkgPlatform": "source",
      "RemoteSha": "1.5.1",
      "Hash": "e01bc1fe0c20954ec97eac86640abc70",
      "Requirements": []
    }
  }
}

Note the 'Requirements' field, which just provides the names of the other packages in the lockfile that this package depends on.
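
For illustration, here is a minimal sketch of how an installation order could be derived from the 'Requirements' field alone. The `lockfile` list mirrors the JSON above with most fields omitted, and `install_order()` is a hypothetical helper written for this example; it is not part of renv or pak.

```r
# In-memory stand-in for the lockfile above (most fields omitted).
lockfile <- list(
  Packages = list(
    cli  = list(Package = "cli",  Version = "3.1.0", Requirements = "glue"),
    glue = list(Package = "glue", Version = "1.5.1", Requirements = character())
  )
)

# Hypothetical helper: repeatedly pick the packages whose requirements
# have all been scheduled already (a simple topological ordering).
install_order <- function(packages) {
  reqs <- lapply(packages, function(rec) as.character(unlist(rec$Requirements)))
  done <- character()
  todo <- names(reqs)
  while (length(todo) > 0) {
    ready <- todo[vapply(todo, function(p) all(reqs[[p]] %in% done), logical(1))]
    if (length(ready) == 0) {
      stop("cycle or missing requirement among: ", paste(todo, collapse = ", "))
    }
    done <- c(done, ready)
    todo <- setdiff(todo, ready)
  }
  done
}

ord <- install_order(lockfile$Packages)
print(ord)
```

Packages that become ready in the same pass do not depend on each other, so in principle they could be installed in parallel.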

@gaborcsardi
Member

Yes, that is exactly what we need, like in #343 (comment).

@gaborcsardi
Member

@kevinushey So, if you add the dependencies to the lockfile, then I can implement this pretty quickly.

@kevinushey
Author

This has been implemented now; the Requirements entry for each package record will be a JSON array of package names (all of Depends / Imports / LinkingTo). You should see that if you test with the development version of renv.

@pat-s

pat-s commented Nov 4, 2022

@gaborcsardi @kevinushey Just passing by and asking about the current status, as it has been quiet here for some time.

Using {pak} as a backend for {renv} package installs would really be great to have! 👀

@baggiponte

ciao @pat-s, it seems that {renv} is supporting {pak} (see the config option here). IIUC, there are still some things to sort out, for example the installation procedure: should I install.packages(c('renv', 'pak')), or use {renv} to install {pak}, or vice versa?

@ccasar

ccasar commented Nov 14, 2022

I've just tried this with renv 0.16.0 and setting options(renv.config.pak.enabled = TRUE).
It seems you need to install pak before running e.g. renv::restore; otherwise the standard way to install packages will be used.
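
A minimal sketch of the setup described above, assuming only the option name quoted in this comment. The `use_pak` variable is introduced here for illustration, and the install and restore calls are left commented out because they need network access and an active renv project.

```r
# Opt in to pak as the installation backend (option name as quoted above).
options(renv.config.pak.enabled = TRUE)

# Check that pak is actually available before relying on it.
use_pak <- isTRUE(getOption("renv.config.pak.enabled")) &&
  requireNamespace("pak", quietly = TRUE)

if (use_pak) {
  message("pak is installed; renv should be able to use it")
} else {
  message("pak is not installed; install it first, e.g. install.packages(\"pak\")")
}

# Not run here:
# renv::restore()
```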

After installing pak successfully and running renv::restore, the following error came up immediately. I'm using an renv.lock file that was created with renv 0.15.5; I'm not sure if that is the reason, because the same error came up when testing the above commands with renv 0.15.5.

Error: <callr_remote_error: Can't parse remotes: >
 in process 753095 
-->
<simpleError in get_remote_types(refs): Can't parse remotes: >

 Stack trace:

 12. (function (...)  ...
 13. base:::withCallingHandlers(cli_message = function(msg) { ...
 14. get("pkg_install_make_plan", asNamespace("pak"))(...)
 15. pkgdepends::new_pkg_installation_proposal(pkg, config = list(libr ...
 16. pkg_installation_proposal$new(refs, config = config, ...)
 17. pkgdepends:::initialize(...)
 18. pkg_plan$new(refs, config = config, library = config$library,  ...
 19. pkgdepends:::initialize(...)
 20. pkgdepends:::pkgplan_init(self, private, refs, config, library,  ...
 21. pkgdepends:::parse_pkg_refs(refs)
 22. pkgdepends:::get_remote_types(refs)
 23. base:::stop("Can't parse remotes: ", paste(refs[bad], collapse =  ...
 24. base:::.handleSimpleError(function (e)  ...
 25. h(simpleError(msg, call))
 26. base:::stop(e)
 27. (function (e)  ...

 x Can't parse remotes:  

Traceback (most recent calls last):
11: install.packages("pak")
10: install(pkgs)
 9: renv_pak_install(packages, libpaths)
 8: pak$pkg_install(pkg = packages, lib = library[[1L]], upgrade = TRUE)
 7: remote(function(...) get("pkg_install_make_plan", asNamespace("pak"))(...), 
        list(pkg = pkg, lib = lib, upgrade = upgrade, ask = ask, 
            start = start, dependencies = dependencies, loaded = loaded_packages(lib)))
 6: err$rethrow(stop(res$error$parent$error), res$error$parent, call = FALSE)
 5: withCallingHandlers(expr, error = function(e) {
        if (is.null(e$`_nframe`)) 
            e$`_nframe` <- length(sys.calls())
        e$`_childcall` <- realcall
        e$`_childframe` <- realframe
        e$`_childignore` <- list(c(realframe + 1L, realframe + 1L), 
            c(e$`_nframe` + 1L, sys.nframe() + 1L))
        throw(cond, parent = e)
    })
 4: stop(res$error$parent$error)
 3: <condition-handler>(...)
 2: throw(cond, parent = e)
 1: stop(cond)

@alexvpickering

Is there any progress on this @gaborcsardi? I'd be happy to try to put together a PR if you can give some direction on how to go from the renv lockfile with dependencies specified (above) to a functional pak lockfile.

@gaborcsardi
Member

@kevinushey So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?

Would the renv project need to be activated first?

Would pak need to be a dependency in the renv project?

If the renv project does not need to be active, then pak would install the packages into a regular (non-renv) library? That does not seem ideal.

I don't really see how this would work.

@jrosell

jrosell commented Sep 13, 2023

@ccasar where could we find the renv.lock that you have used?

@kevinushey
Author

> So if pak were to install packages from an renv lockfile, where would those packages go, and how would this work together with renv's libraries?

They would get installed into .libPaths()[1], which would normally be set to the renv project library when renv is loaded for a project.

> Would the renv project need to be activated first?

Yes, to ensure the library paths are set appropriately.

> Would pak need to be a dependency in the renv project?

It shouldn't; at least from the renv side, renv::install() and friends automatically install and load pak when required, so it's sort of an automatically-fulfilled implicit dependency for projects that have opted-in to using pak.


Just to re-iterate, right now, renv::install() uses pak when options(renv.config.pak.enabled = TRUE) is set. When this is set, renv basically forms a call like the following:

pak::pkg_install(c("[email protected]", ...))

That is, it transforms the lockfile record entries into short-form remotes (including their versions) that can be processed by pak.
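
To make the transformation concrete, here is a rough sketch of turning lockfile records into short-form remotes. `record_to_ref()` is a hypothetical helper written for this example and is not renv's actual implementation; the GitHub record's username, repository, and SHA are made-up placeholder values.

```r
# Hypothetical helper: map one lockfile record to a pak-style package reference.
# Repository packages become "pkg@version"; GitHub packages become "user/repo@sha".
record_to_ref <- function(rec) {
  if (identical(rec$Source, "GitHub")) {
    sprintf("%s/%s@%s", rec$RemoteUsername, rec$RemoteRepo, rec$RemoteSha)
  } else {
    sprintf("%s@%s", rec$Package, rec$Version)
  }
}

# Illustrative records; the GitHub fields are placeholders, not real values.
records <- list(
  cli = list(Package = "cli", Version = "3.1.0", Source = "Repository"),
  callr = list(
    Package = "callr", Version = "3.7.0.9000", Source = "GitHub",
    RemoteUsername = "r-lib", RemoteRepo = "callr", RemoteSha = "0123abc"
  )
)

refs <- vapply(records, record_to_ref, character(1), USE.NAMES = FALSE)
print(refs)

# Not run here (needs network access):
# pak::pkg_install(refs)
```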

@gaborcsardi
Member

gaborcsardi commented Sep 13, 2023

OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?

@alexvpickering

alexvpickering commented Sep 13, 2023

I was able to reduce our GitHub Actions build times from ~4.5 hrs to ~1hr in this repo.

I couldn't get renv::restore() with options(renv.config.pak.enabled = TRUE) to work. pak's usage of pkgdepends (the standard usage) leads to complaints of conflicting versions.

I was able to cobble together a solution that uses pkgdepends directly. It successfully installs the majority of packages in the renv.lock though some are not installed (not sure why). To get around this, our Dockerfile first uses pkgdepends (in restore_fast.R) and then uses renv::restore (in restore_renv.R) to install any remaining packages. A couple of other issues I noted:

  • renv::restore will re-install packages that are 'crossgrade' (same version installed but a current snapshot would produce a lockfile that has different fields/values). I skip these in the above repo.
  • pkgdepends will throw an error if there are packages that have been removed from CRAN (archived versions available, e.g. Matrix.utils), with the error invalid version specification 'NA'. The workaround for these is to install the package directly using renv::install from the archived URL and then take a snapshot. renv does not have a problem resolving these packages, so it should be possible to fix this as well.
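
The idea of skipping 'crossgrade' packages can be sketched as a plain version comparison: only packages that are missing or installed at a different version are passed on for restoration. `needs_restore()` and the version vectors below are assumptions made for this illustration; this is not the code used in the repo mentioned above.

```r
# Hypothetical helper: names of locked packages that are missing from the
# library or installed at a different version. Packages already installed at
# the locked version (including 'crossgrades') are skipped.
needs_restore <- function(locked, installed) {
  have <- installed[names(locked)]
  names(locked)[is.na(have) | have != locked]
}

# Illustrative data. In practice `installed` could come from
# installed.packages()[, "Version"], which is a named character vector.
locked    <- c(cli = "3.1.0", glue = "1.5.1", R6 = "2.5.1")
installed <- c(cli = "3.1.0", glue = "1.5.0")

todo <- needs_restore(locked, installed)
print(todo)
```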

@gaborcsardi
Member

@alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?

Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?

Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.

@gaborcsardi
Member

@kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.

@kevinushey
Author

> OK, but that means that there is no pak function to add, and pak does not actually need to read the renv lockfile, and there is nothing to do here, essentially?

Yeah, at least from renv's perspective, now that pak supports versioned remotes, we can use pak to install packages from a lockfile.

Given this, there's probably not an explicit need for pak to support renv lockfiles, since anyone who wants to use an renv lockfile should be using renv::restore(), and so rely on renv to use pak appropriately.

tl;dr: unless you plan to further extend pak here, I think we can close this?

@kevinushey
Author

> @kevinushey Do you have a list of possible values for the Source field in the lockfile, and the extra fields added for each package source? Just to make sure that pak can indeed install all possible package sources.

The main things you'll see there are "Repository", "Bioconductor", other values already encoded in "RemoteType", "unknown", and "Cellar" (for packages that were found in the renv cellar; https://rstudio.github.io/renv/articles/package-sources.html#the-package-cellar). Although I'm not sure if the "Cellar" source is a good idea...

@gaborcsardi
Member

Sorry, what I meant is, what values can be in the Source field? E.g. in the lockfile above, there are these:

❯ names(table(sapply(df$Packages, "[[", "Source")))
[1] "Bioconductor" "GitHub"       "Repository"   "URL"

but there are probably others?

@ccasar

ccasar commented Sep 13, 2023

> @ccasar where could we find the renv.lock that you have used?

Sorry for not providing it earlier @jrosell. I'm trying to reproduce the error now with renv 1.0.2 and pak 0.6.0, but it seems to be solved in the meantime.

@kevinushey
Author

> but there are probably others?

If the package was installed with remotes or pak, then the RemoteType written by that package would be copied over as the "Source".

It's probably easier to just look at the implementation here: https://github.com/rstudio/renv/blob/main/R/snapshot.R#L712-L760

@alexvpickering

> @alexvpickering One hour still seems pretty long, have you tried to use binary Linux packages from https://packagemanager.posit.co?

Not tried yet. Locally the restore takes ~30 mins (more cores than GA) for ~250 packages. Time is fine for our purposes now as we also employ caching of images so 1hr is worst case scenario and much better than 4.5 hours. Timing would be improved decently if the renv::restore wasn't necessary at all.

> Re Matrix.utils, that's not in the lockfile at https://github.com/hms-dbmi-cellenics/pipeline/blob/master/pipeline-runner/renv.lock, am I looking at the wrong file?

Yes sorry my bad, Matrix.utils was a problem case for a related repo. For the above repo, an example is spatstat.core.

> Btw. the Bioconductor packages are supposed to install from their git repository? That seems a bit weird, and it is probably slower than installing them from their CRAN-like repository.

Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.

@gaborcsardi
Member

Matrix.utils seems like a bug in pak/pkgdepends.
spatstat.core does not compile on my machine, so that's probably the reason.

In any case, if pkgdepends cannot install a package that it should be able to install, please open an issue! Thanks!

> Not sure I understand. All packages were just installed and snapshots taken as normal as far as I remember.

No worries, it was more of a question for Kevin, but I think I understand the reason now.

7 participants