Rebuild R image #121

Closed
bloodearnest opened this issue Jan 24, 2023 · 6 comments

@bloodearnest
Member

bloodearnest commented Jan 24, 2023

We cannot currently rebuild the R image from scratch. We have to add on to the existing R image we have.

This prevents various improvements, and means the R image is a special snowflake compared to our other images.

@remlapmot
Contributor

If helpful, I have done this in 3 branches using 2 different approaches.

(in these I previously used some MRAN/Microsoft snapshot URLs but sadly Microsoft are discontinuing that service at the end of this month, news here)

In each case you can run the master-branch-name.sh script, e.g. master-branch-01.sh, to build. Depending on the speed of your internet connection, branches 01 and 02 run in under 20 minutes because the packages are prebuilt.

@bloodearnest
Member Author

Hi Tom.

We have an approach that seems to be working based on using renv to build specific versions of libraries. I'm currently running a test build of a new image against all OpenSAFELY R code (it will take a while...) to make sure it doesn't break anything.

My approach used renv in a similar way to your renv branch, AFAICT, except:

  • uses a Docker BuildKit cache mount to maintain a build cache, so each package only needs to be built once.
  • does a multistage build to avoid cluttering the final image with build dependencies.
  • does not use rocker, but instead our own base-docker image, which we need for security reasons in production.
  • uses CRAN Debian archives to supply a prebuilt minimal r-base-core=4.0.5.

Like your branch, we're also switching from 18.04 to 20.04 as the underlying series, mainly because 18.04 is nearly EOL. This does mean some of the underlying system libraries have changed slightly, but the R libraries are all the same.

I'll work on getting a PR up, and I'd love your feedback on it!

Once we've switched and have a stable base to work from, we can work through some of the other issues you've called out.

We want to move away from using a single :latest version of all our runtime images, and towards explicit versions, e.g. run: r:4.0 ... or run: r:4.2. When we do that, we'll potentially be in a position to switch to using pre-built archives, and use more of rocker's tooling to build images.

@remlapmot
Contributor

Sounds good Simon, I'm happy to look.

I assume from that package name being specified at 4.0.5, that will bump the version of R from 4.0.2 to 4.0.5. In general I guess that it's good to be at the end of a patch series. Posit/RStudio only provide end of patch series versions of R (in addition to the current version) in their posit.cloud environment. Or was that a typo?

Another reason it's good to do this, is that although the tidyverse/Posit/RStudio policy is for their packages to work with the last 5 minor releases of R - which usually equates to 5 years - there are more packages on CRAN by other teams starting to require R version 4.1.0 because I think that's when the native pipe was introduced to R (|> as opposed to magrittr/dplyr pipe %>%). I ran update.packages(ask = FALSE) in the container and it only failed to update 1 package - Gmisc - due to that package using the native pipe. So it would be good to have a subsequent tagged version using at least R 4.1.0.
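
As a rough illustration of the version constraint described above, here is a minimal shell sketch that checks whether a given R version is at least 4.1.0, the release that introduced the native pipe. The function name `version_at_least` and the hard-coded `r_version` value are assumptions for illustration; it relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Return success (0) when version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumed value for illustration. Inside an image it could be read with:
#   r_version=$(Rscript -e 'cat(as.character(getRversion()))')
r_version="4.0.5"

if version_at_least "$r_version" "4.1.0"; then
  echo "R $r_version supports the native pipe"
else
  echo "R $r_version predates the native pipe"
fi
```

A check like this could gate which package versions are attempted in an r:4.0 image versus a later tagged image.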

@bloodearnest
Member Author

Ok, PR is here!

#123

@bloodearnest
Member Author

> Sounds good Simon, I'm happy to look.
>
> I assume from that package name being specified at 4.0.5, that will bump the version of R from 4.0.2 to 4.0.5. In general I guess that it's good to be at the end of a patch series. Posit/RStudio only provide end of patch series versions of R (in addition to the current version) in their posit.cloud environment. Or was that a typo?

Yes, this is deliberate, to bring us up to date with the latest 4.0 release. This should be a backwards-compatible upgrade; it didn't seem to cause any issues in testing, and it is easy enough to roll back if we need to.

> Another reason it's good to do this, is that although the tidyverse/Posit/RStudio policy is for their packages to work with the last 5 minor releases of R - which usually equates to 5 years - there are more packages on CRAN by other teams starting to require R version 4.1.0 because I think that's when the native pipe was introduced to R (|> as opposed to magrittr/dplyr pipe %>%). I ran update.packages(ask = FALSE) in the container and it only failed to update 1 package - Gmisc - due to that package using the native pipe. So it would be good to have a subsequent tagged version using at least R 4.1.0.

Yep.

I'd like to publish an r:4.2 image, with the same set of libraries, but at their latest versions. Then OpenSAFELY users can opt in to that by using r:4.2 in their project.yaml.

But we'll need to do that as a series of steps. We'd probably try to take a different approach, using pre-built CRAN packages rather than building from source.

@remlapmot
Contributor

remlapmot commented Jan 27, 2023

great thanks indeed Simon

(I have teaching stress on Tuesday, so it might take me until Wednesday to have a look at the PR.)

It would be great to make pre-built binary CRAN packages - to do that you need to make what is called a CRAN-like repository. For my own interest, and also because Iain mentioned this a few months ago, I wrote a blog post about how to do that for Linux binary packages:

https://remlapmot.github.io/post/2022/make-linux-binary-cran-like-repo/

I know of 2 organisations which have publicly available CRAN-like repos with Linux binary packages - the Posit/RStudio package manager

https://packagemanager.posit.co/client/#/repos/2/overview

which makes prebuilt binaries available for Bionic, Focal, and Jammy (as well as several other distros - it's incredibly impressive, as there are snapshots as well)

and the other is the R4PI project (which is actually run by one of the Posit/RStudio developers and uses the same technique)

https://r4pi.org/

The R4PI GitHub org is here

https://github.com/r4pi

I think the build scripts for its CRAN-like repo are in this repo:

https://github.com/r4pi/pkg_builder

Its two CRAN-like repos for the Pi are available from

https://pkgs.r4pi.org/
https://pkgs.r4pi.org/armv7l/index.html
https://pkgs.r4pi.org/aarch64/index.html
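
To make the Posit Package Manager option mentioned above more concrete, here is a small shell sketch that assembles a repository URL for Linux binaries by Ubuntu codename. The helper name `ppm_repo_url` is hypothetical, and the `__linux__/<codename>/<snapshot>` URL pattern is stated from memory of the service's public URLs rather than taken from this thread, so treat it as an assumption to verify.

```shell
#!/bin/sh
# Build a Posit Package Manager CRAN repo URL for Linux binaries.
# $1: Ubuntu codename (e.g. bionic, focal, jammy)
# $2: snapshot identifier; defaults to "latest" when omitted
ppm_repo_url() {
  printf 'https://packagemanager.posit.co/cran/__linux__/%s/%s\n' "$1" "${2:-latest}"
}

repo=$(ppm_repo_url focal)
echo "$repo"

# Hypothetical use: point R at that repo before installing a package.
#   Rscript -e "options(repos = c(CRAN = '$repo')); install.packages('dplyr')"
```

Pinning the second argument to a fixed snapshot instead of "latest" would be one way to keep image rebuilds reproducible.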
