-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: working Dockerfile \o/ #123
Conversation
fc417c6
to
bf38343
Compare
This is super impressive Simon. The multi-stage build and the caching is superb. My comments are:
|
It's great to see the locale issue fixed. It turns out that C.UTF-8 is what Posit Cloud set their machines to. This seems to return the same string sort ordering for the 26 letter alphabet as when the locale is set to en_GB.UTF-8 (as it likely is on all/most users machines). sort(c(head(letters), head(LETTERS)))
## [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F" So I'm pretty sure that most graphs with axes sorted by a string variable will now be the same whether produced in the container or on a local machine (that would be a graph made locally using summary data, obvs). |
This came for free by switching to use our standard base-action image, which sets up a bunch of sensible defaults for all our images 👍
Right, that makes sense, thanks for digging into it. I think we're ok with from a reproducability PoV if re-running old action with this image if the sort order ends up different. |
Gah, seems docker issue the warning even if it's not going to build that stage in the Dockerfile. We can give it a default arg, that should stop that warning. I'll check it out.
Yes, that would probably be better. It's hard to pin the version of the package manager that understands how to install specific version on a platform that doesn't natively support that :)
Yeah, I shied away from
That might be worth doing. However, given that it's just one package, and the version shouldn't matter to users of this image, I'm inclined to keep it unpinned atm. If a new version of
Yeah, it is not pretty. We could just do it via a
So, we're not installing any CRAN packages from the PPA, AFAIK. Those are packaged in apt with the names like
Bah. Stupid docker. If we lowercase the build image tag, it works, fix incoming.
I expect if failed because it couldn't find the 2nd two names on CRAN. But it would be nice to make that issue error early with a helpful error message. FTR, I'd expect in a future version
Yep, that sounds right, that script is bit hacky, probably worth doing it properly. Just building mrbayes now to test. |
mrbayes worked first time for me. I suspect this is an issue with not specifying bash properly, perhaps. I've added a commit to attempt to fix that.
Revisiting this, I'm not surprised it fails on git-bash using Docker for Windows. I'm not sure how well the build mount stuff works there. These images are designed to be built and published in a linux VM in Github Actions, and we've never tested building them with Docker for Windows. I would expect it to work in WSL2, however. When I get chance, I'll reboot and see if there's any easy fix. |
thanks Simon (sorry I can't read Docker error messages properly) I agree renv is unlikely to break in a future version. I reran from the new commits and build still fails on both Git Bash and WSL 2 with a similar message about
And on my Mac the tests after adding a package still fail because of the quote.
Also you might want to commit into the repo the file
|
I fixed #!/usr/bin/env bash
set -eu
IMAGE=${1:-ghcr.io/opensafely-core/r}
python3 -c 'import json; print("\n".join(json.load(open("renv.lock"))["Packages"]))' | xargs -I {} echo "library({}, warn.conflicts = FALSE)" > .tests.R
docker run --rm -v "$PWD:/workspace" "$IMAGE" .tests.R |
I had good progress on my Windows machine - the build succeeded under WSL 2. It turned out that as per this Stack Overflow post, something was wrong in my And then |
Two more small comments. The only other output that was different on my machines was from line 34 of Also I ran a few more
fails because it seems dependency R package rlang requires updating. The output was
|
Nice, have pushed this change, and silenced the IMAGE_TAG warning. If you have any ideas for some more tests above just importing libraries, I'm all ears :) |
Huh. That seems like a good idea. The ordering is just really to make the diffs more understandable when reviewing changes, e.g. we can spot an unintended version bump more easily. Will add.
Yeah, not much we can do about that, as our general policy is no version bumps, to avoid breaking backwards compat. |
Thanks for these Simon. I just noticed that I think you need a little fix when using the
You could just crop the image tag name from before the (With that fixed) I would perhaps add a comment to the last section of the README that if a package fails to install at its current version on CRAN then it is recommended to look up the prior version numbers of the package from its CRAN old sources page, which has a URL of the form |
Sorry I don't. This is a really neat test. Although you can make a case for saying it's not really necessary because during the process of installing a package, R checks whether that package can be successfully loaded. However, given this test is really quick it does no harm so I'd leave it in. |
thanks - yes I realised that. I did exactly the same at the end of my 3 legacy branches, I just did my sorting in R. |
Done, thanks
Good idea, added a section to the README on this. |
thanks for those commits Simon.
(After these I think that's probably me out of comments.) Also, I had a look at adding butcher. I think you need to go back to version 0.1.5 of that such that its dependencies that are already in the container meet its requirements (this is going back quite a few versions but the version of rlang in this container is quite old). |
It's great that this is getting improved! 👍
These issues might be worth tracking for possible improvements to this in future: |
I just noticed there is an unlucky accidental error in e4f1961 . The problem is that the row with the column names, |
Also I checked yesterday's package, forestploter, and just add-package forestploter works beautifully in the current build of this image. |
You're absolutely right, once |
Bah, yes. Ok, so, is there a way for the R that generates the csv data to sort, rather than sorting in bash? |
You could either use stringr ( In my branches I used stringr but I think you might want to go with data.table. This is because stringr sorts using an Your alternative syntaxes are approximately as follows. For stringr library(dplyr)
readr::read_csv("packages.csv",
col_types = "cc",
col_names = c("Package", "Version")
) %>%
filter(Package != "Package") %>%
arrange(stringr::str_rank(Package)) %>%
readr::write_csv("packages-alphabetical.csv", quote = "all") For data.table library(data.table)
dat <- read.csv("packages.csv", header = FALSE)
colnames(dat) <- c("Package", "Version")
# Remove accidental inclusion of c("Package", "Version") as a row
dat <- subset(dat, Package != "Package")
dat <- data.table(dat)
# data.table setorder() sorts in C-locale
setorder(dat, Package)
write.csv(dat, file = "packages-alphabetical.csv", row.names = FALSE) |
Fixed in #126, will update this PR to use the default sorting once that's landed. Went with the simplest approach, of just using the same locale for generating the csv. |
7e15fa5
to
7eb445f
Compare
This change fixes the longstanding issue of not being able to rebuild our R image, which was causing many issues. It solves the problem of reproducing the R library versions as in the current R docker image by using renv to manage a lockfile of versions. This is bit nuanced - renv is not really designed for this use case - but it does work. Check the Dockerfile comments for details. With a cold cache, it is very slow to build all the initial packages (i.e. 1-2 hours). But the build process keeps a local cache of them on the host, so rebuilds are fast. Whilst there are no R library version changes, there are some system library changes, as we moved from 18.04 to 20.04 as a base image. This was necessary because a) 18.04 is EOL shortly and b) we don't maintain a base 18.04 image. The file `20.04-library-changes.txt` contains a summary of differences, basically a bunch of uprades to system libraries. Also, we upgraded to the latest point release of R 4.0 series, 4.0.2 -> 4.0.5. Additionally, many unused components have been removed from the image: - all build toolchain/-dev packages - java/python integration - various x11/gtk dependencies These made the original image heavier, and to the best of our knowledge, have never been used by any projects. If this turns out to be mistaken, we can add things back easily enough. The old image was 4.5G from dockers - the new one is 1.9G. Testing: An R image build with this process was used to test ~30 OpenSAFELY project.yamls with R code in, and it all worked, so we have some confidence we've not broken anything. We'll keep the old image unter r:legacy tag or something, so we can easily access it if needed. Note: this does not add a publishing workflow to the repo, and thus does not change the current image by landing. Whether we can build it in GH remains to be seend. With no cache, I'd expect it to take multiple hours.
- make sure buildkit is used for everything - supply a nonsense default arg for PACKAGE to silenc docker warning
Also silence the IMAGE_TAG warning from docker-compose
Co-authored-by: Steven Maude <[email protected]>
Whenever we run R in the container, it activates renv, which AFAICT will *always* create an `renv` directory in the current. The test.sh script was mounting this directory in, thus creating an renv dir in it, so we stop doing that. Additionally, add renv/ dir to the gitignore. In the processs, realised this dir creation created a sandbox dir, which could be instead done once in /renv, so made that change too. Ideally, we'd tell renv to create the dir somewhere else, but there doesn't seem to be away to do that, it's hardcoded to be `./renv`. Also, apply suggested docs fix from Tom.
7eb445f
to
9f5571c
Compare
Right, I think this is ready to go, all outstanding issue dealt with. Will merge and publish later today if no one shouts! |
@bloodearnest: is this something to explain about in Platform News, maybe? Either that or a blog post? Also, should we be advising users how to clean up the old image, either there or elsewhere, once they have asserted that they definitely don't need the |
(I don't think this ⬆️ blocks merging the PR.) |
Yep, there's a couple of pieces of news blocked on publishing this
It doesn't share any layers, but fortunately, And the plan is that we decide to switch Of course, many things could go wrong. If we need to roll back, I've already published the current image as |
Great (I am travelling today, so I haven't tested the new commits) apart from putting backticks around the URL at the end of the README (to stop people accidentally clicking it and thinking it's invalid) this looks great to me. It has been great to learn about caching - thanks Simon |
TIL; great! |
This change fixes the longstanding issue of not being able to rebuild
our R image, which was causing many issues.
It solves the problem of reproducing the R library versions as in the
current R docker image by using renv to manage a lockfile of versions.
This is bit nuanced (renv is not really designed for this use case)
but it does work. Check the Dockerfile comments for details. With
a cold cache, it is very slow to build all the initial packages (i.e.
1-2 hours). But the build process keeps a local cache of them on the
host, so rebuilds are fast.
Whilst there are no R library version changes, there are some system
library changes, as we moved from 18.04 to 20.04 as a base image. This
was necessary because a) 18.04 is EOL shortly and b) we don't maintain
a base 18.04 image. The file
20.04-library-changes.txt
containsa summary of differences, basically a bunch of uprades to system
libraries. Also, we upgraded to the latest point release of R 4.0
series, 4.0.2 -> 4.0.5.
Additionally, many unused components have been removed from the image:
These made the original image heavier, and to the best of our knowledge,
have never been used by any projects. If this turns out to be
mistaken, we can add things back easily enough. The old image was 4.5G
from dockers
Testing:
An R image build with this process was used to test ~30 OpenSAFELY
project.yamls with R code in, and it all worked, so we have some
confidence we've not broken anything.
We'll keep the old image unter r:legacy tag or something, so we can
easily access it if needed.
Note: this does not add a publishing workflow to the repo, and thus does
not change the current image by landing.
Whether we can build it in GH remains to be seen. With no cache, I'd
expect it to take multiple hours.