locale is set to nothing leading to different than expected sorting of vectors of character strings in R #99
Comments
Hi @remlapmot, to help us prioritise fixing this, can you let me know what impact this has on users and their research?
This doesn't have much impact @inglesp. The only effect of this I have seen in OpenSAFELY repos is that graphs can be produced with categories in different orders compared to when the locale is set. So nothing incorrect is produced, but the plot has the x- or y-axis categories in a different order compared to what the user was expecting. This gives the user a few seconds of confusion, but they either ignore it or regenerate the plot locally to obtain the preferred/expected sort ordering. It affects the sort ordering of vectors of character strings with certain functions. Unfortunately those are the functions users are most likely to use, i.e., In the Rocker containers they set the locale to I went into a few other OpenSAFELY and datalab containers interactively and I couldn't really find one where the locale has been set. The locales were either empty or
Thanks Tom. I expect we'll come to this when we revisit how the R docker image is built.
That's great. Since I made the reverse-engineer fork of r-docker I have been rebasing it every time the container has been updated (only 3 times since I made it) - so it's still up-to-date. https://github.com/opensafely/reverse-engineer-r-docker/branches I have made 4 active branches with slightly different approaches. Two of these are more stable, these are the 2 with names ending https://packagemanager.rstudio.com/client/#/repos/1/overview I had also been thinking about making another branch just using MRAN dated snapshot repos - but then all the packages would have to be built from source and so the Dockerfile would probably take several hours to build. The existing 4 branches take approx 30 mins to build because they download prebuilt Linux binary packages from RSPM.
Closing as fixed by #123.
(sorry for the long report below and that it took me so long to spot this)
I'm sure it is by accident, but the locale is set to nothing (is this referred to as unset? Sorry, I am not really familiar with locale terminology) in the r-docker container.
Unfortunately, I think the sorting of character strings by the base R sort() and order() functions, and hence by dplyr::arrange(), is affected by locale.
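As a rough illustration of the collation effect outside R, the sketch below uses the coreutils sort command rather than R's sort() (that substitution is an assumption of this example, not something from the report). It forces the C locale, where strings compare by byte value, so uppercase letters come before lowercase ones. The helper name sort_in_c_locale is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sort its arguments with collation forced to the C locale.
# In the C locale strings compare by byte value, so "A".."Z" precede "a".."z".
sort_in_c_locale() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Same letters as head(letters) and head(LETTERS) in the R example.
sort_in_c_locale a b c d e f A B C D E F
```

With a language locale such as en_US.UTF-8 generated and selected instead of the LC_ALL=C override, the same input would typically come back with upper and lower case interleaved, which is the ordering most users expect.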
Examples
docker run --platform linux/amd64 ghcr.io/opensafely-core/r:latest -e "Sys.getenv('LANG')"
locale
and which are generated/setup (did this interactively)
next bullet)
docker run --platform linux/amd64 ghcr.io/opensafely-core/r:latest -e "sort(c(head(letters), head(LETTERS)))"
en_US.UTF-8
we get
Fixes
Use stringr::str_sort() instead of the other functions mentioned
docker run --platform linux/amd64 ghcr.io/opensafely-core/r:latest -e "stringr::str_sort(c(head(letters), head(LETTERS)))"
Once the locales have been generated the other fixes are
this is probably not best practice)
Set LANG="en_GB.UTF-8" in a /workspace/.Renviron file (might also be worth setting LC_CTYPE to the same value as well)
Set LANG="en_GB.UTF-8" in the global Renviron.site file in /usr/lib/R/etc (might also be worth setting LC_CTYPE to the same value as well)
to same value as well)(And when the Dockerfile is running again of course could just set
it in that with)
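The two Renviron options above could be sketched roughly as follows. This is a minimal sketch rather than the project's actual fix: the helper name set_r_locale is hypothetical, and the demo writes to a scratch file instead of a real R installation. It assumes the chosen locale has already been generated in the container, as noted above.

```shell
#!/bin/sh
# Hypothetical helper: append locale settings to an Renviron-style file.
# $1 is the target file, e.g. /workspace/.Renviron or
# /usr/lib/R/etc/Renviron.site (both paths are taken from the report).
# $2 is the locale name, defaulting to en_GB.UTF-8.
set_r_locale() {
    target=$1
    loc=${2:-en_GB.UTF-8}
    {
        printf 'LANG="%s"\n' "$loc"
        printf 'LC_CTYPE="%s"\n' "$loc"
    } >> "$target"
}

# Demonstrate on a scratch file rather than a real R installation.
demo_file=$(mktemp)
set_r_locale "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

R reads these files at startup, so a fresh R session is needed to pick up the new values. Sys.getenv('LANG') from the first example is one way to check the result.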
In R see the locales and sort helpfiles for more info