[Spike/R&D] Cache usage in tutor-mfe build #210

Closed
DawoudSheraz opened this issue May 21, 2024 · 7 comments

@DawoudSheraz
Contributor

In the Tutor Users meeting on May 20th, 2024, a large part of the discussion focused on the long build times of tutor-mfe (https://openedx.atlassian.net/wiki/spaces/COMM/pages/3583016961/Tutor+Users+Group#2024-05-20). When several MFEs build in parallel, both the install and build steps take a long time, often leading to npm connection errors. The only workaround is to limit buildx parallelism to 1, 2, or 4, depending on the available resources.
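
As a rough sketch of the parallelism workaround, the snippet below generates a BuildKit daemon configuration that caps the number of concurrent build steps. The `max-parallelism` key under `[worker.oci]` is a BuildKit option; the helper name and the idea of generating the file from Python are illustrative assumptions, not part of tutor-mfe.

```python
def buildkit_config(max_parallelism: int) -> str:
    """Return BuildKit daemon config text that caps concurrent build steps."""
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")
    return f"[worker.oci]\n  max-parallelism = {max_parallelism}\n"


if __name__ == "__main__":
    # Hypothetical usage: save this output as a config file and pass it to
    # `docker buildx create` (check `docker buildx create --help` for the
    # exact config flag on your buildx version).
    print(buildkit_config(2))
```

A lower value trades build speed for fewer simultaneous npm connections, which is the trade-off described above.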

One of the items discussed in the meeting was the usage of cache during MFE build. There are various items we would want to test/verify:

Besides verifying the above items, identify which cache-related changes (or other workarounds) could reduce MFE build times.

@DawoudSheraz
Contributor Author

@hinakhadim @FahadKhalid210 FYI

@hinakhadim
Contributor

hinakhadim commented May 27, 2024

My final insights after running everything:

First-time build or --no-cache build:

  • All enabled MFEs try to install from npm. No npm cache is used (due to parallelism).

Second-time build:

  • If we add one more MFE without changing the steps for the other MFEs (just appending that MFE's steps to the existing Dockerfile), the npm cache is used to fetch its packages.

Answers to the Questions:

  • Is the docker cache being shared properly across layers?
    Yes, but only after the first image build. Additional MFEs (added once the image has already been built) install their packages from the cache.

  • Is the npm cache working as expected? How often do we see cache misses when the npm install is running for MFEs?
    Yes, it's working. Some packages are always installed from the network, giving us cache misses. Misses also occur when the particular version required by an MFE doesn't exist in the cache. But if a package exists in the cache, npm picks it up from there.

  • Is anything being cached at all?
    Yes, Docker caches layers/steps. If nothing changes before a given layer in the Dockerfile, that layer is taken from the cache. If we make a change at a specific layer, Docker rebuilds that step and every step after it from scratch.
    Yes, npm packages are also cached in the image folder.

  • The concern here is more of the npm cache. When installing 2 MFEs in parallel, how many of their packages are getting in the NPM cache? When a 3rd MFE starts clean-install, is it using the NPM cache if an existing package with the same version is to be installed?
    NPM cache is being used. (fully confirmed)

  • When does npm invalidate a package? Let's say we have react-bootstrap 15.0.0 in account and profile MFEs and react-bootstrap is 15.0.1 in discussion MFE. When the account and discussion MFE installs, will npm cache both 15.0.0 and 15.0.1? When the profile installs, will it use 15.0.0 from cache or will it have been invalidated?
    Multiple versions are stored in the cache. An entry is only invalidated if a patch is released (or something like this). Profile uses 15.0.0 from the cache, as both 15.0.0 and 15.0.1 exist.
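
The Docker layer-cache behaviour described in these answers can be illustrated with a toy model. This is not how Docker is implemented; it only assumes that each step's cache key depends on its own instruction plus the key of the previous step, which is enough to show why a change invalidates that step and everything after it while appended steps leave earlier layers cached. All function names and instruction strings are hypothetical.

```python
import hashlib


def layer_keys(instructions: list[str]) -> list[str]:
    """Chain a hash through the instructions so each key depends on all earlier steps."""
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old: list[str], new: list[str]) -> list[int]:
    """Return indexes of steps in `new` that cannot reuse a cached layer from `old`."""
    old_keys = layer_keys(old)
    new_keys = layer_keys(new)
    return [i for i, key in enumerate(new_keys) if i >= len(old_keys) or old_keys[i] != key]


if __name__ == "__main__":
    first = ["FROM node", "COPY account/package.json", "RUN npm clean-install"]
    # Appending steps for one more MFE: only the new steps need building.
    appended = first + ["COPY profile/package.json", "RUN npm clean-install"]
    print(rebuilt_steps(first, appended))
    # Editing an early step: that step and every later one are rebuilt.
    changed = ["FROM node", "COPY account/package.json (edited)", "RUN npm clean-install"]
    print(rebuilt_steps(first, changed))
```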

How does npm install work?

  • First, check whether the package exists in the cache and verify its integrity against the network (is there a need to update it or not?).

  • If an update is needed, install it from the network. If not, use the cached copy.

  • Run post-install scripts for packages.

  • Does NPM cache size matter?
    No (In our case)
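
The lookup flow above, together with the earlier react-bootstrap question, can be made concrete with a toy model. It is a simplification under stated assumptions: entries are keyed by package name and exact version, and the registry download is a stand-in string. It is not npm's real cache structure, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPackageCache:
    """Hypothetical cache keyed by (name, exact version); not npm's real data structure."""

    entries: dict[tuple[str, str], str] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def install(self, name: str, version: str) -> str:
        key = (name, version)
        if key in self.entries:
            # Cache hit: reuse the stored copy instead of downloading again.
            self.hits += 1
            return self.entries[key]
        # Cache miss: stand-in for a registry download, stored for later installs.
        self.misses += 1
        tarball = f"{name}-{version}.tgz"
        self.entries[key] = tarball
        return tarball


if __name__ == "__main__":
    cache = ToyPackageCache()
    cache.install("react-bootstrap", "15.0.0")  # account MFE: miss
    cache.install("react-bootstrap", "15.0.1")  # discussion MFE: miss, kept alongside 15.0.0
    cache.install("react-bootstrap", "15.0.0")  # profile MFE: hit
    print(cache.hits, cache.misses, sorted(cache.entries))
```

In this model a different version never evicts an existing one, which matches the observation that both versions coexist in the cache.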

If anyone else wants to add something, feel free to share.
I have looked into a few things to try, such as prefer-offline, or caching the npm packages of one MFE in a single place on our laptop/server so that the other MFEs install from there (just a thought for now). This targets the first-time install, as first-time image building is our biggest concern.

@DawoudSheraz DawoudSheraz moved this from Pending Triage to In Progress in Tutor project management May 28, 2024
@FahadKhalid210

Docker cache is shared properly across layers.
Docker Resources:
CPU: 16
Memory: 32 GB
Disk Limit: 136 GB
The first-time MFE build takes ~32 min.
The second build, served from cache, takes ~17.7 sec.

Regarding npm cache, I tried building the images with the --no-cache option. However, I noticed that the npm cache is being hit multiple times during the build process. Additionally, I encountered network errors while attempting to build all the images.


@hinakhadim
Contributor

hinakhadim commented May 31, 2024

Updates:

With --no-cache flag:
I tried building the MFE image with --no-cache on the sandbox server. There I got cache hits, which shows that the npm cache is being used by the other MFEs' npm installations, as mentioned by @FahadKhalid210.

@kdmccormick
Contributor

This is a great writeup. Thank you all!

@hinakhadim
Contributor

@tasawar-hussain suggested three ways to approach cache usage for the MFE build:

  1. Multi-stage builds
    This suggests using Docker's multi-stage build feature, which the MFE Dockerfile already uses.

  2. npm cache with a volume
    This would require additional configuration and management of the volume, and it may be good for only one MFE. It is best suited to cases where package.json changes frequently (for example, a version change) in a way that busts Docker's cache.

  3. Using Docker BuildKit
    Two solutions have been suggested here. The first is to use Docker BuildKit, which we're already using and which Docker Desktop provides by default. The second is to install dependencies before copying package.json, by extracting the dependencies into an extra file. But this only helps when we bump the version or change something in package.json other than the dependencies. In our case, we are not changing anything in package.json other than the deps.
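
A minimal sketch of the second BuildKit idea, assuming a standard package.json with the usual dependency sections. The function name is hypothetical, the sample manifests are made-up data, and how npm would then be pointed at the extracted file inside the Dockerfile is not covered here.

```python
import json

DEPENDENCY_KEYS = ("dependencies", "devDependencies", "peerDependencies", "optionalDependencies")


def extract_dependencies(package_json_text: str) -> str:
    """Return deterministic JSON holding only the dependency sections of a package.json."""
    manifest = json.loads(package_json_text)
    deps_only = {key: manifest[key] for key in DEPENDENCY_KEYS if key in manifest}
    # sort_keys keeps the output byte-identical whenever the dependencies are identical.
    return json.dumps(deps_only, sort_keys=True, indent=2)


if __name__ == "__main__":
    v1 = '{"name": "example-mfe", "version": "1.0.0", "dependencies": {"react": "17.0.2"}}'
    v2 = '{"name": "example-mfe", "version": "1.0.1", "dependencies": {"react": "17.0.2"}}'
    # A version bump alone leaves the extracted file unchanged,
    # so a layer keyed on that file could stay cached.
    print(extract_dependencies(v1) == extract_dependencies(v2))
```

As the comment above notes, this only pays off when package.json changes in fields other than the dependencies.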

@DawoudSheraz
Contributor Author

We have good enough context on this, closing this for now.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Tutor project management Jul 23, 2024