Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance opportunity for yarn 3 cache #325

Closed
belgattitude opened this issue Sep 3, 2021 · 34 comments
Closed

Performance opportunity for yarn 3 cache #325

belgattitude opened this issue Sep 3, 2021 · 34 comments
Assignees
Labels
feature request New feature or request to improve the current logic

Comments

@belgattitude
Copy link

belgattitude commented Sep 3, 2021

Thanks for the new cache feature. Much easier.

After few weeks, I realized that while it supports yarn, there's some improvements that can be made.

Yarn 3 (probably yarn 2+ too) manages downloaded archives pretty well (.yarn/cache/*.zip) and invalidating on yarn.lock changes does not take that into account and I saw a lot of cache misses.

As an example I converted back to action-cache to illustrate and test.

I'm wondering if a similar approach could be done with setup-node ?

Updated example with action cache

Updated on Nov 23th, taking into account @merceyz comment.

Setup action

      # Get the yarn cache path.
      - name: Get yarn cache directory path
        id: yarn-cache-dir-path
        run: echo "::set-output name=dir::$(yarn config get cacheFolder)"
    
      - name: Restore yarn cache
        uses: actions/cache@v2
        id: yarn-cache # use this to check for `cache-hit` (`steps.yarn-cache.outputs.cache-hit != 'true'`)
        with:
          path: ${{ steps.yarn-cache-dir-path.outputs.dir }}
          key: yarn-cache-folder-${{ hashFiles('**/yarn.lock', '.yarnrc.yml') }}
          restore-keys: |
            yarn-cache-folder-

Testing a cache hit after adding a dependency

With setup node cache, as the yarn.lock have changed all packages would be fetched again (1m 28s)
rather than (1s 124ms). Environmentally friendlier 🌳

yarn install --immutable
➤ YN0000: ┌ Fetch step
  ➤ YN0013: │ 1719 packages were already cached, one had to be fetched (superjson@npm:1.7.5)
➤ YN0000: └ Completed in 1s 124ms

Cache Size: ~127 MB (133261279 B)
Cache saved successfully
Cache saved with key: yarn-cache-folder-os-Linux-node--f118ea4bee07eada9df36ad2e83fd6febcbaf06b5b8962689c7650659e872ad3

PS: Key points

~Example with action cache~ (old version, before @merceyz improvements)
      - name: Get yarn cache directory path
        id: yarn-cache-dir-path
        run: echo "::set-output name=dir::$(yarn config get cacheFolder)"
      - name: Restore yarn cache
        uses: actions/cache@v2
        id: yarn-cache 
        with:
          path: ${{ steps.yarn-cache-dir-path.outputs.dir }}       
          key: yarn-cache-folder-os-${{ runner.os }}-node-${{ env.node-version }}-${{ hashFiles('**/yarn.lock', '.yarnrc.yml') }}      
          restore-keys: |
            yarn-cache-folder-os-${{ runner.os }}-node-${{ env.node-version }}-
            yarn-cache-folder-os-${{ runner.os }}-

Note

Here's a gist with an optimized install example: https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a

@merceyz
Copy link

merceyz commented Sep 10, 2021

It can be even more efficient by not including the OS and Node version in the cache key as well - #272 (comment)

@dmitry-shibanov
Copy link
Contributor

Hello @belgattitude. Thank you for your feature request. Could you please describe it little bit. Do you want to add restore-keys to the action to take previous cache or you also want to add .yarn/cache/*.zip to cached directories ?

@dmitry-shibanov dmitry-shibanov added the feature request New feature or request to improve the current logic label Nov 12, 2021
@dmitry-shibanov
Copy link
Contributor

Hello @belgattitude, just a gentle ping.

@belgattitude
Copy link
Author

Let me check something in a couple of days. I'll be back asap.

@belgattitude
Copy link
Author

belgattitude commented Nov 23, 2021

@dmitry-shibanov first of all thanks for considering this issue. I've edited the P/R desc with latest comments from @merceyz

Do you want to add restore-keys to the action to take previous cache

Yes that's the idea, that way we take advantage of the built-in yarn 2+/3+ cache management. Yarn will prune old packages and only fetch new ones keeping the packages zipped into cacheFolder in sync with package.json (works with workspace/monorepo too).

Note that if multiple actions are run in parallel, only the first one will be able to reserve the cache, others will print a warning after:

> Post restore yarn cache
   Unable to reserve cache with key yarn-cache-folder- 
   cb79cb8e7c0fdd859237ac516da53bc2fdafceb3230fd68444aa4f55c9238980, another job may be creating this cache.

As far as I tested it does not create problems and can be safely ignored..

you also want to add .yarn/cache/*.zip to cached directories

Actually this seems to be what matters most. (AFAIK setup/node try to cache only the node_modules folders, right ?)

There's some kind of guarantee that yarn cache is portable across OS and node versions as @merceyz pointed out, while node_modules might not (native binaries, postinstall tricks).

As the cache folder is configurable (through yarnrc.yml or YARN_CACHE_FOLDER env), a good way to read it

      # Get the yarn cache path.
      - name: Get yarn cache directory path
        id: yarn-cache-dir-path
        run: echo "::set-output name=dir::$(yarn config get cacheFolder)"

The restore key could be a constant:

      - name: Restore yarn cache
        uses: actions/cache@v2
        id: yarn-cache # use this to check for `cache-hit` (`steps.yarn-cache.outputs.cache-hit != 'true'`)
        with:
          path: ${{ steps.yarn-cache-dir-path.outputs.dir }}
          key: yarn-cache-folder-${{ hashFiles('**/yarn.lock', '.yarnrc.yml') }}
          restore-keys: |
            yarn-cache-folder-

Interestingly pnpm has a benchmark page with some insights about the different modes (cache+node_modules, cache only)

image

Looks having both node_modules + yarn cache will be faster (excluding action/cache restore/generate...). I guess even more as yarn won't probably have to re-run the link phase (generally slow with native binaries: esbuild, swc, sharp, prisma...). But with caching node_modules I'm not sure of the portability though (it depends also of setup/node). Note also that node_modules folder is only used in nmLinker: node-modules or nmLinker: pnp modes (otherwise it's assumed to be yarn pnp)

I would say a safe way would be to just save 'yarn cache folder' and not node_modules.

Let me know if it answer your questions.

Have a great day

PS: @merceyz thanks for your comment about portability, I was wondering if the newly introduced supportedtArchitecture change something ?

@devgioele
Copy link

devgioele commented May 5, 2022

Do you recommend using actions/setup-node to cache node_modules and to use actions/cache to cache "the yarn cache"?
Caching both would allow us to skip the linking step too, right? I don't see why it cannot be done.

@panticmilos
Copy link
Contributor

Hi @devgioele,

Caching node_modules is not possible using setup-node. This action is only caching on a global level. If you would like to cache node_modules, please use actions/cache.

Cheers

@mklueh
Copy link

mklueh commented Jul 10, 2022

Deleting yarn.lock and running yarn install helped, but in my case with yarn caching enabled

@nickserv
Copy link

I think this is a duplicate of #328, which is similar but not specific to yarn

@belgattitude
Copy link
Author

belgattitude commented Oct 9, 2022

Just to share, here's my updated composite action for

  1. yarn 3+ with node-modules linker: https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a
  2. pnpm 7+: https://gist.github.com/belgattitude/838b2eba30c324f1f0033a797bab2e31

marcusrbrown added a commit to marcusrbrown/translate-app that referenced this issue Oct 22, 2022
marcusrbrown added a commit to marcusrbrown/translate-app that referenced this issue Oct 22, 2022
Simplify dependency caching for Yarn 3+ with Zero Installs. Inspired by actions/setup-node#325 and
gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a.
@marcofugaro
Copy link

Would be good to have support for Yarn 3 caching

@merceyz
Copy link

merceyz commented Jan 11, 2023

It can be even more efficient by not including the OS and Node version in the cache key as well - #272 (comment)

This got a bit more complicated in Yarn v3.1.0 with the introduction of supportedArchitectures, if current is included in any of the options (the default) then the amount of packages fetched by Yarn can be different based on the host. For example by default fsevents will be fetched on a macOS runner but not Ubuntu.

@nijikon
Copy link

nijikon commented Apr 5, 2023

Is there any plan to add this anytime soon?

@marko-zivic-93
Copy link
Contributor

Hello @nijikon,
We have great news for you. We will be taking this feature and working on it in the upcoming quarter, so you can expect this to be available soon... 😃

@dsame
Copy link
Contributor

dsame commented Apr 21, 2023

Hello @belgattitude, please confirm the problem we want to solve is eliminating the whole cache reset on any lock file invalidating.

I mean that if only one (of 2) dependency changed akv-demo/setup-node-test@44f68b4#diff-51e4f558fae534656963876761c95b83b6ef5da5103c4adef6768219ed76c2deL19

the whole cache is reset https://github.com/akv-demo/setup-node-test/actions/runs/4767849996/jobs/8476525793#step:3:82

and all (of 2) dependencies must be fetched https://github.com/akv-demo/setup-node-test/actions/runs/4767849996/jobs/8476525793#step:4:28

Is your suggestion to solve the described problem with restore-keys-like behaviour?

@belgattitude
Copy link
Author

Hey @dsame

please confirm the problem we want to solve is eliminating the whole cache reset on any lock file invalidating.

Yes.

But https://github.com/akv-demo/setup-node-test/actions/runs/4767849996/jobs/8476525793#step:4:28 does not seem to work ? Am I wrong ?

@belgattitude
Copy link
Author

belgattitude commented May 1, 2023

BTW you can have a look to some ideas

That said in my experience couldn't find one solution for every use-case. Comments in there strapi/strapi#16581

TLDR:

Ideally with yarn, the following steps/files/folders can be cached

image

  • (1) required: yarn cache .yarn/cache.zip (reusable on lock changes)
  • (2) optional: install_state (invalided on lock/os/node-version change)
  • (3) optional: node_modules/** (invalided on lock/os/node-version change)

For 2 and 3 it's optional and should be measured on a specific repo. Cause saving node_modules is heavy (see post restore).

For a concrete example see also a small sized monorepo but with a lot of deps: https://github.com/belgattitude/nextjs-monorepo-example. It seems doable to get an install under 20s

image

There's more to say about supportedArchitectures (yarn 4), prisma postinstall tricks...

@dsame
Copy link
Contributor

dsame commented May 3, 2023

Hey @dsame

please confirm the problem we want to solve is eliminating the whole cache reset on any lock file invalidating.

Yes.

But https://github.com/akv-demo/setup-node-test/actions/runs/4767849996/jobs/8476525793#step:4:28 does not seem to work ? Am I wrong ?

Yes, it is a proof the problem exists build - pay attention it uses v3(i.e. current) action, this way i confirmed we really need the feature.

@dsame
Copy link
Contributor

dsame commented May 3, 2023

@belgattitude with the close look i started to doubt about the feature.

First of all yarn has 2 modes - local and global cache depending on enableGlobalCache option in .yarnrc.yml

With the global cache yarn does not purge the obsolete dependencies and we can not restore the previous cache in order to avoid the endless grow

proof: you can see different versions of lodash

with the local cache yarn does purge the obsolete dependencies BUT, in this case you should not use actions cache at all - you should keep .yarn in your repo instead as [it is recommended in the yarn docs]

(https://yarnpkg.com/features/offline-cache).

If we even apply the feature only for local caches we are getting into the danger of clashing .yarn from cache with .yarn from repo. It is still possible to workaround this but it looks to be useful only for marginal case: local cache + .yarn not included to the repo

Can you please provide more justification to use caching instead of adding .yarn to the repo?

Also i have a question about node modules? Do we really need it to cache it with yarn? Since PnP has been introduced yarn does not have it at all, but have .pnp.cjs instead, which is assumed to be added to the repo as well.

@belgattitude
Copy link
Author

From my phone, so writing is minimal.

Override global cache with YARN_ENABLE_GLOBAL_CACHE=false. That gives you an uniform way to do it for ci. The yarn cache will be stored in .yarn/cache by default but you can get it from config as well.

Just cache .yarn/cache not .yarn

Better to avoid caching node modules with this approach. It generally does not bring speed up due to action/cache extra work. That also solve the pnp support.

@dsame
Copy link
Contributor

dsame commented May 5, 2023

Hello @belgattitude - one more clarification: is there any reason to do not add .yarn/cache to the repository as it is recommended in the yarn docs and how the yarn team do for their own repo ?

Having yarn/cache saves the Git storage and gives better performance than caching, but i suspect there may be some specific reason to use cache instead of adding to the repo?

@belgattitude
Copy link
Author

belgattitude commented May 5, 2023

The initial offline mode idea isn't probably used. Storing large binaries in git and their mutation is probably not a good idea. I guess after few updates you'll end up with a git repo of 10's of Gb.
Git isn't a cache for large binaries. Rewritting history to purge obsolete entries isn't easy.

That said I can see why they explored this. Yarn default is to not commit the cache (doc is confusing).

But it's true setup/node action cache feature shouldn't be enabled by the user if he's using full offline.

@dsame
Copy link
Contributor

dsame commented May 5, 2023

Thank you @belgattitude for your opinion, now i agree it is worth to continue with the feature in order to avoid huge repo.

Can you please take a look at this Proof of concept draft PR and tell you opinion about does it make a sense to include node version and step id into the cache key keeping in mind having many huge cache will hit storage limit soon

https://github.com/actions/setup-node/pull/744/files#diff-55f15e2366942ad15f71a41ac983f8ce9a9882b28b7fd9082f3a26c799783064R42

@dsame
Copy link
Contributor

dsame commented May 5, 2023

@belgattitude can you please also confirm the following :

We have to use 3rd parameter of restoreCache in case if all of 3 conditions met:

The 3rd parameter of restoreCache has to be set to not null in case if all of 3 conditions met:

  1. package manager is yarn
  2. enableGlobalCache is omitted or false
  3. there's no .yarn/cache directory in the working dir

@belgattitude
Copy link
Author

belgattitude commented May 6, 2023

@dsame

  1. only from v2 (v1 behaviour should fallback to current way of doing)
  2. (mmmh). on CI this should be overriden to enableGlobalCache: false through env. Don't run with true on CI (see previous comment in Performance opportunity for yarn 3 cache #325 (comment))
  3. yes

I have looked into your PR. stepId not necessary in my opinion. But there's something missing to ensure yarn 4. Not a lot of time the next days... I'll figure out a way to share info asap

@belgattitude
Copy link
Author

belgattitude commented May 6, 2023

In the meantime I've tried to answers few questions in

Might worth to have a quick glance

@dsame
Copy link
Contributor

dsame commented May 9, 2023

The next iteration of PoC PR showed the detection of all the conditions implies the significant changes in code base and makes it unreasonable complicated.

@belgattitude can you please take a look at this issue - does not it seem the goal of reusing the existing cache can be achieved with resolving that issue in the more straightforward and common way?

I looked through the links you've sent,

  • according to the key-prefix idea, it is clear but you've provided a custom solution for the specific build while we are trying to provide universal solution for the whole action - this is why it become complicated
  • according to including .yarnec.yaml in hash calculation, it is the overkill and cause unnecessary cache refreshes
  • according to node_modules caching it brings performance gain only for cases of some binaries have to be rebuilt from sources and it is not considered now to be effective enough for most users - downloading and unpacking lot of files can be event slower than npm install ... if the dependencies are already cached.

Summary: can you please confirm or deny that key-prefix input added to the setup-node cache can boost the yarn3 performance and achieve the same goal as auto-generating the same key-prefix if the conditions are met?

@belgattitude
Copy link
Author

belgattitude commented May 10, 2023

Sorry I couldn't read properly but some quick toughts:

an you please take a look at this actions/setup-go#358 - does not it seem the goal of reusing the existing cache can be achieved with resolving that issue in the more straightforward and common way?

I would agree that params helps.

ccording to including .yarnec.yaml in hash calculation, it is the overkill and cause unnecessary cache refreshes

yarnrc.yml isn't something that is mutated often, so I wouldn't worry about it to invalidate things often.

Not accounting for it looks fragile in my impression, but I'm not 100% sure. I would let the benefit to doubt and include it in the hash (ie someone changes an advanced param...)

What do you mean by overkill ? is the hash call slow ?

according to node_modules caching it brings performance gain only for cases of some binaries have to be rebuilt from sources and it is not considered now to be effective enough for most users - downloading and unpacking lot of files can be event slower than npm install ... if the dependencies are already cached

For yarn 2+, I would not cache node_modules at all. (only few edges case would benefit from it, in this case they might use a totally different action than the one in setup/node)

Can you please confirm or deny that key-prefix input added to the setup-node cache can boost the yarn3 performance and achieve the same goal as auto-generating the same key-prefix if #325 (comment)?

The general idea seems good, but as always a deep testing is required. Do you expect me to do it ? In that case I'll need some time.

showed the detection of #325 (comment) implies the significant changes in code base and makes it unreasonable complicated.

Can you precise the meaning of ? "Unreasonable" => "in your opinion you wouldnt include the feature ?". Or you're still wondering ? I would understand of course.

But as a small personal note, it might have an very nice impact on global CI time... . Not sure about how many projects uses yarn 2+ and setup/node cache, but that's an interesting feature imho

@dsame
Copy link
Contributor

dsame commented May 10, 2023

Hello, @belgattitude

The general idea seems good, but as always a deep testing is required. Do you expect me to do it ?

Not at all, i've requested you opinion just to confirm we are no the same page and to get rid off some my hesitations. Now there's a plan an we are on the road. Thanks for the cooperation.

@nijikon
Copy link

nijikon commented Jul 18, 2023

@marko-zivic-93 back in April #325 (comment), you wrote that you will be having a look at this in the upcoming quarter. Were you talking about Q2 or Q3 of 2023?

@dsame
Copy link
Contributor

dsame commented Jul 19, 2023

Hello @nijikon the requested changes are merged into the main branch and are available since v3.7.0 release

@nijikon
Copy link

nijikon commented Jul 19, 2023

@dsame thanks. I will have a look if that helps my case.

@dsame
Copy link
Contributor

dsame commented Jul 25, 2023

Hello @nijikon, i am going to close this issue because the PRs are merged and due to inactivity, but please feel free to reopen it or create new issue in case if the problem still exists

@dsame dsame closed this as completed Jul 25, 2023
@nijikon
Copy link

nijikon commented Jul 25, 2023

@dsame I think I'm good. My current problem is that it does not cache the node_modules folder, but I read that it's expected.

deining pushed a commit to deining/setup-node that referenced this issue Nov 9, 2023
Bumps [@types/js-yaml](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/js-yaml) from 4.0.4 to 4.0.5.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/js-yaml)

---
updated-dependencies:
- dependency-name: "@types/js-yaml"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request to improve the current logic
Projects
None yet
Development

No branches or pull requests