When using file storage for the lock provider, we get an ever growing directory of files without any cleanup happening #39369

Open
5 tasks
hostep opened this issue Nov 14, 2024 · 7 comments · May be fixed by #39372
Assignees
Labels
Area: Framework Component: Backend Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Priority: P2 A defect with this priority could have functionality issues which are not to expectations. Progress: PR in progress Reported on 2.4.x Indicates original Magento version for the Issue report. Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it

Comments

@hostep
Contributor

hostep commented Nov 14, 2024

Preconditions and environment

  • Magento version 2.4.6 / 2.4.7 / 2.4-develop
  • The Magento lock provider is set up to use file storage (see steps to reproduce)

Steps to reproduce

  1. Set up a clean Magento installation with sample data
  2. Make sure cronjobs are running
  3. From the root of Magento, execute these commands:
$ mkdir "$(pwd)/var/locks"
$ bin/magento setup:config:set --lock-provider=file --lock-file-path="$(pwd)/var/locks"
$ bin/magento cache:enable
$ bin/magento cache:flush
  4. Visit many different frontend pages: homepage, category pages, product detail pages, ...
  5. Create some orders
  6. Look at the contents of the var/locks directory

Ideally, inspect the lock file directory of a shop that has been running with file-based locking enabled for many months.
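
To see which kinds of locks dominate such a directory, the files can be grouped by the part of the name before the first underscore. This is only a minimal sketch: count_locks_by_prefix is a made-up helper name (not a Magento command), and it assumes a find that supports -maxdepth.

```shell
# Count lock files per name prefix (the text before the first underscore),
# e.g. BLOCK, CRON, PLACE, indexer, SYSTEM.
# count_locks_by_prefix is a hypothetical helper, not a Magento command.
count_locks_by_prefix() {
    lock_dir="$1"
    # Lock files live directly in the directory, so do not recurse.
    find "$lock_dir" -maxdepth 1 -type f \
        | awk -F/ '{ name = $NF; sub(/_.*/, "", name); count[name]++ }
                   END { for (p in count) print count[p], p }' \
        | sort -rn
}
```

Usage, from the root of Magento: count_locks_by_prefix "$(pwd)/var/locks". Each output line is a count followed by a prefix, largest group first.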

Expected result

  • The var/locks directory should not fill up with an ever-increasing number of files

Actual result

  • var/locks can contain millions of files after a Magento shop has been running for several months

Example filenames:

- BLOCK_833608638be7d42c9a8edf79b3516f6768babb37b07cb0c924ce1ede8c7b60f4-122217-final_price-EUR-20241114-1-0-
- BLOCK_8c40abe5c67c542dd31f6ed0787db94f9b6c7add8c7eccb4aeff4dc1adb10d9d-122217
- BLOCK_833608638be7d42c9a8edf79b3516f6768babb37b07cb0c924ce1ede8c7b60f4-122209-final_price-EUR-20241114-1-0-
- BLOCK_8c40abe5c67c542dd31f6ed0787db94f9b6c7add8c7eccb4aeff4dc1adb10d9d-122209
- BLOCK_833608638be7d42c9a8edf79b3516f6768babb37b07cb0c924ce1ede8c7b60f4-122213-final_price-EUR-20241114-1-0-
- BLOCK_8c40abe5c67c542dd31f6ed0787db94f9b6c7add8c7eccb4aeff4dc1adb10d9d-122213
- BLOCK_833608638be7d42c9a8edf79b3516f6768babb37b07cb0c924ce1ede8c7b60f4-122194-final_price-EUR-20241114-1-0-
- BLOCK_0dececc6f6ab51b0dc125a1d40b7c4fd8dac44300acc9927a9bbc923f0dc4b1d-122201-final_price-EUR-20241114-1-0-
- BLOCK_fd38c4b3f09eb9f6357edad8fbb6fa37f447f4b29367471c1f63330cfd2e6041
- BLOCK_2b0ceeddda2b7376874129eed2893a53db19081fd2ff426fc9a63a3ad80125d3
- PLACE_ORDER_20194
- PLACE_ORDER_20197
- PLACE_ORDER_20199
- PLACE_ORDER_20201
- CRON_54ca823af2ab2bc8e0923548a617e1d9
- CRON_597b39571a8e2e02aa3da5b0b669058c
- CRON_5d6e389185b7c7ff8ada1b70e5d5beec
- CRON_5dc531608e6750f85d979bb963298bd3
- CRON_65418d1532274c42fbc28425e8b9b7f1
- CRON_65541ac6ef52e06c546be79471cf53ce
- indexer_lock_category_product
- SYSTEM_CONFIG

One of our projects had 3.5 million files in that directory. At that point the filesystem started failing with errors like Failed to open stream: No space left on device every time a lock file was needed. There was plenty of disk space; the directory itself simply could not hold that many files. Another project had 1.9 million files and a third had ~170,000 files.

So that's really bad. We need some kind of bugfix or cleanup mechanism to keep this under control.

Additional information

The locking mechanism in Magento is used for various things:

  • to prevent cronjobs from running simultaneously
  • to prevent indexers from indexing simultaneously
  • to prevent queue consumers from running simultaneously
  • to prevent generating Blocks simultaneously when multiple people are visiting the frontend of your webshop at the same time
  • when an order is being made, your quote/cart is being locked for some reason
  • ...

Most of these things result in lock names that are predictable and remain the same over time:

  • cronjob names, crongroup names
  • indexer names
  • queue consumer names

However, some of these are pretty dynamic and those names can change over time:

  • when blocks are generated, the method getCacheKeyInfo is called for each of them and its result is hashed; that hash is used as the lock name, sometimes with a suffix (for the final_price blocks) that contains IDs, dates, currencies and so on
  • when the quote/cart is locked, the ID of the quote is used in the lock name

More observations:

  • This problem got amplified a lot in Magento 2.4.7. Before 2.4.7, the LockGuardedCacheLoader was forced to use the database lock provider while Blocks were being generated. The commit that removed this in 2.4.7 is: 65d8985
  • The quote/cart locking was introduced in Magento 2.4.6 with 074f08b and will be replaced in Magento 2.4.8 with a CartMutex: 5efbcd8, but that will have similar problems
  • One of the reasons why the number of lock files increases so rapidly on our servers is our deploy strategy, where each deploy ends up in a different directory on the server. For example, one deploy goes in /var/www/html/release123/ and the next one in /var/www/html/release124/. It turns out that one of the parameters used to generate a lock name contains the absolute path to template files. So with each deploy that path changes, the lock name changes as well, and we get an ever-growing number of lock files
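
The effect of the release directory on the lock name can be illustrated with a small stand-in. This is a sketch under assumptions, not Magento's actual code: lock_name is a made-up function, joining the parts with "|" and hashing them with sha256sum are illustrative choices, and the templates/footer.phtml path is hypothetical.

```shell
# lock_name is a hypothetical stand-in for hashing a block's cache key info.
# Joining with "|" and hashing with sha256sum are assumptions for illustration,
# not Magento's actual implementation.
lock_name() {
    joined=$(IFS='|'; printf '%s' "$*")
    printf 'BLOCK_%s\n' "$(printf '%s' "$joined" | sha256sum | cut -d' ' -f1)"
}

# Absolute template path: every release directory yields a different name.
lock_name BLOCK_TPL default /var/www/html/release123/templates/footer.phtml
lock_name BLOCK_TPL default /var/www/html/release124/templates/footer.phtml

# Path relative to the release root: the name is the same for every release.
lock_name BLOCK_TPL default templates/footer.phtml
```

The first two calls print different names although the block is the same, so every deploy leaves a new set of lock files behind. The third form stays stable across deploys.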

Possible solutions:

  • For the last observation: instead of absolute paths we could use relative paths, which should result in more stable lock names. I already came up with the solution below, but I have no idea whether it will cause regressions, because this data is not used solely for generating lock names:
diff --git a/app/code/Magento/Theme/Block/Html/Footer.php b/app/code/Magento/Theme/Block/Html/Footer.php
index 672e176b5da..5e8eafb9617 100644
--- a/app/code/Magento/Theme/Block/Html/Footer.php
+++ b/app/code/Magento/Theme/Block/Html/Footer.php
@@ -75,7 +75,7 @@ class Footer extends \Magento\Framework\View\Element\Template implements \Magent
             (int)$this->_storeManager->getStore()->isCurrentlySecure(),
             $this->_design->getDesignTheme()->getId(),
             $this->httpContext->getValue(Context::CONTEXT_AUTH),
-            $this->getTemplateFile(),
+            $this->getRelativeTemplateFile(),
             'template' => $this->getTemplate()
         ];
     }
diff --git a/lib/internal/Magento/Framework/View/Element/Template.php b/lib/internal/Magento/Framework/View/Element/Template.php
index e68d059c079..116452b4d8c 100644
--- a/lib/internal/Magento/Framework/View/Element/Template.php
+++ b/lib/internal/Magento/Framework/View/Element/Template.php
@@ -328,12 +328,17 @@ class Template extends AbstractBlock
         return [
             'BLOCK_TPL',
             $this->_storeManager->getStore()->getCode(),
-            $this->getTemplateFile(),
+            $this->getRelativeTemplateFile(),
             'base_url' => $this->getBaseUrl(),
             'template' => $this->getTemplate()
         ];
     }

+    protected function getRelativeTemplateFile(): string
+    {
+        return $this->getRootDirectory()->getRelativePath($this->getTemplateFile());
+    }
+
     /**
      * Instantiates filesystem directory
      *
  • Maybe the block generation locks created through LockGuardedCacheLoader can be marked as temporary (if file storage is used) and removed immediately after they have been unlocked. No idea if this is a good idea or whether it would lead to race conditions or other problems
  • Maybe a new configuration setting should be added to env.php to determine which lock storage is used for the LockGuardedCacheLoader. Adobe specifically disabled the database one in 2.4.7 because Galera clusters have issues with it for some reason. If this were configurable and we could choose ourselves which locking mechanism the LockGuardedCacheLoader uses, we could already prevent a large share of the lock files from being written to the filesystem
  • And then the obvious one: write a cronjob that runs every hour or so, looks at the lock files (if file storage is used), and removes the ones that are not locked and haven't been touched in 24 hours or so
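
The last idea could look roughly like this on the command line. It is a sketch under assumptions: cleanup_stale_locks is a made-up name, it needs the util-linux flock command, and it assumes the lock files are held via flock(2), so that a non-blocking flock attempt fails while another process still holds the lock.

```shell
# Remove lock files that are older than a threshold AND not currently locked.
# cleanup_stale_locks is a hypothetical helper; it assumes the util-linux
# `flock` command and that locks on these files are taken with flock(2).
cleanup_stale_locks() {
    lock_dir="$1"
    max_age_minutes="${2:-1440}"   # 24 hours by default
    # flock -n fails immediately when the file is locked elsewhere,
    # in which case rm is not run and the file is kept.
    find "$lock_dir" -maxdepth 1 -type f -mmin "+$max_age_minutes" \
        -exec flock -n {} rm -f -- {} \;
}
```

Usage: cleanup_stale_locks /path/to/your/lock/file/storage 1440. This does not rule out every race: a process that opened the file just before it was removed could still lock the deleted file, so treat it as a starting point rather than a safe implementation.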

These are just some random ideas, in case other people have other suggestions, feel free to let me know in the comments below.

Temporary solution:
Set up something like this in the crontab on your server:

# removes lock files that haven't been touched in 10 days
15 0 * * * find /path/to/your/lock/file/storage -type f -mtime +10 -delete
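
Before enabling the -delete variant, it may be worth listing what would be removed. The sketch below uses a made-up helper name, list_stale_locks, and only prints the matches. Note that GNU find ignores fractional days for -mtime, so +10 effectively matches files that are at least 11 days old.

```shell
# Dry run: print the lock files the cleanup would delete, without deleting.
# list_stale_locks is a hypothetical helper name.
list_stale_locks() {
    lock_dir="$1"
    days="${2:-10}"
    find "$lock_dir" -type f -mtime "+$days" -print
}
```

Usage: list_stale_locks /path/to/your/lock/file/storage 10 | wc -l gives the number of files the crontab entry would delete.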

Thanks!

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

m2-assistant bot commented Nov 14, 2024

Hi @hostep. Thank you for your report.
To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce.


Join Magento Community Engineering Slack and ask your questions in #github channel.
⚠️ According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.
🕙 You can find the schedule on the Magento Community Calendar page.
📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

@hostep
Contributor Author

hostep commented Nov 15, 2024

I calculated some statistics across the projects we maintain ourselves.

These are the numbers of lock files found on the production versions of these shops, per Magento version. All our shops on Magento 2.4.7 got the temporary workaround that deletes lock files that haven't been touched in 10 days. Without this workaround the numbers would exceed 1 million.

Average:
- Magento 2.3.7 => 373
- Magento 2.4.5 => 129
- Magento 2.4.6 => 14144
- Magento 2.4.7 => 87441  (without the temporary cleanup workaround it would be ~1,856,666)

Median:
- Magento 2.3.7 => 75
- Magento 2.4.5 => 129
- Magento 2.4.6 => 1642
- Magento 2.4.7 => 87450  (without the temporary cleanup workaround it would be ~1,900,000)

So we can indeed see a slight increase in the number of lock files since Magento 2.4.6, which can be explained by a lock file now being needed per order.
And a huge increase since Magento 2.4.7, since generating the cache for Blocks now needs a lock file per Block instead of using the database for those locks.

@hostep
Contributor Author

hostep commented Nov 15, 2024

Suggestion for the simple solution, cleaning up old lock files using a cronjob: #39372

This is probably only a partial fix for this issue, since it only cleans up afterwards. Maybe we can also try to prevent too many files from being written in the first place...

@engcom-Bravo engcom-Bravo added the Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it label Nov 18, 2024
@engcom-Hotel engcom-Hotel self-assigned this Nov 18, 2024

m2-assistant bot commented Nov 18, 2024

Hi @engcom-Hotel. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue.
  • 3. Add Area: XXXXX label to the ticket, indicating the functional areas it may be related to.
  • 4. Verify that the issue is reproducible on 2.4-develop branch
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!
  • 5. Add label Issue: Confirmed once verification is complete.
  • 6. Make sure that automatic system confirms that report has been added to the backlog.

@engcom-Hotel
Contributor

Hello @hostep,

Thanks for the report and collaboration!

We were able to reproduce the issue, so we are confirming it.

Thanks

@engcom-Hotel engcom-Hotel added Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Component: Backend Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Priority: P2 A defect with this priority could have functionality issues which are not to expectations. Area: Framework and removed Issue: ready for confirmation labels Nov 18, 2024
@github-jira-sync-bot

✅ Jira issue https://jira.corp.adobe.com/browse/AC-13367 is successfully created for this GitHub issue.


m2-assistant bot commented Nov 18, 2024

✅ Confirmed by @engcom-Hotel. Thank you for verifying the issue.
Issue Available: @engcom-Hotel, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

4 participants