Fix bugs in DynamicCache #37880

tugsbayasgalan · 2025-04-30T04:14:40Z

What does this PR do?

When we flatten DynamicCache for export, we never end up flattening its inner tensors, because at the start there are 0 tensors initialized. As a result, we never correctly tested the ep.module()(*args, **kwargs) behaviour for an export performed while the cache is populated.
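
To make the failure mode concrete, here is a toy sketch in plain Python. `ToyCache` and `flatten_cache` are hypothetical stand-ins, not the transformers or PyTorch pytree API, and plain lists stand in for tensors. It only illustrates that a lazily initialized cache exposes zero leaves if it is flattened before any layer has been written, so the recorded structure differs from that of a populated cache.

```python
class ToyCache:
    """Hypothetical stand-in for a lazily initialized key/value cache."""

    def __init__(self):
        # Nothing is allocated up front; layers appear on first update.
        self.key_cache = []
        self.value_cache = []

    def update(self, key, value, layer_idx):
        # Append a new layer on first use, otherwise extend the existing one.
        if layer_idx == len(self.key_cache):
            self.key_cache.append(list(key))
            self.value_cache.append(list(value))
        else:
            self.key_cache[layer_idx].extend(key)
            self.value_cache[layer_idx].extend(value)


def flatten_cache(cache):
    """Return (leaves, spec): per-layer entries plus a description of the structure."""
    leaves = cache.key_cache + cache.value_cache
    spec = {
        "num_key_layers": len(cache.key_cache),
        "num_value_layers": len(cache.value_cache),
    }
    return leaves, spec


# Flattening before any update: no leaves at all.
empty = ToyCache()
empty_leaves, empty_spec = flatten_cache(empty)

# Flattening after two layers were written: four leaves (2 keys + 2 values).
populated = ToyCache()
populated.update([1.0, 2.0], [3.0, 4.0], layer_idx=0)
populated.update([5.0], [6.0], layer_idx=1)
populated_leaves, populated_spec = flatten_cache(populated)
```

Under these assumptions, anything that records `empty_spec` at export time has no slots for the tensors that show up later, which is the situation the description above refers to.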

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions · 2025-04-30T04:14:52Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

Rocketknight1 · 2025-04-30T11:54:13Z

cc @gante

gante

Thank you for the PR 🤗 In general, LGTM (I'm not super keen on increasing the complexity of DynamicCache, but I understand the importance of the fix)

Missing: update docstring with the new optional arg

gante · 2025-04-30T15:18:27Z

src/transformers/cache_utils.py

@@ -359,11 +359,15 @@ class DynamicCache(Cache):
        ```
    """

-    def __init__(self, _distributed_cache_data: Optional[Iterable] = None) -> None:
+    def __init__(self, _distributed_cache_data: Optional[Iterable] = None, num_layers: Optional[int] = None) -> None:


Let's accept config instead of num_layers (=config.num_layers). It's more consistent with the other caches, which also take config in __init__.

ArthurZucker

thanks! Not sure we need a new argument here!

ArthurZucker · 2025-05-01T14:49:16Z

src/transformers/cache_utils.py

+            self.key_cache = [torch.tensor([]) for _ in range(num_layers)]
+            self.value_cache = [torch.tensor([]) for _ in range(num_layers)]


why don't we always init like this?

We need to know how many layers we want to do this for.

DynamicCache has lazy tensor init, and export needs eager tensor init :D

It's similar to the issue we have with TP (should be lazy) vs torch.compile (should be eager) in the hybrid caches
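
The lazy versus eager distinction discussed here can be sketched with a toy class. `ToyDynamicCache` is a hypothetical illustration, not the DynamicCache implementation, and empty Python lists stand in for the `torch.tensor([])` placeholders from the diff. The optional `num_layers` argument mirrors the first revision of the PR; the later revision passes a config instead.

```python
from typing import Optional


class ToyDynamicCache:
    """Hypothetical cache supporting both lazy and eager per-layer initialization."""

    def __init__(self, num_layers: Optional[int] = None):
        if num_layers is None:
            # Lazy: no per-layer entries exist until the first update.
            self.key_cache = []
            self.value_cache = []
        else:
            # Eager: one empty placeholder per layer, so the structure is
            # fully known before any forward pass has run.
            self.key_cache = [[] for _ in range(num_layers)]
            self.value_cache = [[] for _ in range(num_layers)]

    def update(self, key, value, layer_idx):
        # Grow the lists on demand (only needed in the lazy case).
        while len(self.key_cache) <= layer_idx:
            self.key_cache.append([])
            self.value_cache.append([])
        self.key_cache[layer_idx].extend(key)
        self.value_cache[layer_idx].extend(value)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]


lazy = ToyDynamicCache()
eager = ToyDynamicCache(num_layers=2)
layers_before = (len(lazy.key_cache), len(eager.key_cache))

# After writing only layer 0, the lazy cache has changed structure,
# while the eager cache still has the same number of entries.
lazy.update([1.0], [2.0], layer_idx=0)
eager.update([1.0], [2.0], layer_idx=0)
layers_after = (len(lazy.key_cache), len(eager.key_cache))
```

The point of the sketch is only that eager initialization needs the layer count up front, which is why the constructor has to receive it from somewhere.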

gante

One more detail and it's good for me 👍

gante · 2025-05-22T09:25:08Z

src/transformers/cache_utils.py

@@ -359,11 +359,17 @@ class DynamicCache(Cache):
        ```
    """

-    def __init__(self, _distributed_cache_data: Optional[Iterable] = None) -> None:
+    def __init__(
+        self, _distributed_cache_data: Optional[Iterable] = None, config: Optional[PretrainedConfig] = None


missing: docs for config in the docstring above, explaining when it should be used (torch.export)

(sorry, I missed this detail in the previous review :D)

tugsbayasgalan · 2025-06-09T20:30:04Z

@ArthurZucker @gante,

I originally hoped to make DynamicCache torch.export compatible with dynamic shapes. But this seems quite difficult, and it seems out of scope for export since the caching code is not really part of the model's forward pass. To make it work:

We would need to pass an extra config parameter so that we prefill key_cache and value_cache with dummy shapes.
We would need torch.cond to switch between the initial value and the populated value.

Both of the above would make the transformers code quite ugly. And in export, we are working on exporting submodules with different input specs, so I don't feel it is that important to make DynamicShapes fully seamless with export at the cost of code complexity. Our current suggestion would be to get two graphs:

Export without a cache first, to populate the cache entries.
Export again after populating the cache, this time with dynamic shapes.

This PR still fixes the bug where we weren't able to run the exported artifact when dynamic shapes are used.
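
The two-graph suggestion can be illustrated with a toy sketch. `toy_export` and `step` are hypothetical helpers written for this illustration; they are not `torch.export` and do not trace anything. The sketch only mimics the idea that an exported artifact is tied to the structure of its example inputs, so one artifact is produced for the empty cache and a second one for the populated cache, whose per-layer length is then free to grow.

```python
def toy_export(fn, example_cache):
    """Hypothetical stand-in for export: remember the example input's layer count."""
    traced_layers = len(example_cache)

    def exported(cache, new_token):
        # Reject inputs whose structure differs from the example input.
        if len(cache) != traced_layers:
            raise ValueError(
                f"traced with {traced_layers} cache layers, called with {len(cache)}"
            )
        return fn(cache, new_token)

    return exported


def step(cache, new_token, num_layers=2):
    """Toy forward pass: append the new token to every layer's cache."""
    if not cache:
        # First call: create one entry per layer.
        return [[new_token] for _ in range(num_layers)]
    return [layer + [new_token] for layer in cache]


# Graph 1: exported with an empty cache; running it populates the entries.
prefill = toy_export(step, example_cache=[])
cache = prefill([], 10)

# Graph 2: exported with the populated cache; the per-layer length may vary.
decode = toy_export(step, example_cache=cache)
cache = decode(cache, 11)
cache = decode(cache, 12)
```

In this toy, calling `prefill` with the populated cache raises a `ValueError`, which is the kind of structure mismatch that motivates keeping two separate graphs.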

cc: @xadupre @zhxchen17

Cyrilvallez

Hey! I think we can probably make the test a bit cleaner, then let's go! 🤗🚀

Cyrilvallez · 2025-06-13T08:07:59Z

tests/utils/test_cache_utils.py


+    def test_dynamic_cache_exportability_dynamic_cache(self):
+        model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-MistralForCausalLM")


Is it an extension of test_dynamic_cache_exportability, or a new test that should be independent? If an extension, let's simply add the new parts to the existing test, otherwise let's have a better name for this new test! 🤗

Cyrilvallez

Hey @tugsbayasgalan! The new test you added does not pass (see the CI report below the PR), so it would need to be fixed before merging!

Cyrilvallez · 2025-06-16T07:56:05Z

tests/utils/test_cache_utils.py

+    @slow
+    @require_read_token


It should not need these decorators, should it?

nah it was just copy pasta. Deleted

Cyrilvallez · 2025-06-20T09:08:21Z

We just need to fix the small conflict based on our new ruff rules, then it's good to go!

github-actions bot marked this pull request as draft April 30, 2025 04:14

tugsbayasgalan marked this pull request as ready for review April 30, 2025 15:06

github-actions bot requested review from ArthurZucker and Rocketknight1 April 30, 2025 15:06

gante reviewed Apr 30, 2025

View reviewed changes

ArthurZucker reviewed May 1, 2025

View reviewed changes

gante reviewed May 22, 2025

View reviewed changes

tugsbayasgalan force-pushed the dynamic_cache_v2 branch from d5ee85f to fabfd80 Compare June 9, 2025 20:15

tugsbayasgalan requested review from gante and ArthurZucker June 9, 2025 20:31

Cyrilvallez reviewed Jun 13, 2025

View reviewed changes

tugsbayasgalan requested a review from Cyrilvallez June 13, 2025 15:36

Cyrilvallez reviewed Jun 16, 2025

View reviewed changes

tugsbayasgalan added 7 commits June 23, 2025 08:27

Fix bugs in DynamicCache

fca3226

Updarte

03696f7

Update

532ec57

Lint

28f49bc

lint

38d2b58

Rename test

53a866c

update

54dc95a

tugsbayasgalan force-pushed the dynamic_cache_v2 branch from 0d89988 to 54dc95a Compare June 23, 2025 15:50

tugsbayasgalan requested a review from Cyrilvallez June 23, 2025 16:03

Merge branch 'main' into dynamic_cache_v2

f530775
