Conversation

@rdspring1
Collaborator

This PR enables an LRU Fusion Cache decorator for direct bindings to address issue #2700.

Legacy Fusion Cache
iter 0: loss 0.1641, iter time: 102852.81ms, t: 4096
iter 1: loss 0.1196, iter time: 20309.14ms, t: 4096
iter 2: loss 0.0771, iter time: 19994.30ms, t: 4096

ToT Direct w/o LRU Cache
iter 0: loss 0.1641, iter time: 204575.74ms, t: 4096
iter 1: loss 0.1196, iter time: 22914.21ms, t: 4096
iter 2: loss 0.0771, iter time: 20006.35ms, t: 4096

Direct with LRU Cache
iter 0: loss 0.1641, iter time: 115281.74ms, t: 4096
iter 1: loss 0.1196, iter time: 20228.57ms, t: 4096
iter 2: loss 0.0771, iter time: 19998.50ms, t: 4096
  • The LRU-cached direct path is ~12% slower on iter 0 than the legacy cache
    because the equality check was extended to cover block-scale operations,
    which create non-trivial allocation domains.
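To make the decorator pattern concrete, here is a minimal sketch of how an LRU fusion-cache decorator can work. This is a hypothetical stand-in written for illustration, not the actual `nvfuser_direct.LruFusionCache` implementation; the real cache keys on the fusion definition rather than raw call arguments.

```python
from collections import OrderedDict


class LruFusionCache:
    """Sketch of an LRU cache decorator (hypothetical stand-in for the
    real nvfuser_direct.LruFusionCache)."""

    def __init__(self, max_fusions=16384):
        self.max_fusions = max_fusions
        self.cache = OrderedDict()  # insertion order tracks recency

    def __call__(self, func):
        def wrapper(*args):
            # Real fusion caches key on the fusion definition; plain
            # positional args are used here to keep the sketch simple.
            if args in self.cache:
                self.cache.move_to_end(args)  # mark as most recently used
                return self.cache[args]
            result = func(*args)
            self.cache[args] = result
            if len(self.cache) > self.max_fusions:
                self.cache.popitem(last=False)  # evict least recently used
            return result

        return wrapper
```

Bounding the cache at `max_fusions` entries trades recompilation of evicted fusions for a cap on host memory, which is why a large-but-finite default is chosen.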

@rdspring1 rdspring1 marked this pull request as ready for review December 17, 2025 16:58
@kshitij12345 (Collaborator) left a comment:

LGTM, thanks @rdspring1

 # by nvfuserex.py when nvFuser is available.

-DIRECT_BINDINGS_SUPPORTED_VERSION = LooseVersion("0.2.34")
+DIRECT_BINDINGS_SUPPORTED_VERSION = LooseVersion("0.2.35")
Collaborator:

Is this the minimum version which ships with LruFusionCache?

Collaborator (Author):

Yes

-    return func
+    from nvfuser_direct import LruFusionCache
+
+    return LruFusionCache(max_fusions=16384)(func)
Collaborator:

The default value for max_fusions is already 16384, so we can drop the explicit argument here.

Collaborator:

Quick question: will there be an option to set the cache size, or would that not be useful?

Collaborator (Author):

Usually we pick a reasonable number to avoid out-of-memory issues; we have not seen a need to change it at runtime.
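Putting the two diffs above together, the version-gated wrapping can be sketched as follows. This is a simplified illustration: `parse_version` is a crude stand-in for `LooseVersion`, `maybe_wrap_with_lru_cache` and the `installed_version` parameter are hypothetical names, and the real check lives in nvfuserex.py.

```python
def parse_version(v: str) -> tuple:
    # Crude stand-in for LooseVersion, sufficient for "0.2.35"-style strings.
    return tuple(int(p) for p in v.split("."))


# Minimum nvfuser_direct version that ships LruFusionCache,
# per the review thread above.
DIRECT_BINDINGS_SUPPORTED_VERSION = parse_version("0.2.35")


def maybe_wrap_with_lru_cache(func, installed_version: str):
    """Hypothetical gate: wrap func in the LRU fusion cache only when the
    installed nvfuser_direct is new enough to provide it."""
    if parse_version(installed_version) < DIRECT_BINDINGS_SUPPORTED_VERSION:
        return func  # older bindings: leave the function unwrapped
    from nvfuser_direct import LruFusionCache  # available from 0.2.35

    # Per the review comment, the default max_fusions is already 16384,
    # so no explicit argument is needed.
    return LruFusionCache()(func)
```

Returning the original function unchanged on older versions keeps the executor working, just without the cross-call fusion cache.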

