
Collaborator

@IvanYashchuk IvanYashchuk commented Mar 12, 2025

Replaces kwarg construction with a single attribute access on the `fd` object. This doesn't improve latency; it only simplifies the code of the `__call__` method.
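The sketch below illustrates the general shape of this refactoring: building a kwargs dict on every call versus reading a value stored on the `fd` object. `FakeFusionDefinition`, its `device` attribute, `execute`, `call_before` and `call_after` are all hypothetical stand-ins (the `device` name is only a guess from the branch name). They are not nvFuser's or Thunder's real API.

```python
from typing import Optional


class FakeFusionDefinition:
    """Hypothetical stand-in for a fusion definition object (not nvFuser's API)."""

    def __init__(self, device: Optional[str] = None) -> None:
        self.device = device  # hypothetical: value stored on the fd object

    def execute(self, inputs: list, device: Optional[str] = None) -> tuple:
        # Echo the arguments so both call styles can be compared
        return list(inputs), device


def call_before(fd: FakeFusionDefinition, inputs: list, device: Optional[str]) -> tuple:
    # Before: a kwargs dict is assembled on every call
    kwargs = {}
    if device is not None:
        kwargs["device"] = device
    return fd.execute(inputs, **kwargs)


def call_after(fd: FakeFusionDefinition, inputs: list) -> tuple:
    # After: a single attribute read on the fd object replaces the dict
    return fd.execute(inputs, device=fd.device)


if __name__ == "__main__":
    print(call_before(FakeFusionDefinition(), [1, 2], "cuda:0"))
    print(call_after(FakeFusionDefinition("cuda:0"), [1, 2]))
```

Both styles pass the same arguments. The second moves the decision out of the hot path, so `__call__` has less code to read.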

There are currently 7 attribute accesses in the `FusionDefinitionWrapper.__call__` implementation. Using `__slots__`/`slots=True` speeds up each of these accesses by a few nanoseconds.

cc @tfogal

@IvanYashchuk IvanYashchuk marked this pull request as draft March 12, 2025 09:51
@IvanYashchuk IvanYashchuk marked this pull request as ready for review March 12, 2025 11:21
Collaborator

@mruberry mruberry left a comment


Nice cleanup!

fyi @t-vi, the failing test is:

FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_rms_norm_torch_cpu_thunder.dtypes.bfloat16 - AssertionError: Tensor-likes are not close!

which is unrelated to this PR and should be fixed by #1868.

@IvanYashchuk IvanYashchuk enabled auto-merge (squash) April 17, 2025 09:59
@IvanYashchuk IvanYashchuk merged commit db7e769 into main Jun 5, 2025
50 checks passed
@IvanYashchuk IvanYashchuk deleted the nvfuser-wrapper-call-device branch June 5, 2025 11:11