[XLA:GPU] Do not multi-output fuse sibling transposes with reductions. #28786

Merged: 1 commit merged into main on Jul 15, 2025

Conversation

copybara-service[bot]
[XLA:GPU] Do not multi-output fuse sibling transposes with reductions.

An elemental (aka nested) transpose has a different read pattern than an unnested transpose or a reduction, because the emitters for tiled transposes and parallel reductions each ensure a uniform read pattern.
A multi-output fusion whose roots have different read patterns is generally not profitable: the input data is read multiple times by different threads, which defeats the purpose of multi-output fusion. Worse, the increased register pressure can hurt performance.

@copybara-service copybara-service bot force-pushed the test_781870005 branch 2 times, most recently from 0692bc2 to 7048b9e on July 15, 2025 07:07
PiperOrigin-RevId: 783214366
@copybara-service copybara-service bot merged commit c7f65c9 into main Jul 15, 2025
@copybara-service copybara-service bot deleted the test_781870005 branch July 15, 2025 07:33