@chengjunlu
Contributor

Improve the block IO 2D load lowering for the case where maskConstancyVer < tileHeight.
Move the register bases of the adjusted part of the tile height to the position after the vBlocks,
so that the tile height of the block IO can be adjusted easily.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the block IO 2D load lowering in the LoadStoreOpToLLVM pass to handle cases where maskConstancyVer < tileHeight. The key improvement is a register base rearrangement strategy that rotates register bases of the adjusted tile height portion to appear after vBlocks, enabling easier adjustment of tile height for block IO operations.
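The rearrangement described above can be illustrated with a small standalone sketch. This is not the code from LoadStoreOpToLLVM.cpp: the helper name `moveAfterVBlocks`, the string labels, and the index parameters are all hypothetical, and the sketch assumes that the adjusted tile-height bases sit immediately before the vBlocks bases in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): moves the sub-range
// [adjBegin, adjEnd) so that it lands directly after the block
// [adjEnd, vBlocksEnd). Everything outside [adjBegin, vBlocksEnd)
// keeps its position.
std::vector<std::string> moveAfterVBlocks(std::vector<std::string> bases,
                                          std::size_t adjBegin,
                                          std::size_t adjEnd,
                                          std::size_t vBlocksEnd) {
  assert(adjBegin <= adjEnd && adjEnd <= vBlocksEnd &&
         vBlocksEnd <= bases.size());
  auto first = bases.begin() + static_cast<std::ptrdiff_t>(adjBegin);
  auto middle = bases.begin() + static_cast<std::ptrdiff_t>(adjEnd);
  auto last = bases.begin() + static_cast<std::ptrdiff_t>(vBlocksEnd);
  // std::rotate makes `middle` the new first element of [first, last),
  // so the vBlocks entries come first and the adjusted entries follow.
  std::rotate(first, middle, last);
  return bases;
}
```

For example, with the illustrative list `h0, h1, h2, h3, v0, v1, r0`, where `h2, h3` stand for the adjusted tile-height bases and `v0, v1` for the vBlocks bases, calling `moveAfterVBlocks(bases, 2, 4, 6)` yields `h0, h1, v0, v1, h2, h3, r0`.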

Key Changes:

  • Enhanced mask constancy validation to handle vertical mask constancy less than tile height
  • Added register base rearrangement logic using std::rotate to optimize memory layout
  • Added MSVC compatibility for __builtin_clz and __builtin_ctz intrinsics
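The conversation does not show how the MSVC compatibility was implemented, so the following is only a sketch of one common approach: wrapping `__builtin_clz` and `__builtin_ctz` behind helpers that fall back to the `_BitScanReverse` and `_BitScanForward` intrinsics from `<intrin.h>` when compiling with MSVC. The wrapper names `countLeadingZeros32` and `countTrailingZeros32` are hypothetical.

```cpp
#include <cstdint>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

// Hypothetical wrappers (names are assumptions, not from the PR).
// As with the GCC/Clang builtins, the result is undefined for x == 0.
inline unsigned countLeadingZeros32(std::uint32_t x) {
#if defined(_MSC_VER) && !defined(__clang__)
  unsigned long index;
  // _BitScanReverse reports the bit position of the highest set bit.
  _BitScanReverse(&index, x);
  return 31u - static_cast<unsigned>(index);
#else
  return static_cast<unsigned>(__builtin_clz(x));
#endif
}

inline unsigned countTrailingZeros32(std::uint32_t x) {
#if defined(_MSC_VER) && !defined(__clang__)
  unsigned long index;
  // _BitScanForward reports the bit position of the lowest set bit.
  _BitScanForward(&index, x);
  return static_cast<unsigned>(index);
#else
  return static_cast<unsigned>(__builtin_ctz(x));
#endif
}
```

Keeping the compiler check inside the wrappers means call sites stay identical on every toolchain; callers still need to guard against a zero argument themselves.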

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp | Core implementation: Added MSVC intrinsic compatibility, enhanced mask validation, and register rearrangement logic for tile height adjustment |
| test/TritonIntelGPU/tensor-pointer-load-block-2d.mlir | Added comprehensive test cases for different DPAS configurations and repCluster patterns to validate the improved lowering |

@chengjunlu chengjunlu force-pushed the chengjun/adjust_block_io_size_with_mask_tile branch 3 times, most recently from 058131c to b1be507 Compare November 3, 2025 07:29
@chengjunlu chengjunlu force-pushed the chengjun/adjust_block_io_size_with_mask_tile branch from b1be507 to fe492f8 Compare November 4, 2025 03:10
@chengjunlu chengjunlu requested a review from Copilot November 4, 2025 03:10
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


@chengjunlu chengjunlu force-pushed the chengjun/adjust_block_io_size_with_mask_tile branch from fe492f8 to 469dde9 Compare November 4, 2025 03:15
@chengjunlu chengjunlu requested a review from Copilot November 4, 2025 03:16
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


@chengjunlu chengjunlu enabled auto-merge (squash) November 4, 2025 03:20
@chengjunlu chengjunlu merged commit da48f1a into main Nov 4, 2025
23 checks passed
@chengjunlu chengjunlu deleted the chengjun/adjust_block_io_size_with_mask_tile branch November 4, 2025 04:36
…e that maskConstancyVer < tileHeight.

Signed-off-by: Lu,Chengjun <[email protected]>