Status update: lifting the unaligned GPU matmul codegen boats #13227
nicolasvasilache started this conversation in Codegen
Replies: 1 comment
Exciting results! I can create new benchmarks with this flag enabled once #13133 is merged.
-
I wanted to share a quick update on unaligned matmul codegen for tensorcore-based GPUs before disappearing for 2 weeks.
Below are the performance gains that become available (once #13133 lands) by turning on the
--iree-codegen-llvmgpu-enable-transform-dialect-matmul-tensorcore-strategy flag (a 5-40x improvement over the current IREE unaligned cases). This can be reproduced today by just patching in #13191 (which extracts the key change required from #13133) and running
make unaligned_matmuls with this iree-samples commit. This runs a few combinations of align1/align2/align4/align_more around the 3456_1024_2048 size, f32 only for now.
Feel free to try other sizes.
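For readers who want to experiment directly rather than through the benchmark harness, a rough sketch of passing the flag to the compiler might look like the following. The flag name is taken from this update; the input file name, target backend, and output name are placeholder assumptions, not from the original post.

```shell
# Sketch only: compile a matmul module with the transform-dialect
# tensorcore strategy enabled. File and target names are assumptions.
iree-compile matmul.mlir \
  --iree-hal-target-backends=cuda \
  --iree-codegen-llvmgpu-enable-transform-dialect-matmul-tensorcore-strategy \
  -o matmul.vmfb
```

This is only illustrative of how an opt-in codegen flag is typically threaded through iree-compile; the supported reproduction path is the make unaligned_matmuls target described above.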
Now, we are still 2-4x off where we want to be, and there is still work to do around some of the low-level aspects:
- 128x128x16x3xwmma
If people feel bold, they could try turning the flag on by default to get the first 5-40x perf gains.
I'll pick this up again in 2 weeks.
@silvasean @mariecwhite @mattwalsh @stellaraccident @ftynse