
[BUG] wmma should be enabled w/ clang. #2006

Open
Artem-B opened this issue Dec 20, 2024 · 1 comment
Labels: ? - Needs Triage, bug (Something isn't working)

Comments

Artem-B commented Dec 20, 2024

Describe the bug
cutlass currently disables WMMA instructions when compiled with clang.

// CUTLASS WMMA does not support clang at present.
#if !(defined(__clang__) && defined(__CUDA__))

The comment is no longer valid and the tests work fine with the condition above removed.

Steps/Code to reproduce bug
Build test/unit/gemm/warp/gemm_sm75.cu with clang and run it on an A100. Currently the test fails with:

[----------] 5 tests from SM75_warp_gemm_tensor_op_crosswise_b1
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_64x64x512_8x8x128
void cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 128>, 32, cutlass::integer_subbyte<1, false>, cutlass::layout::RowMajor, cutlass::integer_subbyte<1, false>, cutlass::layout::ColumnMajor, int, cutlass::layout::RowMajor, cutlass::arch::OpXorPopc>::operator()(FragmentC &, const FragmentA &, const FragmentB &, const FragmentC &) const not implemented
...

Expected behavior
With the condition above removed, the tests work fine:

[----------] 5 tests from SM75_warp_gemm_tensor_op_crosswise_b1
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_64x64x512_8x8x128
[       OK ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_64x64x512_8x8x128 (79 ms)
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_64x32x512_8x8x128
[       OK ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_64x32x512_8x8x128 (58 ms)
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_32x32x512_8x8x128
[       OK ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_32x32x512_8x8x128 (40 ms)
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_32x16x512_8x8x128
[       OK ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_32x16x512_8x8x128 (53 ms)
[ RUN      ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_16x16x512_8x8x128
[       OK ] SM75_warp_gemm_tensor_op_crosswise_b1.128x128x512_16x16x512_8x8x128 (38 ms)
[----------] 5 tests from SM75_warp_gemm_tensor_op_crosswise_b1 (270 ms total)
Artem-B added the ? - Needs Triage and bug labels on Dec 20, 2024

Artem-B commented Dec 20, 2024

This needs to be fixed in a few more places:

diff --git a/include/cutlass/arch/wmma.h b/include/cutlass/arch/wmma.h
--- a/include/cutlass/arch/wmma.h
+++ b/include/cutlass/arch/wmma.h
@@ -34,9 +34,6 @@

 #pragma once

-// CUTLASS WMMA does not support clang at present.
-#if !(defined(__clang__) && defined(__CUDA__))
-
 #if (__CUDACC_VER_MAJOR__ >= 9)
 #if (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 700))
 #define CUTLASS_ARCH_WMMA_ENABLED
@@ -58,8 +55,6 @@
 #endif
 #endif

-#endif //!(defined(__clang__) && defined(__CUDA__))
-
 #if defined(CUTLASS_ARCH_WMMA_ENABLED)

 #include <mma.h>
diff --git a/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h b/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h
--- a/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h
+++ b/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h
@@ -43,8 +43,6 @@

 #pragma once

-#if !(defined(__clang__) && defined(__CUDA__))
-
 #include "third_party/gpus/cutlass/include/cutlass/wmma_array.h"
 #include "third_party/gpus/cutlass/include/cutlass/layout/matrix.h"

@@ -158,7 +156,3 @@ public:

 ////////////////////////////////////////////////////////////////////////////////

-#else
-#error (defined(__clang__) && defined(__CUDA__))
-#endif // !defined(__clang__)
-
diff --git a/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h b/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h
--- a/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h
+++ b/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h
@@ -34,8 +34,6 @@

 #pragma once

-#if !(defined(__clang__) && defined(__CUDA__))
-
 #include "third_party/gpus/cutlass/include/cutlass/cutlass.h"
 #include "third_party/gpus/cutlass/include/cutlass/wmma_array.h"
 #include "third_party/gpus/cutlass/include/cutlass/layout/matrix.h"
@@ -223,5 +221,3 @@ public:

 /////////////////////////////////////////////////////////////////////////////////////////////////

-#endif // !defined(__clang__)
-
