@BwL1289 BwL1289 commented May 15, 2025

This PR fixes three CMake issues: rpath escaping, unconditional static/shared library builds, and a header install path with an extra /.

  1. Bad rpath escaping for $ORIGIN
    • The existing link_libraries() block was quoting rpath entries incorrectly, which produces broken paths such as /../lib64 when linking with lld, e.g. ld.lld: error: cannot open /../lib64: Is a directory

  2. Unconditional shared + static library builds
    • This adds support for opting into building only the shared library, only the static library, or both, instead of always building both. It also combines multiple set_target_properties calls.

  3. Include path install bug
    • The install rule for the headers appended an extra slash, so headers were installed to paths like /usr/local/include//cutlass

@thakkarV
Collaborator

Thank you so much for these fixes :D this is a great MR.

@thakkarV
Collaborator

@d-k-b CC

@thakkarV thakkarV requested a review from d-k-b May 15, 2025 14:16
@d-k-b
Collaborator

d-k-b commented May 15, 2025

Thanks for working on this @BwL1289, always glad to get help on the build side!

@BwL1289
Author

BwL1289 commented May 15, 2025

Happy to help! Another potential improvement is allowing the user to specify their own cxx_std_XX (instead of hardcoding cxx_std_17)

"-Wl,-rpath,'$ORIGIN'"
"-Wl,-rpath,'$ORIGIN/../lib64'"
"-Wl,-rpath,'$ORIGIN/../lib'"
"-Wl,-rpath,'$$ORIGIN'"
Collaborator

I'm curious about this change. CMake's documentation shows a single $, but I see that we now wrap the value in a second set of '' quotes, and I wonder whether that makes a difference. What system are you testing on? I'd like to get an idea of where this was failing.

Author

Amazon Linux 2023 (Fedora-based)

Collaborator

This might actually be a good time to look at using the CMAKE_BUILD_RPATH, CMAKE_INSTALL_RPATH, and CMAKE_BUILD_RPATH_USE_ORIGIN variables, which were not fully working when CUTLASS started 😄

Author

Yeah, I think that's maybe for a separate PR? I was trying to keep as much as possible the same. Something like:

if(NOT WIN32)
  set(CMAKE_INSTALL_RPATH
    "$ORIGIN"
    "$ORIGIN/../lib64"
    "$ORIGIN/../lib"
    "${CUDA_TOOLKIT_ROOT_DIR}/lib64"
    "${CUDA_TOOLKIT_ROOT_DIR}/lib"
  )

  link_libraries(${CMAKE_DL_LIBS})
endif()
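
For illustration only, the same idea can also be expressed per target instead of globally. The sketch below assumes CMake 3.14 or newer (needed for the BUILD_RPATH_USE_ORIGIN target property) and an ELF platform where $ORIGIN is meaningful. The demo target and its generated source file are hypothetical stand-ins, not CUTLASS targets.

```cmake
cmake_minimum_required(VERSION 3.14)
project(rpath_sketch LANGUAGES CXX)

# Hypothetical one-line source so the sketch configures on its own.
set(SKETCH_SRC "${CMAKE_CURRENT_BINARY_DIR}/demo.cpp")
file(WRITE "${SKETCH_SRC}" "int demo() { return 0; }\n")
add_library(demo SHARED "${SKETCH_SRC}")

# $ORIGIN is only understood by ELF dynamic loaders, so skip Windows and macOS.
if(UNIX AND NOT APPLE)
  set_target_properties(demo PROPERTIES
    # Prefer $ORIGIN-relative entries over absolute build-tree paths where possible.
    BUILD_RPATH_USE_ORIGIN ON
    # Semicolon-separated list; a "$" not followed by "{" is passed through
    # literally, and CMake handles the linker-level escaping itself.
    INSTALL_RPATH "$ORIGIN;$ORIGIN/../lib64;$ORIGIN/../lib"
  )
endif()
```

Letting CMake own the rpath this way avoids hand-written -Wl,-rpath flags, which is where the quoting problem in this PR came from.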

CMakeLists.txt Outdated
${CUTLASS_INCLUDE_DIR}/
${CMAKE_CURRENT_BINARY_DIR}/include/
${CUTLASS_INCLUDE_DIR}
${CMAKE_CURRENT_BINARY_DIR}/include
Collaborator

The ${CMAKE_CURRENT_BINARY_DIR}/include looks fishy to me. I'm trying to recall if the trailing / was used to get the contents of the include folder copied into the DESTINATION folder. Can you show the install diffs to see what moved?

Author

@d-k-b unfortunately I don't have the diff handy, but with the fix here are the install paths. Look OK?

-- Install configuration: "Release"
-- Up-to-date: /usr/local/include/include
-- Up-to-date: /usr/local/include/include/cute
-- Up-to-date: /usr/local/include/include/cute/tensor_predicate.hpp
-- Up-to-date: /usr/local/include/include/cute/tensor.hpp
-- Up-to-date: /usr/local/include/include/cute/pointer_sparse.hpp
-- Up-to-date: /usr/local/include/include/cute/util
...

Collaborator

Yeah, removing these trailing /s is causing the entire folder to be copied to the given DESTINATION instead of the contents. These two lines need to be reverted.
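
A minimal sketch of the trailing-slash behavior described here, following the documented semantics of install(DIRECTORY). The SKETCH_INCLUDE_DIR variable and the demo header are hypothetical and exist only to make the example self-contained.

```cmake
cmake_minimum_required(VERSION 3.13)
project(install_dir_sketch LANGUAGES NONE)

# Hypothetical header tree, generated so the sketch needs no other files.
set(SKETCH_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}/include")
file(WRITE "${SKETCH_INCLUDE_DIR}/demo/demo.hpp" "#pragma once\n")

# Trailing slash on the DIRECTORY argument: the contents are installed,
# giving <prefix>/include/demo/demo.hpp.
install(DIRECTORY "${SKETCH_INCLUDE_DIR}/" DESTINATION include)

# No trailing slash: the last path component ("include") is appended to the
# destination, which would give <prefix>/include/include/demo/demo.hpp.
# install(DIRECTORY "${SKETCH_INCLUDE_DIR}" DESTINATION include)
```

The slash on DESTINATION carries no such meaning, so it can be dropped there without changing where files land.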

Author

@d-k-b ack done

set(CUTLASS_BUILD_MONO_LIBRARY OFF CACHE BOOL
"Determines whether the cutlass library is generated as a single file or multiple files.")

option(BUILD_SHARED_LIBS "Build shared libraries" ON)
Collaborator

BUILD_SHARED_LIBS is a global CMake flag and has side effects for many functions. Can you add the CUTLASS_ prefix to these so we can limit the use to just CUTLASS code?

Author

Ack, done. I changed both the shared and static flags to use the prefix.

"Determines whether the cutlass library is generated as a single file or multiple files.")

option(CUTLASS_BUILD_SHARED_LIBS "Build shared libraries" ON)
option(CUTLASS_BUILD_STATIC_LIBS "Build static libraries" OFF)
Collaborator

For backward compatibility, the default for both of these needs to be ON. If we planned to make the default OFF for shared in the future, we could add a compatibility flag with a message, but I would save that extension for a separate request.
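
A minimal sketch of how two such options might gate the library targets, with both defaulting to ON as requested. The demo_shared and demo_static targets, the generated source, and the guard against disabling both options are assumptions made for illustration, not the code in this PR.

```cmake
cmake_minimum_required(VERSION 3.13)
project(lib_kind_sketch LANGUAGES CXX)

# Prefixed options avoid the side effects of the global BUILD_SHARED_LIBS.
# Both default to ON so existing builds keep producing both library kinds.
option(CUTLASS_BUILD_SHARED_LIBS "Build shared libraries" ON)
option(CUTLASS_BUILD_STATIC_LIBS "Build static libraries" ON)

# Assumed guard: turning both off would leave nothing to build.
if(NOT CUTLASS_BUILD_SHARED_LIBS AND NOT CUTLASS_BUILD_STATIC_LIBS)
  message(FATAL_ERROR
    "Enable at least one of CUTLASS_BUILD_SHARED_LIBS or CUTLASS_BUILD_STATIC_LIBS")
endif()

# Hypothetical one-line source so the sketch configures on its own.
set(SKETCH_SRC "${CMAKE_CURRENT_BINARY_DIR}/demo.cpp")
file(WRITE "${SKETCH_SRC}" "int demo() { return 0; }\n")

if(CUTLASS_BUILD_SHARED_LIBS)
  add_library(demo_shared SHARED "${SKETCH_SRC}")
endif()

if(CUTLASS_BUILD_STATIC_LIBS)
  add_library(demo_static STATIC "${SKETCH_SRC}")
endif()
```

Configuring with -DCUTLASS_BUILD_STATIC_LIBS=OFF would then create only the shared target.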

Author

Ack, done. Changed both to default to ON.

Collaborator

@BwL1289 -- this default must be changed to ON.

Author

@d-k-b ack done

@d-k-b
Collaborator

d-k-b commented May 15, 2025

> Happy to help! Another potential improvement is allowing the user to specify their own cxx_std_XX (instead of hardcoding cxx_std_17)

Making that configurable may work, although we'd need to ensure it is at least 17; otherwise many things would break. Enforcing a higher standard may be doable, possibly with some code conditionals.
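
One possible shape for this, sketched under assumptions: SKETCH_CXX_STANDARD is a hypothetical cache variable (not an existing CUTLASS option), demo_headers is a placeholder target, and the allowed list is illustrative. An explicit list is used rather than a numeric comparison because an older standard such as 98 would compare greater than 17.

```cmake
cmake_minimum_required(VERSION 3.20)  # 3.20 is needed for the cxx_std_23 feature
project(cxx_std_sketch LANGUAGES CXX)

# Hypothetical cache variable; defaults to the currently hardcoded standard.
set(SKETCH_CXX_STANDARD 17 CACHE STRING "C++ standard to request (17 or newer)")

# Reject anything below C++17 by checking against an explicit allow-list.
set(SKETCH_ALLOWED_CXX_STANDARDS 17 20 23)
if(NOT SKETCH_CXX_STANDARD IN_LIST SKETCH_ALLOWED_CXX_STANDARDS)
  message(FATAL_ERROR
    "SKETCH_CXX_STANDARD must be one of ${SKETCH_ALLOWED_CXX_STANDARDS}, "
    "got '${SKETCH_CXX_STANDARD}'")
endif()

# Placeholder header-only target; consumers inherit the requested standard.
add_library(demo_headers INTERFACE)
target_compile_features(demo_headers INTERFACE cxx_std_${SKETCH_CXX_STANDARD})
```

Configuring with -DSKETCH_CXX_STANDARD=20 would request C++20, while a value such as 11 or 14 would stop at configure time.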

@BwL1289
Author

BwL1289 commented May 15, 2025

@d-k-b looks like 17 and 11 are also being mixed at various places in the build:

https://github.com/NVIDIA/cutlass/blob/main/CMakeLists.txt#L677
https://github.com/NVIDIA/cutlass/blob/main/CUDA.cmake#L308

@d-k-b
Collaborator

d-k-b commented May 16, 2025

Good catch, those should probably be updated to 17.

@BwL1289
Author

BwL1289 commented May 16, 2025

@d-k-b thanks. Responded to your other comments as well.

@BwL1289
Author

BwL1289 commented May 20, 2025

@d-k-b bumping this

@d-k-b
Collaborator

d-k-b commented May 22, 2025

@BwL1289 -- thanks for your patience. I need to run additional testing on this internally, but bandwidth has been limited this week. I will attempt to finish up the review soon.

@BwL1289
Author

BwL1289 commented May 22, 2025

@d-k-b thanks, let me know if I can help.

@d-k-b
Collaborator

d-k-b commented May 28, 2025

@BwL1289 -- did you have a chance to verify this? #2305 (comment)

@BwL1289
Author

BwL1289 commented May 28, 2025

@d-k-b I don't have the diff available, but I pasted the new install paths

@BwL1289
Author

BwL1289 commented May 28, 2025

@d-k-b are you looking for something different? Let me know.

@BwL1289
Author

BwL1289 commented May 28, 2025

@d-k-b Here are the original install paths. It actually looks like the new version appends an additional /include to the path.

-- Install configuration: "Release"
-- Up-to-date: /usr/local/include
-- Installing: /usr/local/include/cute
-- Installing: /usr/local/include/cute/tensor_predicate.hpp
-- Installing: /usr/local/include/cute/tensor.hpp
-- Installing: /usr/local/include/cute/pointer_sparse.hpp
-- Installing: /usr/local/include/cute/util
...

@BwL1289
Author

BwL1289 commented Jun 4, 2025

> @BwL1289, removing the trailing slashes on the DESTINATION directory seems like a good idea, but they still seem to be necessary at the ends of the DIRECTORY arguments, because there the slash is a modifier that tells CMake to copy the contents of that directory rather than the directory itself. Can you make those changes as suggested and see if that fixes the issue you cared about while keeping the actual install locations equivalent?

Ack, done. Here are the new paths, as expected:

-- Install configuration: "Release"
-- Up-to-date: /usr/local/include
-- Up-to-date: /usr/local/include/cute
-- Installing: /usr/local/include/cute/tensor_predicate.hpp
-- Installing: /usr/local/include/cute/tensor.hpp
-- Installing: /usr/local/include/cute/pointer_sparse.hpp
-- Up-to-date: /usr/local/include/cute/util
-- Installing: /usr/local/include/cute/util/debug.hpp
-- Installing: /usr/local/include/cute/util/type_traits.hpp
-- Installing: /usr/local/include/cute/util/print.hpp
-- Installing: /usr/local/include/cute/tensor_zip.hpp
-- Up-to-date: /usr/local/include/cute/atom
-- Installing: /usr/local/include/cute/atom/copy_traits_sm75.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm90_gmma.hpp
-- Installing: /usr/local/include/cute/atom/mma_atom.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm120.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm120_sparse.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm90_gmma_ext.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm90.hpp
-- Installing: /usr/local/include/cute/atom/partitioner.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm100_tma.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm100.hpp
-- Installing: /usr/local/include/cute/atom/copy_atom.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm80.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm70.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm90_im2col.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm75.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm61.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm89.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm100_im2col.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits_sm100.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm80.hpp
-- Installing: /usr/local/include/cute/atom/mma_traits.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm90.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm90_tma.hpp
-- Installing: /usr/local/include/cute/atom/copy_traits_sm50.hpp
-- Installing: /usr/local/include/cute/swizzle_layout.hpp
-- Installing: /usr/local/include/cute/pointer_swizzle.hpp
-- Installing: /usr/local/include/cute/config.hpp
-- Installing: /usr/local/include/cute/pointer_base.hpp
-- Up-to-date: /usr/local/include/cute/algorithm
-- Installing: /usr/local/include/cute/algorithm/prefetch.hpp
-- Installing: /usr/local/include/cute/algorithm/cooperative_copy.hpp
-- Installing: /usr/local/include/cute/algorithm/cooperative_gemm.hpp
-- Installing: /usr/local/include/cute/algorithm/tuple_algorithms.hpp
-- Installing: /usr/local/include/cute/algorithm/tensor_algorithms.hpp
-- Installing: /usr/local/include/cute/algorithm/clear.hpp
-- Installing: /usr/local/include/cute/algorithm/functional.hpp
-- Installing: /usr/local/include/cute/algorithm/fill.hpp
-- Installing: /usr/local/include/cute/algorithm/gemm.hpp
-- Installing: /usr/local/include/cute/algorithm/prefer.hpp
-- Installing: /usr/local/include/cute/algorithm/copy.hpp
-- Installing: /usr/local/include/cute/algorithm/axpby.hpp
-- Installing: /usr/local/include/cute/pointer.hpp
-- Up-to-date: /usr/local/include/cute/container
-- Installing: /usr/local/include/cute/container/alignment.hpp
-- Installing: /usr/local/include/cute/container/cuda_types.hpp
-- Installing: /usr/local/include/cute/container/bit_field.hpp
-- Installing: /usr/local/include/cute/container/array.hpp
-- Installing: /usr/local/include/cute/container/array_subbyte.hpp
-- Installing: /usr/local/include/cute/container/tuple.hpp
-- Installing: /usr/local/include/cute/container/array_aligned.hpp
-- Installing: /usr/local/include/cute/container/type_list.hpp
-- Up-to-date: /usr/local/include/cute/numeric
-- Installing: /usr/local/include/cute/numeric/numeric_types.hpp
-- Installing: /usr/local/include/cute/numeric/arithmetic_tuple.hpp
-- Installing: /usr/local/include/cute/numeric/real.hpp
-- Installing: /usr/local/include/cute/numeric/complex.hpp
-- Installing: /usr/local/include/cute/numeric/math.hpp
-- Installing: /usr/local/include/cute/numeric/integral_constant.hpp
-- Installing: /usr/local/include/cute/numeric/int.hpp
-- Installing: /usr/local/include/cute/numeric/integer_sequence.hpp
-- Installing: /usr/local/include/cute/numeric/integral_ratio.hpp
-- Installing: /usr/local/include/cute/layout.hpp
-- Up-to-date: /usr/local/include/cute/arch
-- Installing: /usr/local/include/cute/arch/mma_sm90_gmma_ext.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm90_gmma.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm90_desc.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm75.hpp
-- Installing: /usr/local/include/cute/arch/cluster_sm90.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm75.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm61.hpp
-- Installing: /usr/local/include/cute/arch/tmem_allocator_sm100.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm100_tma.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm100.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm89.hpp
-- Installing: /usr/local/include/cute/arch/config.hpp
-- Installing: /usr/local/include/cute/arch/mma.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm90_desc.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm100.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm70.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm100_umma.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm90_tma.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm120_sparse.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm120.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm80.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm90.hpp
-- Installing: /usr/local/include/cute/arch/simd_sm100.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm90.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm100_desc.hpp
-- Installing: /usr/local/include/cute/arch/util.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm50.hpp
-- Installing: /usr/local/include/cute/arch/mma_sm90_gmma_sparse.hpp
-- Installing: /usr/local/include/cute/arch/cluster_sm100.hpp
-- Installing: /usr/local/include/cute/arch/copy.hpp
-- Installing: /usr/local/include/cute/arch/copy_sm80.hpp
-- Installing: /usr/local/include/cute/int_tuple.hpp
-- Installing: /usr/local/include/cute/tensor_impl.hpp
-- Installing: /usr/local/include/cute/swizzle.hpp
-- Installing: /usr/local/include/cute/layout_composed.hpp
-- Installing: /usr/local/include/cute/underscore.hpp
-- Installing: /usr/local/include/cute/stride.hpp
-- Installing: /usr/local/include/cute/pointer_flagged.hpp
-- Up-to-date: /usr/local/include/cutlass
-- Installing: /usr/local/include/cutlass/tensor_ref_planar_complex.h
-- Installing: /usr/local/include/cutlass/semaphore.h
-- Installing: /usr/local/include/cutlass/kernel_hardware_info.h
-- Installing: /usr/local/include/cutlass/half.h
-- Installing: /usr/local/include/cutlass/pitch_linear_coord.h
-- Installing: /usr/local/include/cutlass/cutlass.h
-- Up-to-date: /usr/local/include/cutlass/pipeline
-- Installing: /usr/local/include/cutlass/pipeline/sm100_pipeline.hpp
-- Installing: /usr/local/include/cutlass/pipeline/sm90_pipeline.hpp
-- Installing: /usr/local/include/cutlass/pipeline/pipeline.hpp
-- Up-to-date: /usr/local/include/cutlass/experimental
-- Up-to-date: /usr/local/include/cutlass/experimental/distributed
-- Up-to-date: /usr/local/include/cutlass/experimental/distributed/schedules
-- Installing: /usr/local/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp
-- Installing: /usr/local/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp
-- Up-to-date: /usr/local/include/cutlass/experimental/distributed/device
-- Installing: /usr/local/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp
-- Installing: /usr/local/include/cutlass/experimental/distributed/device/full_barrier.hpp
-- Installing: /usr/local/include/cutlass/experimental/distributed/device/detail.hpp
-- Up-to-date: /usr/local/include/cutlass/experimental/distributed/kernel
-- Installing: /usr/local/include/cutlass/experimental/distributed/kernel/full_barrier.hpp
-- Installing: /usr/local/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp
-- Installing: /usr/local/include/cutlass/experimental/distributed/kernel/detail.hpp
-- Installing: /usr/local/include/cutlass/version.h
-- Installing: /usr/local/include/cutlass/floating_point_nvrtc.h
-- Installing: /usr/local/include/cutlass/blas3_types.h
-- Installing: /usr/local/include/cutlass/matrix_shape.h
-- Installing: /usr/local/include/cutlass/barrier.h
-- Installing: /usr/local/include/cutlass/matrix_coord.h
-- Installing: /usr/local/include/cutlass/kernel_launch.h
-- Installing: /usr/local/include/cutlass/tensor_view_planar_complex.h
-- Up-to-date: /usr/local/include/cutlass/layout
-- Installing: /usr/local/include/cutlass/layout/permute.h
-- Installing: /usr/local/include/cutlass/layout/tensor_op_multiplicand_sm75.h
-- Installing: /usr/local/include/cutlass/layout/matrix.h
-- Installing: /usr/local/include/cutlass/layout/pitch_linear.h
-- Installing: /usr/local/include/cutlass/layout/layout.h
-- Installing: /usr/local/include/cutlass/layout/tensor_op_multiplicand_sm80.h
-- Installing: /usr/local/include/cutlass/layout/vector.h
-- Installing: /usr/local/include/cutlass/layout/tensor.h
-- Installing: /usr/local/include/cutlass/layout/tensor_op_multiplicand_sm70.h
-- Installing: /usr/local/include/cutlass/fast_math.h
-- Installing: /usr/local/include/cutlass/numeric_size.h
-- Installing: /usr/local/include/cutlass/block_striped.h
-- Installing: /usr/local/include/cutlass/float8.h
-- Up-to-date: /usr/local/include/cutlass/platform
-- Installing: /usr/local/include/cutlass/platform/platform.h
-- Installing: /usr/local/include/cutlass/cuda_host_adapter.hpp
-- Installing: /usr/local/include/cutlass/numeric_conversion.h
-- Installing: /usr/local/include/cutlass/integer_subbyte.h
-- Installing: /usr/local/include/cutlass/quaternion.h
-- Installing: /usr/local/include/cutlass/matrix.h
-- Installing: /usr/local/include/cutlass/relatively_equal.h
-- Installing: /usr/local/include/cutlass/uint128.h
-- Installing: /usr/local/include/cutlass/array_subbyte.h
-- Installing: /usr/local/include/cutlass/coord.h
-- Installing: /usr/local/include/cutlass/real.h
-- Installing: /usr/local/include/cutlass/exmy_base.h
-- Up-to-date: /usr/local/include/cutlass/conv
-- Up-to-date: /usr/local/include/cutlass/conv/threadblock
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_params.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_mma_base.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_params.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/implicit_gemm_multistage.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/threadblock_swizzle.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_tile_iterator.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h
-- Installing: /usr/local/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h
-- Installing: /usr/local/include/cutlass/conv/convolution.h
-- Up-to-date: /usr/local/include/cutlass/conv/collective
-- Up-to-date: /usr/local/include/cutlass/conv/collective/builders
-- Installing: /usr/local/include/cutlass/conv/collective/builders/sm100_common.inl
-- Installing: /usr/local/include/cutlass/conv/collective/builders/sm90_common.inl
-- Installing: /usr/local/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl
-- Installing: /usr/local/include/cutlass/conv/collective/builders/sm100_umma_builder.inl
-- Installing: /usr/local/include/cutlass/conv/collective/collective_conv.hpp
-- Installing: /usr/local/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/conv/collective/collective_builder.hpp
-- Installing: /usr/local/include/cutlass/conv/collective/detail.hpp
-- Installing: /usr/local/include/cutlass/conv/conv2d_problem_size.h
-- Up-to-date: /usr/local/include/cutlass/conv/warp
-- Installing: /usr/local/include/cutlass/conv/warp/mma_depthwise_simt.h
-- Installing: /usr/local/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h
-- Installing: /usr/local/include/cutlass/conv/warp/scale_bias_relu_transform.h
-- Installing: /usr/local/include/cutlass/conv/convnd_problem_shape.hpp
-- Up-to-date: /usr/local/include/cutlass/conv/device
-- Installing: /usr/local/include/cutlass/conv/device/implicit_gemm_convolution.h
-- Installing: /usr/local/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h
-- Installing: /usr/local/include/cutlass/conv/device/direct_convolution.h
-- Installing: /usr/local/include/cutlass/conv/device/conv_universal_adapter.hpp
-- Installing: /usr/local/include/cutlass/conv/detail.hpp
-- Up-to-date: /usr/local/include/cutlass/conv/thread
-- Installing: /usr/local/include/cutlass/conv/thread/depthwise_mma.h
-- Up-to-date: /usr/local/include/cutlass/conv/kernel
-- Installing: /usr/local/include/cutlass/conv/kernel/default_deconv3d.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv3d_wgrad.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_wgrad.h
-- Installing: /usr/local/include/cutlass/conv/kernel/implicit_gemm_convolution.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_deconv2d.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_group_fprop.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h
-- Installing: /usr/local/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h
-- Installing: /usr/local/include/cutlass/conv/kernel/conv_universal.hpp
-- Installing: /usr/local/include/cutlass/conv/kernel/direct_convolution.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h
-- Installing: /usr/local/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_depthwise_fprop.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_fprop.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv3d_fprop.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv3d_dgrad.h
-- Installing: /usr/local/include/cutlass/conv/kernel/default_conv2d_dgrad.h
-- Installing: /usr/local/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h
-- Installing: /usr/local/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h
-- Installing: /usr/local/include/cutlass/conv/conv3d_problem_size.h
-- Installing: /usr/local/include/cutlass/conv/dispatch_policy.hpp
-- Installing: /usr/local/include/cutlass/numeric_types.h
-- Installing: /usr/local/include/cutlass/functional.h
-- Installing: /usr/local/include/cutlass/tensor_coord.h
-- Installing: /usr/local/include/cutlass/blas3.h
-- Up-to-date: /usr/local/include/cutlass/gemm
-- Installing: /usr/local/include/cutlass/gemm/gemm.h
-- Up-to-date: /usr/local/include/cutlass/gemm/threadblock
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_pipelined.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_base.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_gemv_core.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/ell_mma_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_simt.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_sm70.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_sparse_mma.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/gemv.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/index_remat.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_trmm.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_planar_complex_base.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_ell_mma.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_with_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/ell_mma_pipelined.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/threadblock_swizzle.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_singlestage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_sparse_base.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_sm75.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_core_wmma.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_sparse_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/mma_blas3_multistage.h
-- Installing: /usr/local/include/cutlass/gemm/threadblock/default_mma.h
-- Installing: /usr/local/include/cutlass/gemm/group_array_problem_shape.hpp
-- Up-to-date: /usr/local/include/cutlass/gemm/collective
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/collective_mma_decl.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm80_mma_multistage.hpp
-- Up-to-date: /usr/local/include/cutlass/gemm/collective/builders
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_common.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm90_common.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm1xx_common.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm120_common.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_warpspecialized_mixed_input.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/collective_builder.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/collective_mma.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/collective_builder_decl.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm120_mma_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/fp8_accumulation.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm70_mma_twostage.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp
-- Installing: /usr/local/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/gemm_enumerated_types.h
-- Up-to-date: /usr/local/include/cutlass/gemm/warp
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/warp/softmax_scale_bias_transform.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_simt_tile_iterator.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_simt_policy.h
-- Installing: /usr/local/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_planar_complex.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_wmma.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_policy.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_sparse_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h
-- Installing: /usr/local/include/cutlass/gemm/warp/tile_iterator_planar_complex.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_sm70.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_with_reduction_tensor_op.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_wmma.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma.h
-- Installing: /usr/local/include/cutlass/gemm/warp/mma_simt.h
-- Installing: /usr/local/include/cutlass/gemm/warp/scale_bias_tile_iterator.h
-- Installing: /usr/local/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h
-- Up-to-date: /usr/local/include/cutlass/gemm/device
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_sparse.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h
-- Installing: /usr/local/include/cutlass/gemm/device/rank_2k.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal_base.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm.h
-- Installing: /usr/local/include/cutlass/gemm/device/rank_k.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_batched.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal_adapter.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemv.h
-- Installing: /usr/local/include/cutlass/gemm/device/default_gemm_configuration.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_array.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_sparse_universal.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_complex.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/device/trmm.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_sparse_with_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_sparse_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/device/rank_2k_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/device/base_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/device/symm.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/device/ell_gemm.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_with_k_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_splitk_parallel.h
-- Installing: /usr/local/include/cutlass/gemm/device/gemm_universal_with_broadcast.h
-- Up-to-date: /usr/local/include/cutlass/gemm/thread
-- Installing: /usr/local/include/cutlass/gemm/thread/mma_sm61.h
-- Installing: /usr/local/include/cutlass/gemm/thread/mma_sm50.h
-- Installing: /usr/local/include/cutlass/gemm/thread/mma_sm60.h
-- Installing: /usr/local/include/cutlass/gemm/thread/mma.h
-- Up-to-date: /usr/local/include/cutlass/gemm/kernel
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_k_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_2k_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_planar_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemv_batched_strided.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_batched.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/params_universal_base.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemv.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/trmm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_k_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_trmm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_trmm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_ell_gemm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_array.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_sparse_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_symm_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_2k_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal_streamk.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_2k_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_planar_complex_array.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemv.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/tile_scheduler.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_sparse.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm70_gemm.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/rank_2k_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/tile_scheduler_params.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/rank_2k_grouped.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/params_sparse_base.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_2k.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal_decl.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_transpose_operands.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_with_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/grouped_problem_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_trmm_complex.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/symm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/static_tile_scheduler.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_pipelined.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_rank_k.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_symm_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/ell_gemm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_with_k_reduction.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sparse_gemm.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_params.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/rank_k_universal.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/gemm_splitk_parallel.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h
-- Installing: /usr/local/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp
-- Installing: /usr/local/include/cutlass/gemm/kernel/default_symm.h
-- Installing: /usr/local/include/cutlass/gemm/dispatch_policy.hpp
-- Up-to-date: /usr/local/include/cutlass/epilogue
-- Up-to-date: /usr/local/include/cutlass/epilogue/fusion
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/callbacks.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/operations.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp
-- Up-to-date: /usr/local/include/cutlass/epilogue/threadblock
-- Up-to-date: /usr/local/include/cutlass/epilogue/threadblock/fusion
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/fusion/visitors.hpp
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_workspace.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/output_tile_thread_map.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_base.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_direct_store.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/shared_load_iterator.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_depthwise.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/interleaved_epilogue.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_simt.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_thread_map_simt.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/output_iterator_parameter.h
-- Installing: /usr/local/include/cutlass/epilogue/threadblock/epilogue.h
-- Up-to-date: /usr/local/include/cutlass/epilogue/collective
-- Installing: /usr/local/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/default_epilogue.hpp
-- Up-to-date: /usr/local/include/cutlass/epilogue/collective/builders
-- Installing: /usr/local/include/cutlass/epilogue/collective/builders/sm90_builder.inl
-- Installing: /usr/local/include/cutlass/epilogue/collective/builders/sm90_common.inl
-- Installing: /usr/local/include/cutlass/epilogue/collective/builders/sm100_builder.inl
-- Installing: /usr/local/include/cutlass/epilogue/collective/builders/sm120_builder.inl
-- Installing: /usr/local/include/cutlass/epilogue/collective/builders/sm120_common.inl
-- Installing: /usr/local/include/cutlass/epilogue/collective/default_epilogue_array.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/collective_builder.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/detail.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/collective_epilogue.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp
-- Installing: /usr/local/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp
-- Up-to-date: /usr/local/include/cutlass/epilogue/warp
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tensor_op_policy.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_simt.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tile_iterator_simt.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/volta_tensor_op_policy.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/simt_policy.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h
-- Installing: /usr/local/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h
-- Up-to-date: /usr/local/include/cutlass/epilogue/thread
-- Installing: /usr/local/include/cutlass/epilogue/thread/conversion_op.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_clamp.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_relu0.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_params.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_gelu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_relu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/scale_type.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_sigmoid.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_planar_complex.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_residual_block.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp
-- Installing: /usr/local/include/cutlass/epilogue/thread/reduction_op.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_hardswish.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_silu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/activation.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_bias_relu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_dgelu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_drelu.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/linear_combination_generic.h
-- Installing: /usr/local/include/cutlass/epilogue/thread/detail.hpp
-- Installing: /usr/local/include/cutlass/epilogue/dispatch_policy.hpp
-- Installing: /usr/local/include/cutlass/workspace.h
-- Installing: /usr/local/include/cutlass/constants.h
-- Up-to-date: /usr/local/include/cutlass/detail
-- Installing: /usr/local/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp
-- Installing: /usr/local/include/cutlass/detail/cluster.hpp
-- Installing: /usr/local/include/cutlass/detail/collective.hpp
-- Installing: /usr/local/include/cutlass/detail/dependent_false.hpp
-- Up-to-date: /usr/local/include/cutlass/detail/collective
-- Installing: /usr/local/include/cutlass/detail/collective/mixed_input_utils.hpp
-- Installing: /usr/local/include/cutlass/detail/mma.hpp
-- Installing: /usr/local/include/cutlass/detail/blockwise_scale_layout.hpp
-- Installing: /usr/local/include/cutlass/detail/sm100_tmem_helper.hpp
-- Installing: /usr/local/include/cutlass/detail/layout.hpp
-- Installing: /usr/local/include/cutlass/detail/sm100_blockscaled_layout.hpp
-- Installing: /usr/local/include/cutlass/detail/helper_macros.hpp
-- Installing: /usr/local/include/cutlass/trace.h
-- Installing: /usr/local/include/cutlass/gemm_coord.h
-- Installing: /usr/local/include/cutlass/bfloat16.h
-- Up-to-date: /usr/local/include/cutlass/reduction
-- Installing: /usr/local/include/cutlass/reduction/threadblock_swizzle.h
-- Up-to-date: /usr/local/include/cutlass/reduction/device
-- Installing: /usr/local/include/cutlass/reduction/device/tensor_reduce_affine_strided.h
-- Installing: /usr/local/include/cutlass/reduction/device/reduce_split_k.h
-- Installing: /usr/local/include/cutlass/reduction/device/tensor_reduce_affine_contiguous.h
-- Installing: /usr/local/include/cutlass/reduction/device/tensor_reduce.h
-- Up-to-date: /usr/local/include/cutlass/reduction/thread
-- Installing: /usr/local/include/cutlass/reduction/thread/reduction_operators.h
-- Installing: /usr/local/include/cutlass/reduction/thread/reduce.h
-- Up-to-date: /usr/local/include/cutlass/reduction/kernel
-- Installing: /usr/local/include/cutlass/reduction/kernel/tensor_reduce_affine_strided.h
-- Installing: /usr/local/include/cutlass/reduction/kernel/reduce_softmax_final.h
-- Installing: /usr/local/include/cutlass/reduction/kernel/reduce_split_k.h
-- Installing: /usr/local/include/cutlass/reduction/kernel/tensor_reduce_affine_contiguous.h
-- Up-to-date: /usr/local/include/cutlass/arch
-- Installing: /usr/local/include/cutlass/arch/reg_reconfig.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm61.h
-- Installing: /usr/local/include/cutlass/arch/wmma_sm75.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm75.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm50.h
-- Installing: /usr/local/include/cutlass/arch/mma_sparse_sm80.h
-- Installing: /usr/local/include/cutlass/arch/barrier.h
-- Installing: /usr/local/include/cutlass/arch/synclog.hpp
-- Installing: /usr/local/include/cutlass/arch/config.h
-- Installing: /usr/local/include/cutlass/arch/grid_dependency_control.h
-- Installing: /usr/local/include/cutlass/arch/wmma_sm72.h
-- Installing: /usr/local/include/cutlass/arch/memory_sm80.h
-- Installing: /usr/local/include/cutlass/arch/mma_sparse_sm89.h
-- Installing: /usr/local/include/cutlass/arch/simd_sm60.h
-- Installing: /usr/local/include/cutlass/arch/cache_operation.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm80.h
-- Installing: /usr/local/include/cutlass/arch/simd.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm90.h
-- Installing: /usr/local/include/cutlass/arch/simd_sm61.h
-- Installing: /usr/local/include/cutlass/arch/memory.h
-- Installing: /usr/local/include/cutlass/arch/wmma_sm70.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm60.h
-- Installing: /usr/local/include/cutlass/arch/mma.h
-- Installing: /usr/local/include/cutlass/arch/wmma.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm89.h
-- Installing: /usr/local/include/cutlass/arch/memory_sm75.h
-- Installing: /usr/local/include/cutlass/arch/mma_sm70.h
-- Installing: /usr/local/include/cutlass/arch/arch.h
-- Installing: /usr/local/include/cutlass/array.h
-- Installing: /usr/local/include/cutlass/predicate_vector.h
-- Installing: /usr/local/include/cutlass/tensor_view.h
-- Installing: /usr/local/include/cutlass/subbyte_reference.h
-- Installing: /usr/local/include/cutlass/wmma_array.h
-- Installing: /usr/local/include/cutlass/float_subbyte.h
-- Installing: /usr/local/include/cutlass/tfloat32.h
-- Installing: /usr/local/include/cutlass/core_io.h
-- Installing: /usr/local/include/cutlass/kernel_hardware_info.hpp
-- Up-to-date: /usr/local/include/cutlass/thread
-- Installing: /usr/local/include/cutlass/thread/matrix.h
-- Up-to-date: /usr/local/include/cutlass/transform
-- Up-to-date: /usr/local/include/cutlass/transform/threadblock
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_scale_bias_vector_access_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op_sm80.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op_sm70.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_iterator_2dthreadtile.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_access_iterator_params.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/ell_predicated_tile_access_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear_direct_conv.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_access_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/ell_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear_2dthreadtile.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_iterator_triangular_matrix.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_vector_access_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/regular_tile_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_access_iterator_triangular_matrix.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_scale_bias_vector_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_tile_access_iterator_2dthreadtile.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/ell_predicated_tile_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/vector_iterator.h
-- Installing: /usr/local/include/cutlass/transform/threadblock/predicated_scale_bias_vector_access_iterator.h
-- Up-to-date: /usr/local/include/cutlass/transform/collective
-- Installing: /usr/local/include/cutlass/transform/collective/sm90_wgmma_transpose.hpp
-- Installing: /usr/local/include/cutlass/transform/pitch_linear_thread_map.h
-- Up-to-date: /usr/local/include/cutlass/transform/warp
-- Installing: /usr/local/include/cutlass/transform/warp/vector_fragment_iterator.h
-- Up-to-date: /usr/local/include/cutlass/transform/device
-- Installing: /usr/local/include/cutlass/transform/device/transform_universal_adapter.hpp
-- Up-to-date: /usr/local/include/cutlass/transform/thread
-- Installing: /usr/local/include/cutlass/transform/thread/unary_op.h
-- Installing: /usr/local/include/cutlass/transform/thread/transpose.h
-- Up-to-date: /usr/local/include/cutlass/transform/kernel
-- Installing: /usr/local/include/cutlass/transform/kernel/sm90_sparse_gemm_compressor.hpp
-- Installing: /usr/local/include/cutlass/transform/kernel/sparse_gemm_compressor.hpp
-- Installing: /usr/local/include/cutlass/transform/kernel/filter_format_transformer.hpp
-- Installing: /usr/local/include/cutlass/array_planar_complex.h
-- Installing: /usr/local/include/cutlass/gemm_coord.hpp
-- Installing: /usr/local/include/cutlass/device_kernel.h
-- Installing: /usr/local/include/cutlass/cluster_launch.hpp
-- Installing: /usr/local/include/cutlass/complex.h
-- Installing: /usr/local/include/cutlass/aligned_buffer.h
-- Installing: /usr/local/include/cutlass/tensor_ref.h
-- Up-to-date: /usr/local/include
-- Up-to-date: /usr/local/include/cutlass
-- Installing: /usr/local/include/cutlass/version_extended.h
-- Up-to-date: /usr/local/test/cutlass
-- Up-to-date: /usr/local/test/cutlass/bin
-- Up-to-date: /usr/local/test/cutlass/lib64
-- Up-to-date: /usr/local/test/cutlass/ctest
-- Up-to-date: /usr/local/include
-- Up-to-date: /usr/local/include/cutlass
-- Up-to-date: /usr/local/include/cutlass/util
-- Up-to-date: /usr/local/include/cutlass/util/debug.h
-- Up-to-date: /usr/local/include/cutlass/util/mixed_dtype_utils.hpp
-- Up-to-date: /usr/local/include/cutlass/util/command_line.h
-- Up-to-date: /usr/local/include/cutlass/util/device_dump.h
-- Up-to-date: /usr/local/include/cutlass/util/device_nchw_to_nhwc.h
-- Up-to-date: /usr/local/include/cutlass/util/host_reorder.h
-- Up-to-date: /usr/local/include/cutlass/util/host_uncompress.h
-- Up-to-date: /usr/local/include/cutlass/util/exceptions.h
-- Up-to-date: /usr/local/include/cutlass/util/device_rmsnorm.h
-- Up-to-date: /usr/local/include/cutlass/util/helper_cuda.hpp
-- Up-to-date: /usr/local/include/cutlass/util/host_tensor.h
-- Up-to-date: /usr/local/include/cutlass/util/device_nhwc_padding.h
-- Up-to-date: /usr/local/include/cutlass/util/device_layernorm.h
-- Up-to-date: /usr/local/include/cutlass/util/device_nhwc_to_nchw.h
-- Up-to-date: /usr/local/include/cutlass/util/print_error.hpp
-- Up-to-date: /usr/local/include/cutlass/util/distribution.h
-- Up-to-date: /usr/local/include/cutlass/util/packed_stride.hpp
-- Up-to-date: /usr/local/include/cutlass/util/gett_commandline.hpp
-- Up-to-date: /usr/local/include/cutlass/util/device_nhwc_pooling.h
-- Up-to-date: /usr/local/include/cutlass/util/device_utils.h
-- Up-to-date: /usr/local/include/cutlass/util/host_tensor_planar_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/tensor_view_io.h
-- Up-to-date: /usr/local/include/cutlass/util/device_groupnorm.h
-- Up-to-date: /usr/local/include/cutlass/util/cublas_wrappers.hpp
-- Up-to-date: /usr/local/include/cutlass/util/GPU_Clock.hpp
-- Up-to-date: /usr/local/include/cutlass/util/type_traits.h
-- Up-to-date: /usr/local/include/cutlass/util/device_memory.h
-- Up-to-date: /usr/local/include/cutlass/util/index_sequence.h
-- Up-to-date: /usr/local/include/cutlass/util/reference
-- Up-to-date: /usr/local/include/cutlass/util/reference/host
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/rank_2k.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/rank_k_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/gemm_planar_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_copy.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/gemm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/gett.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/convolution.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/symm_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_compare.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_norm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/gemm_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_foreach.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_elementwise.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/conv.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/trmm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/trmm_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_reduce.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_fill.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_reduce.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_compare.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/tensor_fill.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/symm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/error_metrics.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/host/rank_2k_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/detail
-- Up-to-date: /usr/local/include/cutlass/util/reference/detail/linear_to_coordinate.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/detail/inner_product.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/gemm_planar_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/gemm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/gett.hpp
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/convolution.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/tensor_compare.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/gemm_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/tensor_foreach.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/tensor_reduce.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/tensor_fill.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/thread
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/thread/gemm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/rank_2k_complex.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/tensor_relu.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/kernel
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/kernel/gemm.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/kernel/tensor_foreach.h
-- Up-to-date: /usr/local/include/cutlass/util/reference/device/kernel/tensor_elementwise.h
-- Up-to-date: /usr/local/include
-- Up-to-date: /usr/local/include/cutlass
-- Up-to-date: /usr/local/include/cutlass/library
-- Up-to-date: /usr/local/include/cutlass/library/arch_mappings.h
-- Up-to-date: /usr/local/include/cutlass/library/library.h
-- Up-to-date: /usr/local/include/cutlass/library/manifest.h
-- Up-to-date: /usr/local/include/cutlass/library/types.h
-- Up-to-date: /usr/local/include/cutlass/library/singleton.h
-- Up-to-date: /usr/local/include/cutlass/library/descriptions.h
-- Up-to-date: /usr/local/include/cutlass/library/handle.h
-- Up-to-date: /usr/local/include/cutlass/library/util.h
-- Up-to-date: /usr/local/include/cutlass/library/operation_table.h
-- Installing: /usr/local/lib64/libcutlass.so
-- Installing: /usr/local/share/info/cutlass/generated_kernels.txt
-- Installing: /usr/local/bin/cutlass_profiler
-- Set non-toolchain portion of runtime path of "/usr/local/bin/cutlass_profiler" to ""
-- Installing: /usr/local/lib64/cmake/NvidiaCutlass/NvidiaCutlassConfig.cmake
-- Installing: /usr/local/lib64/cmake/NvidiaCutlass/NvidiaCutlassConfigVersion.cmake
-- Installing: /usr/local/lib64/cmake/NvidiaCutlass/NvidiaCutlassTargets.cmake
-- Installing: /usr/local/lib64/cmake/NvidiaCutlass/NvidiaCutlassTargets-release.cmake

@BwL1289
Author

BwL1289 commented Jun 5, 2025

@d-k-b bumping. Would love to get this merged. Thank you

@BwL1289
Author

BwL1289 commented Jun 16, 2025

@d-k-b anything else you'd like me to change?

@BwL1289
Author

BwL1289 commented Jul 1, 2025

@d-k-b can we please get this merged? We are maintaining a number of NVIDIA forks.

@BwL1289
Author

BwL1289 commented Jul 18, 2025

@d-k-b I have synced with upstream.

@d-k-b
Collaborator

d-k-b commented Jul 18, 2025

Thanks for the ping, I've been on vacation and have been trying to catch up.

@BwL1289
Author

BwL1289 commented Jul 18, 2025

Hope you enjoyed. Let me know if you need anything else from me to merge.

@BwL1289
Author

BwL1289 commented Aug 4, 2025

@d-k-b bumping this

@BwL1289
Author

BwL1289 commented Aug 12, 2025

@d-k-b what's the status here?

CC @thakkarV

@BwL1289
Author

BwL1289 commented Aug 12, 2025

Also just updated float_subbyte.h for modern Clang and GCC to avoid the -Wdeprecated-literal-operator warning on Clang 16, which will soon become a hard error.
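For readers unfamiliar with the warning mentioned above, here is a minimal sketch of the kind of change involved. It assumes the fix is the usual one of removing the whitespace between `""` and the literal suffix; the `_kib` suffix and `buffer_bytes` helper are hypothetical names chosen for illustration and are not taken from float_subbyte.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical literal suffix, for illustration only; not a CUTLASS name.
//
// Deprecated spelling that newer Clang versions warn about
// (note the space between the quotes and the suffix):
//   constexpr std::uint64_t operator "" _kib(unsigned long long v);
//
// Preferred spelling: no whitespace between "" and the suffix.
constexpr std::uint64_t operator""_kib(unsigned long long v) {
  return static_cast<std::uint64_t>(v) * 1024u;
}

// The literal behaves identically; only the spelling of the declaration changes.
static_assert(4_kib == 4096, "4 KiB is 4096 bytes");

// Small usage example of the literal at a call site.
inline std::uint64_t buffer_bytes() {
  return 16_kib;
}
```

Call sites such as `16_kib` are unaffected, so a change like this should be purely mechanical.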

@BwL1289
Author

BwL1289 commented Aug 12, 2025

...and all other occurrences

@BwL1289 BwL1289 changed the title from "bwl1289/fix/cmake-build-fixes" to "Fix CMake build error and -Wdeprecated-literal-operator warnings" Aug 15, 2025
@d-k-b
Collaborator

d-k-b commented Aug 21, 2025

Thanks for fixing that, I had started looking at the clang issues as well. This should go in very soon.

@BwL1289 BwL1289 requested a review from d-k-b September 24, 2025 20:59
@BwL1289
Author

BwL1289 commented Sep 24, 2025

@d-k-b bumping this, again.

@BwL1289
Author

BwL1289 commented Oct 8, 2025

@d-k-b is there a blocker here?

@BwL1289
Author

BwL1289 commented Oct 27, 2025

@d-k-b see 4acbf65

@BwL1289
Author

BwL1289 commented Nov 17, 2025

@d-k-b bumping this. Not sure what else is needed here.

@BwL1289
Author

BwL1289 commented Dec 9, 2025

@d-k-b this has been open 6 months now. What else needs to be done here?

@d-k-b
Collaborator

d-k-b commented Jan 6, 2026

@BwL1289 -- I missed your comments above. These changes should be in as of a month or so ago. Can you pull the latest changes into your branch(es) and confirm that all of them made it? FYI, for some PRs we push them internally so we can run more tests, and the changes then come in with the releases. We try to note that on the PRs, but we sometimes miss them!

@BwL1289
Author

BwL1289 commented Jan 12, 2026

@d-k-b not everything made it into main.

In CMakeLists.txt:

DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/NvidiaCutlass/ 

# should become 

DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/NvidiaCutlass

In tools/library/CMakeLists.txt:

DESTINATION ${CMAKE_INSTALL_INFODIR}/cutlass/ 

# should become 

DESTINATION ${CMAKE_INSTALL_INFODIR}/cutlass
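As a rough illustration of why the trailing slash matters, the sketch below shows how a destination that already ends in `/` yields a doubled separator when a file name is appended by plain concatenation. This is not CMake's implementation, only an analogy on POSIX-style paths; `naive_join` and `normalized` are hypothetical helpers.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: joins a destination directory and an entry name by
// plain string concatenation, always inserting a '/' between them.
inline std::string naive_join(const std::string& destination,
                              const std::string& entry) {
  return destination + "/" + entry;
}

// Collapses repeated separators via the standard library's lexical
// normalization (no filesystem access is performed).
inline std::string normalized(const std::string& path) {
  return std::filesystem::path(path).lexically_normal().string();
}
```

With these helpers, `naive_join("share/info/cutlass/", "generated_kernels.txt")` contains `//`, while the same call without the trailing slash does not. Dropping the slash at the source, as proposed above, avoids relying on any later normalization step.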
