MUTLASS 0.1.1 - September 2024
MUTLASS (MUSA Templates for Linear Algebra Subroutines) is a header-only library for implementing high-performance matrix-matrix multiplication (GEMM) within MUSA (Meta-computing Unified System Architecture). It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement muDNN.
See the Quick Start Guide to get started quickly.
Note: MUTLASS uses the CuTe library, introduced in CUTLASS 3.x, as the backend, and thus is incompatible with most implementations of CUTLASS 2.x.
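To make the idea of hierarchical decomposition concrete, here is a minimal CPU-side sketch of a blocked GEMM in plain C++. It is not MUTLASS code and does not use the MuTe or MUTLASS APIs; the names `tiled_gemm` and `kTile` are hypothetical, and the tile size is chosen arbitrarily. It only illustrates how a large product can be split into tiles that are each computed by a small inner kernel, which is the general pattern that GPU GEMM libraries refine across thread blocks, warps, and threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative only: tile edge length (hypothetical value, not a MUTLASS setting).
constexpr std::size_t kTile = 4;

// Computes C = A * B for row-major matrices.
// A is M x K, B is K x N, C is M x N.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C,
                std::size_t M, std::size_t N, std::size_t K) {
    std::fill(C.begin(), C.end(), 0.0f);

    // Outer loops walk over tiles of the output and of the reduction dimension.
    for (std::size_t m0 = 0; m0 < M; m0 += kTile) {
        for (std::size_t n0 = 0; n0 < N; n0 += kTile) {
            for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
                // Clamp tile bounds so partial tiles at the edges are handled.
                const std::size_t m1 = std::min(m0 + kTile, M);
                const std::size_t n1 = std::min(n0 + kTile, N);
                const std::size_t k1 = std::min(k0 + kTile, K);

                // Inner loops accumulate one tile's contribution into C.
                for (std::size_t m = m0; m < m1; ++m) {
                    for (std::size_t n = n0; n < n1; ++n) {
                        float acc = C[m * N + n];
                        for (std::size_t k = k0; k < k1; ++k) {
                            acc += A[m * K + k] * B[k * N + n];
                        }
                        C[m * N + n] = acc;
                    }
                }
            }
        }
    }
}
```

On a GPU the outer tile loops would typically be mapped onto parallel execution units and the inner loops onto hardware FMA/MMA instructions; that mapping is what the library's templates are for and is not shown here.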
MUTLASS 0.1.1 is an open-release version based on CUTLASS 3.5 providing:
- MuTe, a core library and backend adapted from CUTLASS CuTe
- Quyuan Features
  - MMA primitives: TensorFloat32, BFloat16, Float16, INT8
  - FMA/MMA GEMM Kernels targeting the Quyuan architecture
    - Note: this is a beta release. Further updates to MUTLASS will include performance improvements, feature enablement, and possible breaking changes to the API.
- MUTLASS Profiler, Library, and Utilities
- Two examples that demonstrate the usage of the low-level API and the collective builders to build GEMM kernels
Minimum requirements:
- Architecture: Quyuan
- Compiler: MCC 3.1.0
- MUSA Toolkit version: 3.1.0
Documentation:
- Quick Start Guide - build and run MUTLASS
MUTLASS is a header-only template library and does not need to be built to be used by other projects. Client applications should target MUTLASS's include/ directory in their include paths.
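As a rough sketch of what this can look like in a CMake-based client project: the target name `my_app`, the source file `main.cpp`, and the variable `MUTLASS_DIR` below are hypothetical placeholders, and any compiler setup needed for the MUSA toolchain is assumed to be handled elsewhere in the project.

```cmake
# Hypothetical client target; replace the name and sources with your own.
add_executable(my_app main.cpp)

# MUTLASS_DIR is an assumed variable pointing at a local MUTLASS checkout.
# Because the library is header-only, only its include/ directory is added
# to the include path; no MUTLASS library is linked.
target_include_directories(my_app PRIVATE ${MUTLASS_DIR}/include)
```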
MUTLASS unit tests, examples, and utilities can be built with CMake. The minimum version of CMake is given in the Quick Start Guide.
Create a build directory within the MUTLASS project, then run CMake. By default MUTLASS will build kernels for MUSA architecture version 2.2.