MUTLASS 0.2.0

MUTLASS 0.2.0 - February 2025

MUTLASS (MUSA Templates for Linear Algebra Subroutines) is a header-only library for implementing high-performance matrix-matrix multiplication (GEMM) within MUSA (Meta-computing Unified System Architecture). It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement muDNN.

See the Quick Start Guide to get started quickly.

Note: MUTLASS uses the CuTe library, introduced in CUTLASS 3.x, as the backend, and thus is incompatible with most implementations of CUTLASS 2.x.

What's New in MUTLASS 0.2.0

MUTLASS 0.2.0 is an update to MUTLASS adding:

- MP31 Features:
  - Squad-level MMA (SQMMA) and Warp-level MMA primitives with rich data types (TF32/FP16/BF16/FP8/S8 etc.).
  - Tensor Memory Engine (TME) and RobustBufferAccess primitives.
- New GEMM mainloop and epilogue targeting MP31 architecture that achieve high performance with TME and SQMMA.
- New tile scheduler to support CTA swizzle for MP31 kernels.
- New experimental directory housing the implementations that are not yet stable and may have significant changes in the future.
  - Prototype of Flash Attention Forward targeting MP31 architecture with TME, RobustBufferAccess and SQMMA.
- New FP8 GEMM with groupwise scaling.
- Upgrade the backend from CUTLASS/CuTe 3.5.0 to CUTLASS/CuTe 3.6.0.

Minimum requirements:

Architecture: Quyuan
Compiler: MCC 4.0.0
MUSA Toolkit version: 4.0.0

See the CHANGELOG for a detailed listing of releases and updates.

Performance

The figure above shows the relative performance of the tensorop GEMM compared with muDNN. The performance of the TF32 data type will be further optimized in the next release.

Documentation

Quick Start Guide - build and run MUTLASS

Building MUTLASS

MUTLASS is a header-only template library and does not need to be built to be used by other projects. Client applications should target MUTLASS's include/ directory in their include paths.

MUTLASS unit tests, examples, and utilities can be built with CMake. The minimum version of CMake is given in the Quick Start Guide.

Create a build directory within the MUTLASS project, then run CMake. By default MUTLASS will build kernels for MUSA architecture versions 2.2 and 3.1.
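The configure step described above can be sketched as follows. This is a minimal sketch, not the project's official build recipe: it assumes the commands are run from the root of a MUTLASS checkout with CMake on the `PATH`, and `SRC_DIR` and `BUILD_DIR` are hypothetical helper variables introduced here for illustration. No MUTLASS-specific CMake options are passed, so the default architecture selection applies; see the Quick Start Guide for the supported options.

```shell
# Assumption: the current directory is the root of a MUTLASS checkout.
# SRC_DIR and BUILD_DIR are hypothetical helper variables, not part of MUTLASS.
SRC_DIR="${SRC_DIR:-$(pwd)}"
BUILD_DIR="$SRC_DIR/build"

# Create the build directory within the project.
mkdir -p "$BUILD_DIR"

# Run CMake from the build directory, but only when a CMakeLists.txt and
# the cmake executable are actually present.
if [ -f "$SRC_DIR/CMakeLists.txt" ] && command -v cmake >/dev/null 2>&1; then
  (cd "$BUILD_DIR" && cmake "$SRC_DIR")
else
  echo "Skipping configure: no CMakeLists.txt in $SRC_DIR or cmake not found"
fi
```

Keeping the build tree in its own directory leaves the header-only sources untouched, so the `build` directory can be deleted and regenerated at any time.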
