Releases: NVIDIA/cutlass
CUTLASS 1.3.3
Final tagged release of the CUTLASS 1.x branch.
CUTLASS 1.3.2
Performance enhancement for the Volta Tensor Cores TN layout
- Fixed a performance defect involving indirect access to a pointer array in the Volta Tensor Cores TN layout.
CUTLASS 1.3.0
CUTLASS 1.3 adds efficient GEMM kernels targeting Volta Tensor Cores via the mma.sync instruction added in CUDA 10.1.
CUTLASS 1.2
CUTLASS 1.2.0 (2018-10-26)
- Parallelized reductions across threadblocks ("Split-K")
- Improved IGEMM performance
- Batched strided WMMA GEMMs
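The Split-K item above can be pictured with a small CPU reference sketch. It is not CUTLASS code: the function name `split_k_gemm`, the row-major layout, and the `slices` parameter are all assumptions made for illustration. The K dimension is cut into slices, each slice produces a partial M x N result (the work a separate threadblock would do on a GPU), and a second pass reduces the partials.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference sketch of split-K for C = A * B, where A is M x K and
// B is K x N, both row-major. Names and layout are illustrative only.
std::vector<float> split_k_gemm(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t slices) {
    assert(slices > 0);
    assert(A.size() == M * K && B.size() == K * N);

    // One M x N partial result per K slice (the reduction workspace).
    std::vector<float> partial(slices * M * N, 0.0f);
    const std::size_t chunk = (K + slices - 1) / slices;  // ceil(K / slices)

    // Phase 1: every slice accumulates only its own range of k.
    // On a GPU, each slice would be handled by a different threadblock.
    for (std::size_t s = 0; s < slices; ++s) {
        const std::size_t k_begin = std::min(s * chunk, K);
        const std::size_t k_end = std::min(k_begin + chunk, K);
        float* out = partial.data() + s * M * N;
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n)
                for (std::size_t k = k_begin; k < k_end; ++k)
                    out[m * N + n] += A[m * K + k] * B[k * N + n];
    }

    // Phase 2: reduce the partial results across slices into C.
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < slices; ++s)
        for (std::size_t i = 0; i < M * N; ++i)
            C[i] += partial[s * M * N + i];
    return C;
}
```

Splitting K tends to help when M and N are small relative to K, since a single pass over the output tiles would otherwise leave most of the GPU idle. The cost is the extra workspace and the reduction pass.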
CUTLASS 1.1
CUTLASS 1.1.0 release adds:
- Documentation
- Examples
- Turing Features
- Batched Strided GEMM
- Threadblock rasterization strategies
- Extended CUTLASS Core components
- Enhanced CUTLASS utilities
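As a rough illustration of the Batched Strided GEMM item above, the following CPU sketch computes one independent GEMM per batch index, locating each operand by a fixed element stride from a base pointer. The function name `batched_strided_gemm`, its parameter order, and the densely packed row-major layout are assumptions for this example and do not reflect the CUTLASS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference sketch of a batched strided GEMM: for each batch index b,
// C_b = A_b * B_b, where X_b starts at X + b * stride_x (in elements).
// Matrices are row-major and densely packed; names are illustrative only.
void batched_strided_gemm(std::size_t M, std::size_t N, std::size_t K,
                          const float* A, std::size_t stride_a,
                          const float* B, std::size_t stride_b,
                          float* C, std::size_t stride_c,
                          std::size_t batch_count) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (std::size_t b = 0; b < batch_count; ++b) {
        // Operand base pointers for this batch entry.
        const float* a = A + b * stride_a;
        const float* bm = B + b * stride_b;
        float* c = C + b * stride_c;
        for (std::size_t m = 0; m < M; ++m) {
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k)
                    acc += a[m * K + k] * bm[k * N + n];
                c[m * N + n] = acc;
            }
        }
    }
}
```

Because every operand is addressed as base plus a multiple of its stride, no array of per-batch pointers is needed. In this sketch a stride of 0 reuses the same matrix for every batch entry, for example one shared B applied to many A matrices.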
CUTLASS 1.0.1
- Intra-threadblock reduction added for small threadblock tile sizes:
  - sgemm_64x128x16, sgemm_128x128x16, sgemm_128x64x16, sgemm_128x32x16, sgemm_64x64x16, sgemm_64x32x16
  - igemm_32x32x128
- GEMM K residue handled during the prologue, prior to the mainloop
- Replaced the bundled Google Test copy with a Git submodule. Use git submodule init (followed by git submodule update) to fetch it.
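The K-residue item in the 1.0.1 notes can be sketched on the CPU with a single dot product along K. This reflects one reading of that note, not the actual kernel code: the leftover K % tile_k elements are consumed first, so every mainloop iteration sees a full tile and needs no bounds check. The function name `dot_residue_first` and the `tile_k` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of "residue first" iteration over the K dimension.
// Prologue: handle the K % tile_k leftover elements.
// Mainloop: iterate over full tiles only, with no per-element guard.
float dot_residue_first(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::size_t tile_k) {
    assert(tile_k > 0);
    assert(a.size() == b.size());
    const std::size_t K = a.size();
    const std::size_t residue = K % tile_k;

    float acc = 0.0f;

    // Prologue: the partial tile (possibly empty).
    for (std::size_t k = 0; k < residue; ++k)
        acc += a[k] * b[k];

    // Mainloop: the remaining length K - residue is a multiple of tile_k,
    // so the inner loop always runs exactly tile_k iterations.
    for (std::size_t k0 = residue; k0 < K; k0 += tile_k)
        for (std::size_t kk = 0; kk < tile_k; ++kk)
            acc += a[k0 + kk] * b[k0 + kk];

    return acc;
}
```

Handling the residue up front keeps the guarded, irregular work out of the hot loop, which is the part of a GEMM kernel that benefits most from being branch-free.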
CUTLASS 1.0.0
CUTLASS v1.0.0
CUTLASS 0.1.1
Final patch of CUTLASS v0.1.
CUTLASS 0.1.0
Initial release of CUTLASS.