
A microbenchmark for GEMM kernels on NVIDIA GPUs with Ampere architecture


jssonx/gemm-kernel-microbenchmark


GEMM Kernel Microbenchmark

This repo provides a microbenchmark for GEMM kernels on NVIDIA GPUs with the Ampere architecture (sm_80). It includes both a CUDA kernel benchmark and a Python extension benchmark.

Requirements

  • NVIDIA GPU with the Ampere architecture (sm_80)
  • CUDA 12.2

Getting Started

CUDA Kernel Benchmark

  1. Build the project:
     $ make
  2. Run a benchmark with specific parameters:
     $ ./csrc/bench/main --groups=16 --m=64 --n=64 --k=768 --iterations=3

     Where:

       • --groups: Number of GEMM groups
       • --m, --n, --k: GEMM problem dimensions (an M×K matrix times a K×N matrix yields an M×N result)
       • --iterations: Number of benchmark iterations
  3. For the full list of available options:
     $ ./csrc/bench/main --help
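To turn a measured runtime into throughput, a common convention counts 2·M·N·K floating-point operations per GEMM (one multiply and one add per inner-product term). The sketch below assumes that --groups means that many independent M×N×K GEMMs, which this README does not state explicitly. The helper names gemm_flops and tflops are hypothetical and are not part of this repo.

```python
def gemm_flops(m: int, n: int, k: int, groups: int = 1) -> int:
    """FLOPs for `groups` independent (m x k) @ (k x n) GEMMs, counting 2*m*n*k each.

    Assumption: --groups means `groups` separate GEMMs of the same shape.
    """
    return 2 * m * n * k * groups


def tflops(flops: int, seconds: float) -> float:
    """Convert an operation count and a runtime in seconds into TFLOP/s."""
    if seconds <= 0:
        raise ValueError("runtime must be positive")
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Same problem size as the example command above (grouped semantics assumed).
    work = gemm_flops(m=64, n=64, k=768, groups=16)
    print(work)  # 100663296
    # A hypothetical 10-microsecond runtime, for illustration only (not a measured result).
    print(f"{tflops(work, 10e-6):.2f} TFLOP/s")  # 10.07 TFLOP/s
```

Small problems like this one typically reach far less than a GPU's peak throughput, so the FLOP count is most useful for comparing kernels on the same shape.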

Python Extension Benchmark

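  1. Export the CUDA kernel as a Python extension, then build and install it (TORCH_CUDA_ARCH_LIST="8.0" compiles for sm_80 only):
     $ python ./python/testbed/lib.py
     $ cd out && TORCH_CUDA_ARCH_LIST="8.0" python setup.py install --user
  2. Return to the repository root and run the benchmark, saving its output to perf.txt:
     $ cd ..
     $ python ./python/testbed/multi_gemm.py > perf.txt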
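GPU kernel launches are asynchronous, so a Python timer must only be read after the device has finished its work. The following is a minimal, hypothetical harness, not the repo's multi_gemm.py. It discards warm-up runs, calls a caller-supplied sync function before each clock read (for PyTorch this would be torch.cuda.synchronize), and averages over the timed iterations. The function name bench and its parameters are illustrative assumptions.

```python
import time
from typing import Callable, Optional


def bench(fn: Callable[[], object], iterations: int = 3, warmup: int = 1,
          sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean wall-clock time of `fn` in seconds over `iterations` runs.

    `warmup` untimed calls run first (e.g. to trigger lazy initialization).
    `sync`, if given, is called before each clock read so that queued
    asynchronous GPU work is included in the measurement.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain warm-up work before starting the clock
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to finish before stopping the clock
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    # CPU-only usage; with PyTorch you might pass sync=torch.cuda.synchronize.
    mean_s = bench(lambda: sum(range(10_000)), iterations=3)
    print(f"mean: {mean_s * 1e6:.1f} us")
```

Timing the whole loop and dividing, rather than timing each call, keeps per-call synchronization from inflating results for very short kernels.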
