
jiaau/kernels


Kernels

Focus Areas

  • reduce

    • CUDA Warp-Level Primitives
    • Parallel Reduction
  • transpose

    • Memory Coalescing
    • Shared Memory
    • Bank Conflict
    • Swizzling
    • CuTe
  • sgemm

    • Tile Size Tuning
    • Shared Memory
    • Bank Conflict
    • Double Buffer
    • Warp Divergence
    • Vectorized Memory Access
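The warp-level primitives listed under reduce are typically used in a butterfly pattern. Each lane adds the value held by the lane whose index differs from its own in one bit, and the offset halves from 16 down to 1. After log2(32) = 5 steps, every lane holds the full sum. Because this is pure index arithmetic, it can be modeled on the host in plain C++. The sketch below illustrates the pattern and is not code from this repository; `warp_reduce_sum_model` and `kWarpSize` are hypothetical names. On the device, one step would be `v += __shfl_xor_sync(0xffffffff, v, offset);`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;

// Host-side model of a butterfly (XOR-shuffle) warp reduction.
// lanes[i] stands for the register value held by lane i of one warp.
// On a GPU, one iteration of the outer loop corresponds to every lane running
//   v += __shfl_xor_sync(0xffffffff, v, offset);
std::vector<int> warp_reduce_sum_model(std::vector<int> lanes) {
    assert(lanes.size() == kWarpSize);
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // A shuffle reads the partner's value from before this step,
        // so take a snapshot before updating any lane.
        const std::vector<int> prev = lanes;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            lanes[lane] = prev[lane] + prev[lane ^ offset];
        }
    }
    return lanes;  // every lane now holds the sum of all 32 inputs
}
```

The `prev` snapshot matters. All lanes exchange the values they held before the step, so updating in place would mix old and new partial sums.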
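Several of the transpose topics come down to one calculation. Shared memory has 32 banks of 4-byte words, so the word at index w lives in bank w % 32. In a row-major 32 × 32 float tile, a warp reading one column touches indices `row * 32 + col`. All of these map to the same bank, which gives a 32-way conflict. Two fixes spread the 32 reads over 32 different banks: padding each row to 33 words, or XOR-swizzling the column as `col ^ row`. The host-side C++ sketch below counts the conflict degree for each layout. The function names and this particular swizzle are illustrative assumptions, not the repository's implementation; CuTe offers more general swizzle layouts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr int kBanks = 32;  // shared-memory banks, each 4 bytes wide
constexpr int kTile = 32;   // 32 x 32 float tile

// Conflict degree of one warp-wide access: the largest number of distinct
// 4-byte words mapped to the same bank. Lanes reading the same word are
// served by a broadcast, so duplicates are counted once.
int conflict_degree(const std::vector<int>& word_index) {
    std::array<std::set<int>, kBanks> per_bank;
    for (int w : word_index) per_bank[w % kBanks].insert(w);
    std::size_t worst = 0;
    for (const auto& words : per_bank) worst = std::max(worst, words.size());
    return static_cast<int>(worst);
}

// Word index of tile element (row, col) under three layouts.
int plain_index(int row, int col)    { return row * kTile + col; }
int padded_index(int row, int col)   { return row * (kTile + 1) + col; }   // +1 word per row
int swizzled_index(int row, int col) { return row * kTile + (col ^ row); } // XOR swizzle

// Lane i reads element (i, col), so one warp reads a whole column,
// as a transpose does when it reads its shared tile "sideways".
template <typename Layout>
int column_read_conflict(Layout layout, int col) {
    assert(0 <= col && col < kTile);
    std::vector<int> words;
    for (int lane = 0; lane < kTile; ++lane) words.push_back(layout(lane, col));
    return conflict_degree(words);
}
```

Padding wastes one word per row and misaligns rows for 16-byte loads. Swizzling keeps the tile dense. Practical swizzles often permute 16-byte chunks rather than single words so that vectorized accesses stay contiguous.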
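For the sgemm topics, a few formulas help reason about tile size tuning. Suppose a thread block computes a BM × BN tile of C and steps through K in slices of BK. Each slice loads (BM + BN) · BK floats from global memory and performs 2 · BM · BN · BK flops. The block's arithmetic intensity is therefore 2 · BM · BN / (4 · (BM + BN)) flop/byte, independent of BK. Double buffering doubles the shared-memory footprint of the staged tiles. The sketch below evaluates these quantities. `TileConfig`, the per-thread micro-tile TM × TN and the helper names are hypothetical. The intensity counts only the global loads issued by one block and ignores L2 reuse across blocks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical SGEMM tiling parameters (names are illustrative).
struct TileConfig {
    int BM, BN;  // block tile of C computed by one thread block
    int BK;      // K-slice staged in shared memory per iteration
    int TM, TN;  // micro-tile of C computed by one thread
};

// Threads per block when each thread owns a TM x TN micro-tile.
int threads_per_block(const TileConfig& c) {
    assert(c.BM % c.TM == 0 && c.BN % c.TN == 0);
    return (c.BM / c.TM) * (c.BN / c.TN);
}

// Shared memory for the A (BM x BK) and B (BK x BN) tiles, in bytes.
// Double buffering keeps two copies so the next slice can be loaded
// while the current one is being consumed.
std::size_t smem_bytes(const TileConfig& c, bool double_buffer) {
    std::size_t one = static_cast<std::size_t>(c.BM + c.BN) * c.BK * sizeof(float);
    return double_buffer ? 2 * one : one;
}

// Flops per byte of global memory loaded by one block per K-slice:
// 2*BM*BN*BK flops over (BM + BN)*BK floats.
double arithmetic_intensity(const TileConfig& c) {
    return 2.0 * c.BM * c.BN / ((c.BM + c.BN) * sizeof(float));
}

// Rough check for float4 (16-byte) accesses: with row-major A (BM x BK)
// and B (BK x BN) tiles, every row must hold a multiple of 4 floats.
// Assumes the global matrices' leading dimensions are also multiples of 4.
bool float4_friendly(const TileConfig& c) {
    return c.BK % 4 == 0 && c.BN % 4 == 0 && c.TN % 4 == 0;
}
```

For example, BM = BN = 128, BK = 8 and TM = TN = 8 gives 256 threads, 8 KiB of shared memory (16 KiB double-buffered) and 32 flop/byte. Halving BM and BN to 64 halves the intensity to 16 flop/byte.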

Build and Run

Build the project

make build
make install <kernel_name>

Run tests

make run <kernel_name>

Profile with NVIDIA Nsight Compute (ncu)

make ncu <kernel_name>

Clean build artifacts

make clean

Command-Line Options

The SGEMM test supports the following options:

  • --bench: enable benchmark mode
  • --times N: number of benchmark iterations (default: 3)
  • --help: show help information

For example:

make run <kernel_name> -- --bench --times 10
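These options imply a small argument parser in the SGEMM test driver. The following is a hypothetical C++ sketch of such a parser, not the repository's code. `Options` and `parse_options` are illustrative names, and rejecting a non-positive N is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical option set mirroring the documented flags;
// not the repository's actual parser.
struct Options {
    bool bench = false;  // --bench: enable benchmark mode
    int times = 3;       // --times N: benchmark iterations (default 3)
    bool help = false;   // --help: show help information
};

// Parses argv-style arguments; args[0] is the program name.
// Returns false on an unknown flag or a missing/non-positive N.
bool parse_options(const std::vector<std::string>& args, Options& out) {
    assert(!args.empty());
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "--bench") {
            out.bench = true;
        } else if (a == "--help") {
            out.help = true;
        } else if (a == "--times") {
            if (i + 1 >= args.size()) return false;      // N is missing
            const int n = std::atoi(args[++i].c_str());  // atoi yields 0 on non-numeric input
            if (n <= 0) return false;
            out.times = n;
        } else {
            return false;  // unknown flag
        }
    }
    return true;
}
```

Under this sketch, the arguments from the `make run` example (`--bench --times 10`) would set `bench` to true and `times` to 10. Omitting `--times` leaves the default of 3.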

Acknowledgments

About

This repository showcases common optimization techniques for CUDA kernels.
