
Commit 49e641a

Author: yanjun.qiu
misc: update submodule tools
1 parent e2038e5 commit 49e641a

3 files changed: +9 -8 lines changed

Diff for: .dev/update_submodules.sh

+1-1
@@ -4,5 +4,5 @@ git submodule init
 git submodule update --remote # update all submodule
 # git submodule update --remote ffpa-attn-mma # only update ffpa-attn-mma
 git add .
-git commit -m "Automated submodule update"
+git commit -m "misc: Automated submodule update"
 set +x

Diff for: .github/.gitignore

+2-1
@@ -22,4 +22,5 @@ bin
 *.log
 *.txt
 *.tex
-tmp*
+tmp*
+pdfs

Diff for: kernels/hgemm/README.md

+6-6
@@ -3,7 +3,7 @@
 
 ![toy-hgemm-library](https://github.com/user-attachments/assets/962bda14-b494-4423-b8eb-775da9f5503d)
 
-[📖Toy-HGEMM Library⚡️⚡️](./kernels/hgemm) is a library that write many HGEMM kernels from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API, thus, can achieve `98%~100%` performance of **cuBLAS**. The codes here are source from 📖[CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) ![](https://img.shields.io/github/stars/DefTruth/CUDA-Learn-Notes.svg?style=social) and exported as a standalone library, please checkout [CUDA-Learn-Notes](https://github.com/DefTruth/CUDA-Learn-Notes) for latest updates. Welcome to 🌟👆🏻star this repo to support me, many thanks ~ 🎉🎉
+[📖Toy-HGEMM Library⚡️⚡️](./kernels/hgemm) is a library that write many HGEMM kernels from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API, thus, can achieve `98%~100%` performance of **cuBLAS**. The codes here are source from 📖[CUDA-Learn-Notes](https://github.com/xlite-dev/CUDA-Learn-Notes) ![](https://img.shields.io/github/stars/xlite-dev/CUDA-Learn-Notes.svg?style=social) and exported as a standalone library, please checkout [CUDA-Learn-Notes](https://github.com/xlite-dev/CUDA-Learn-Notes) for latest updates. Welcome to 🌟👆🏻star this repo to support me, many thanks ~ 🎉🎉
 
 <div id="hgemm-sgemm"></div>
 
@@ -27,11 +27,11 @@ Currently, on NVIDIA L20, RTX 4090 and RTX 3080 Laptop, compared with cuBLAS's d
 ## ©️Citations🎉🎉
 
 ```BibTeX
-@misc{hgemm-mma@2024,
-  title={hgemm-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
-  url={https://github.com/DefTruth/hgemm-mma},
-  note={Open-source software available at https://github.com/DefTruth/hgemm-mma},
-  author={DefTruth etc},
+@misc{hgemm-tensorcores-mma@2024,
+  title={hgemm-tensorcores-mma: Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API.},
+  url={https://github.com/xlite-dev/hgemm-tensorcores-mma},
+  note={Open-source software available at https://github.com/xlite-dev/hgemm-tensorcores-mma},
+  author={xlite-dev etc},
  year={2024}
 }
 ```
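Note on the README change above: the paragraph it touches describes HGEMM kernels built from scratch on Tensor Cores via WMMA, MMA PTX and the CuTe API, reaching `98%~100%` of cuBLAS. As a rough, hypothetical illustration of the simplest of those approaches — a naive WMMA kernel with no shared-memory tiling, not taken from kernels/hgemm, with kernel name and tile sizes chosen here for the example — the core pattern looks something like this:

```cuda
// Minimal, self-contained WMMA HGEMM sketch (illustrative only; NOT code from
// kernels/hgemm). Computes C = A * B with half inputs and float accumulation.
// Assumes row-major A (MxK), B (KxN), C (MxN), with M, N, K multiples of 16.
// Requires sm_70 or newer.
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

__global__ void hgemm_wmma_naive(const half* A, const half* B, float* C,
                                 int M, int N, int K) {
  // One warp owns one 16x16 tile of C.
  int warp_n = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize; // tile column
  int warp_m = blockIdx.y * blockDim.y + threadIdx.y;              // tile row
  if (warp_m * 16 >= M || warp_n * 16 >= N) return;

  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
  wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
  wmma::fill_fragment(acc, 0.0f);

  // Walk the K dimension in 16-wide steps, one Tensor Core MMA per step.
  for (int k = 0; k < K; k += 16) {
    wmma::load_matrix_sync(a_frag, A + warp_m * 16 * K + k, K); // leading dim K
    wmma::load_matrix_sync(b_frag, B + k * N + warp_n * 16, N); // leading dim N
    wmma::mma_sync(acc, a_frag, b_frag, acc);
  }

  // Store the finished 16x16 output tile.
  wmma::store_matrix_sync(C + warp_m * 16 * N + warp_n * 16, acc, N,
                          wmma::mem_row_major);
}
```

Under the same assumptions, a launch of `dim3 block(128, 4); dim3 grid(N / 64, M / 64);` would give each block a 4x4 grid of warps covering a 64x64 patch of C. The actual kernels in kernels/hgemm layer MMA PTX and CuTe implementations and further tuning on top of this basic pattern to reach the cuBLAS-level numbers quoted in the README.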
