100-daysofcuda

Kernels Written for 100 days of CUDA Challenge

Day 1: Wrote a Naive Matmul kernel

Day 2: Wrote a 1D tiled matmul kernel

Day 3: Oversimplified 2D tiled matmul kernel

Day 4: 2D matmul kernel with CPU verification and great memory management

Day 5: 3D matmul tiling!

Day 6: Softmax flashattention online

Day 7: Layer Normalization

Day 8: Better layer normalization with lots of nice modifications

Day 9: Implimented a 2D tiled convlutional kernel

Day 10: Implimented Vectorized matmul !! Got first Batch

Day 11: Implimented merge sort algorithm using CUDA

Day 12: Wrote flash attention with backpropagation

Day 13: Learned about GNNs and wrote Kernel to impliment a layer of GNNs

Day 14: Wrote N body simulation kernel inspired by 1y33

Day 15: Random weight selection kernel for neural networks

Day 16: Full Fledged Transformer model

Day 17: Optimized the transformer by adding tiled matmul and shared memory in softmax !

Day 18: More Optimizations like loop unrolling and tiled transposed matrix B and Warp level reductions in Softmax

Day 19: Learned and integrated FP16, Fusion kernels, Online softmax, registers etc in the transformer kernel

Day 20: Wrote a full neural network with forward pass, SGD, backpropagation

Day 21: Tried writing full CNN For mnist but failed lots lots and lots of time

Day 22: Shifted focus from Implimenting everything from scratch to cuDNN. Reimplimented the CNN Mnist project in cuDNN and managed to compile it with some logical errors still there

Day 23: Again refocused on Implimenting everything from scratch as cuDNN version was pretty difficult to maintain. Did huge progress by integrating everything in one file and almost reaching excellence in doing it

Day 24: Reimagined the whole code and after 10 hours of tinkering around with nested loops wrote the perfect CUDA CNN kernel for MNIST

Day 25: Implimented Native sparse attention Paper by deepseek for an nvidia H100 GPU

Day 26: Tokenizer in CUDA

Day 27: Learned how to do an operation for 1000s of GPUs using MPI and NCCL

Day 28: Wrote kernel for Full LLM (don't know how if it works perfect)

Day 29: LLM Kernel code review

Day 30: Rethought the whole LLM kernel and fit that into a single file

Day 31: Debugged the LLM kernel a lot

Day 32: Triton Day 1 - Vectoradd

Day 33: CUDA Q-Learning Implementation for a Grid World Environment

Day 34: CUDA-based agent-resource simulation

Day 35: Writing CUDA Kernels for AI generated Algorithms. Impliments backpropagation-free neural network architecture

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
day1		day1
day10		day10
day11		day11
day12		day12
day13		day13
day14		day14
day15		day15
day16		day16
day17		day17
day18		day18
day19		day19
day2		day2
day20		day20
day21		day21
day22		day22
day23		day23
day24		day24
day25		day25
day26		day26
day27		day27
day28		day28
day29		day29
day3		day3
day30		day30
day31		day31
day32		day32
day33		day33
day34		day34
day35		day35
day36		day36
day37		day37
day38		day38
day4		day4
day5		day5
day6		day6
day7		day7
day8		day8
day9		day9
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

100-daysofcuda

About

Uh oh!

Releases

Packages

Languages

License

prateekshukla1108/100-daysofcuda

Folders and files

Latest commit

History

Repository files navigation

100-daysofcuda

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages