cuda-kernels

CUDA kernel implementations and optimization notes.

Based on:

- Programming Massively Parallel Processors (PMPP)
- LeetGPU practice
- Nsight Compute profiling