What is your question?
I am currently using AOT-compiled CuTeDSL kernels through TVM-FFI.
Is there a way to ensure that the compiled CUfunctions are loaded before the first kernel invocation?
More specifically, with precompiled CUBINs we can load every required kernel up front by calling cuModuleGetFunction for each one once the module is loaded, so nothing is resolved lazily on the first launch.
Is there an equivalent mechanism for kernels compiled via CuTeDSL and accessed through TVM-FFI?
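
For reference, here is a minimal sketch of the eager-loading pattern I have in mind. It is not CuTeDSL or TVM-FFI code: `preload_functions`, `fake_resolve`, and the kernel names are hypothetical, and the resolver is a stand-in so the sketch runs without a GPU. In the CUBIN case the resolver would wrap cuModuleGetFunction.

```python
from typing import Callable, Dict, Iterable, TypeVar

Handle = TypeVar("Handle")


def preload_functions(
    resolve: Callable[[str], Handle],
    kernel_names: Iterable[str],
) -> Dict[str, Handle]:
    """Eagerly resolve every kernel name to a handle, failing fast on errors."""
    handles: Dict[str, Handle] = {}
    for name in kernel_names:
        try:
            handles[name] = resolve(name)
        except Exception as exc:
            # Surface the problem at startup instead of at the first launch.
            raise RuntimeError(f"failed to preload kernel {name!r}") from exc
    return handles


# Hypothetical stand-in for a loaded CUBIN module. A real resolver would
# call cuModuleGetFunction(module, name) through the CUDA driver API.
_FAKE_MODULE = {"gemm_kernel": 0x1000, "softmax_kernel": 0x2000}


def fake_resolve(name: str) -> int:
    return _FAKE_MODULE[name]  # raises KeyError for unknown kernel names


# Resolve everything once at initialization; later launches reuse the handles.
handles = preload_functions(fake_resolve, ["gemm_kernel", "softmax_kernel"])
```

What I am looking for is whatever plays the role of `resolve` for kernels exposed through TVM-FFI, so that the same up-front step can run at initialization.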