It seems that head_dim 64 is not precompiled; compiling the prefill and decode kernels takes around 60s in total. Would it be possible to precompile head_dim 64 as well in the future? The compile time is unfriendly to server scaling.
28s for decode kernel
31s for prefill kernel