I noticed that the Triton implementation of HSTU attention supports a causal option, but when I tested its correctness I found numerical errors when is_causal=False. I also noticed that performance with is_causal=False is better than with is_causal=True, which does not make sense to me. So I suspect there may be a bug in the Triton implementation when is_causal=False.
Hi, thanks for looking into this and identifying this issue! These kernels are primarily tested/benchmarked/used with is_causal=True, and the other path is not tested as thoroughly. I would recommend sticking with is_causal=True (given equivalence) unless you have special scenarios in mind.
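For anyone who wants to check the two paths independently of the Triton kernel, a small reference can serve as ground truth. The sketch below is an assumption-laden illustration, not the repository's code: it assumes the pointwise form SiLU(q·k)/n commonly described for HSTU attention, and it omits relative attention bias, any scaling factor, and jagged batching. The function names (reference_attention, max_abs_diff) are hypothetical. The kernel call itself is deliberately left out; its output for each is_causal setting would be compared against the matching reference output with a tolerance suited to the dtype.

```python
import math
import random


def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def reference_attention(q, k, v, is_causal):
    """Naive O(n^2) reference. q, k, v are lists of n vectors (lists of floats).

    Assumed form (not taken from the repository): out[i] = sum_j SiLU(q[i].k[j]) / n * v[j],
    where the sum runs over j <= i when is_causal is True, and over all j otherwise.
    """
    n = len(q)
    d_v = len(v[0])
    out = []
    for i in range(n):
        row = [0.0] * d_v
        for j in range(n):
            if is_causal and j > i:
                continue  # causal mask: query i ignores later keys
            score = silu(sum(a * b for a, b in zip(q[i], k[j]))) / n
            for c in range(d_v):
                row[c] += score * v[j][c]
        out.append(row)
    return out


def max_abs_diff(a, b):
    # Largest elementwise absolute difference between two lists of vectors.
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
n, d = 6, 4
q = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
k = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
v = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

out_causal = reference_attention(q, k, v, is_causal=True)
out_full = reference_attention(q, k, v, is_causal=False)

# The last query sees every key in both modes, so its row must match;
# earlier rows generally differ because the causal mask drops later keys.
print("last-row diff:", max_abs_diff([out_causal[-1]], [out_full[-1]]))
print("overall diff: ", max_abs_diff(out_causal, out_full))
```

Two checks fall out of this. Comparing each kernel output against the reference with the same is_causal value isolates which path carries the numerical error. And because a causal kernel can in principle skip key blocks above the diagonal, one would usually expect is_causal=True to be no slower than is_causal=False, so the reverse timing is worth measuring again with warm-up runs before treating it as a symptom.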