Hi, my question is about the blog article kv-cache.md.
I noticed that in the section where the authors visualize the cached values, the matrix visibly grows by one in both rows and columns at each step. However, the Q, K, and V matrices never grow to the right and never introduce quadratic time complexity, because their projection matrices W_q, W_k, W_v have shape (emb_d, head_d), which is independent of the sequence length.
I think the authors meant to visualize the attention score matrix, Q @ K.T, which does indeed grow as O(seq_len²).
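
To make the shapes concrete, here is a minimal sketch in plain Python. The sizes `emb_d=8` and `head_d=4`, the random inputs, and the helper names (`matmul`, `shapes_for`, etc.) are my own assumptions for illustration and are not taken from the article. It only checks shapes for a single head, without masking or softmax.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def shape(m):
    """Return (rows, cols) of a non-empty matrix."""
    return (len(m), len(m[0]))


def random_matrix(rows, cols, rng):
    """Fill a rows x cols matrix with arbitrary values; only shapes matter here."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def shapes_for(seq_len, emb_d=8, head_d=4, seed=0):
    """Compute the shapes of Q, K, V and the score matrix for one head."""
    rng = random.Random(seed)
    x = random_matrix(seq_len, emb_d, rng)   # token embeddings: (seq_len, emb_d)
    w_q = random_matrix(emb_d, head_d, rng)  # projections: (emb_d, head_d),
    w_k = random_matrix(emb_d, head_d, rng)  # independent of seq_len
    w_v = random_matrix(emb_d, head_d, rng)
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scores = matmul(q, transpose(k))         # Q @ K.T: (seq_len, seq_len)
    return {"Q": shape(q), "K": shape(k), "V": shape(v), "scores": shape(scores)}


if __name__ == "__main__":
    for seq_len in (1, 2, 3):
        print(seq_len, shapes_for(seq_len))
```

With these assumed sizes, K and V gain one row per new token while their column count stays at `head_d`, and only the score matrix gains both a row and a column.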
Please help me understand what I might be missing. Otherwise I can raise a small PR to address this.
Thanks.