Have a look at what attention looks like interpreted as a vector field:
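As a rough illustration of what "attention as a vector field" can mean, here is a minimal sketch in plain Python. It is not taken from the GLSL files in this repo and rests on some assumptions: tokens are fixed 2D points, the values are the token positions themselves, and the field at a query point `x` is the attention output minus `x`. The names `attention_field`, `tokens` and `qk` are made up for this example.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_field(x, tokens, qk):
    """Displacement of a 2D query point x under one attention step.

    Assumption: values equal the token positions, and the field is
    (attention output) - x. qk is a 2x2 matrix given as nested lists.
    """
    # Row vector x^T (QK).
    xq = (x[0] * qk[0][0] + x[1] * qk[1][0],
          x[0] * qk[0][1] + x[1] * qk[1][1])
    # Pre-softmax weight x^T (QK) t for each token t.
    scores = [xq[0] * t[0] + xq[1] * t[1] for t in tokens]
    weights = softmax(scores)
    # Weighted average of the token positions.
    out = (sum(w * t[0] for w, t in zip(weights, tokens)),
           sum(w * t[1] for w, t in zip(weights, tokens)))
    return (out[0] - x[0], out[1] - x[1])

tokens = [(1.0, 0.0), (-1.0, 0.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(attention_field((0.0, 0.0), tokens, identity))
print(attention_field((5.0, 0.0), tokens, identity))
```

With the identity as the QK-matrix, the origin sits halfway between the two tokens and does not move, while a point far out on the positive x-axis is pulled back towards the token at `(1, 0)`.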
Play with the QK-matrix!
Here is an example of a strong attractor:
How does changing the magnitude of the pre-softmax weights affect the field?
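One way to build intuition for this question is to look at the softmax alone. The sketch below is plain Python, not code from this repo, and `scale` is a hypothetical multiplier applied to all pre-softmax weights. It prints the attention weights for a few scales.

```python
import math

def softmax(scores, scale=1.0):
    # Multiply every pre-softmax weight by the same factor.
    scaled = [scale * s for s in scores]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 0.5, -0.5]
for scale in (0.0, 1.0, 10.0):
    weights = softmax(scores, scale)
    print(scale, [round(w, 3) for w in weights])
```

At scale 0 the weights are uniform, so every point attends equally to all tokens. As the scale grows, the weights approach a one-hot vector on the highest-scoring token, so one would expect the transitions in the field between regions dominated by different tokens to become sharper.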
You can also copy and paste one of the GLSL files into fieldplay, or directly include one of my scripts by writing a line like this into the code field:
#include https://raw.githubusercontent.com/matthiasdellago/visualising-attention/main/attention.glsl