Hey!
I'm currently researching BERT and I'm a bit confused about the position embeddings. I've come across many articles, websites, and blogs, but they all seem to say different things. Some claim that BERT uses learnable position embeddings, while others suggest it uses sine/cosine functions like the original Transformer model, and a few mention that there are multiple ways to construct positional embeddings. Does anyone have a clear explanation of which one BERT actually uses? Also, if it is the learnable positional embeddings, can anyone recommend some useful articles on the topic? I've had trouble finding any solid references myself.
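To make the two options I keep seeing concrete, here is a rough PyTorch sketch of what I understand each scheme would look like. This is just my own illustration of the two ideas, not the actual BERT code, and the names and sizes (`max_len=512`, `d_model=768`) are only placeholders:

```python
import torch
import torch.nn as nn

# Option 1: learnable position embeddings — a trainable lookup table,
# one vector per position index, updated by backprop like any other weight.
class LearnedPositionEmbedding(nn.Module):
    def __init__(self, max_len=512, d_model=768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)      # broadcast over the batch


# Option 2: fixed sine/cosine encodings as described in the original
# Transformer paper — computed once, no trainable parameters.
def sinusoidal_position_encoding(max_len=512, d_model=768):
    pos = torch.arange(max_len).unsqueeze(1).float()      # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()                # even dimensions
    angles = pos / torch.pow(torch.tensor(10000.0), i / d_model)
    enc = torch.zeros(max_len, d_model)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc                                             # (max_len, d_model)
```

Is my understanding of these two schemes right, and which one does BERT actually use?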
Thanks in advance!