Hey!
I'm currently researching BERT and I'm a bit confused about the position embeddings. I've come across many articles, websites, and blogs, but they all seem to say different things. Some claim that BERT uses learnable position embeddings, while others suggest it uses sine/cosine functions like the original Transformer model, and a few mention that there are multiple ways to construct positional embeddings. Does anyone have a clear explanation of which one BERT actually uses? Also, if it is the learnable positional embeddings, can anyone recommend some useful articles on the topic? I've had trouble finding any solid references myself.
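To make the two options I keep seeing concrete, here is a rough PyTorch sketch of what I understand each scheme would look like. This is just my own illustration of the two ideas, not the actual BERT code, and the names and sizes (`max_len=512`, `d_model=768`) are only placeholders:

```python
import torch
import torch.nn as nn

# Option 1: learnable position embeddings — a trainable lookup table,
# one vector per position index, updated by backprop like any other weight.
class LearnedPositionEmbedding(nn.Module):
    def __init__(self, max_len=512, d_model=768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)      # broadcast over the batch


# Option 2: fixed sine/cosine encodings as described in the original
# Transformer paper — computed once, no trainable parameters.
def sinusoidal_position_encoding(max_len=512, d_model=768):
    pos = torch.arange(max_len).unsqueeze(1).float()      # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()                # even dimensions
    angles = pos / torch.pow(torch.tensor(10000.0), i / d_model)
    enc = torch.zeros(max_len, d_model)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc                                             # (max_len, d_model)
```

Is my understanding of these two schemes right, and which one does BERT actually use?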
Thanks in advance!