This repository was archived by the owner on Oct 31, 2023. It is now read-only.
transformer multihead attention scaling layer error #108
Open
Description
Hi. I think there's a problem in the transformer scaling layer.
When I run UNMT, I get an exception in NMT/src/modules/multihead_attention.py at line 97.
line 97: q *= self.scaling
line 30: self.scaling = self.head_dim ** -0.5
I could not find the reason, so I changed my code to
line 97: q = q / math.sqrt(self.head_dim)
and it worked.
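For reference, a minimal sketch of the query-scaling step in scaled dot-product attention, showing that multiplying by `head_dim ** -0.5` and dividing by `math.sqrt(head_dim)` are numerically equivalent (the tensor shapes and variable names here are illustrative assumptions, not the repository's exact module):

```python
import math
import torch

# Assumed shapes: (batch, heads, seq_len, head_dim)
head_dim = 64
q = torch.randn(2, 8, 10, head_dim)
k = torch.randn(2, 8, 10, head_dim)

# Scaling as written in the module (line 30 + line 97):
scaling = head_dim ** -0.5
q_scaled = q * scaling

# Workaround from this report:
q_alt = q / math.sqrt(head_dim)

# Both produce the same scaled queries.
assert torch.allclose(q_scaled, q_alt)

# Scaled dot-product attention scores.
attn_weights = torch.matmul(q_scaled, k.transpose(-2, -1))
```

So the two forms should behave identically; if `q = self.scaling` (without the `*=`) really appears in your local copy, the multiplication is being dropped, which would explain the error.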