This repository was archived by the owner on Oct 31, 2023. It is now read-only.
transformer multihead attention scaling layer error #108
Open
Description
Hi. I think there's a problem in the transformer scaling layer.
When I run UNMT, I get an exception in NMT/src/modules/multihead_attention.py at line 97.
line 97: q *= self.scaling
line 30: self.scaling = self.head_dim ** -0.5
I could not find the reason, so I changed my code to
line 97: q = q / math.sqrt(self.head_dim)
and it worked.
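For reference, a minimal sketch of the query-scaling step in scaled dot-product attention, showing that multiplying by `head_dim ** -0.5` and dividing by `math.sqrt(head_dim)` are numerically equivalent (the tensor shapes and variable names here are illustrative assumptions, not the repository's exact module):

```python
import math
import torch

# Assumed shapes: (batch, heads, seq_len, head_dim)
head_dim = 64
q = torch.randn(2, 8, 10, head_dim)
k = torch.randn(2, 8, 10, head_dim)

# Scaling as written in the module (line 30 + line 97):
scaling = head_dim ** -0.5
q_scaled = q * scaling

# Workaround from this report:
q_alt = q / math.sqrt(head_dim)

# Both produce the same scaled queries.
assert torch.allclose(q_scaled, q_alt)

# Scaled dot-product attention scores.
attn_weights = torch.matmul(q_scaled, k.transpose(-2, -1))
```

So the two forms should behave identically; if `q = self.scaling` (without the `*=`) really appears in your local copy, the multiplication is being dropped, which would explain the error.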