Thank you for providing the code of SMART.
SMART uses the following code to compute the embeddings, which are then perturbed with noise and fed back to BERT as `inputs_embeds`.
However, `inputs_embeds` in transformers is expected to be the output of `bert.embeddings.word_embeddings`, not of `bert.embeddings` — otherwise the position/token-type embeddings and the LayerNorm are applied a second time.
Please refer to the following code for `BertEmbeddings.forward`:
```python
def forward(
    self, input_ids=None, token_type_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0
):
    if input_ids is not None:
        input_shape = input_ids.size()
    else:
        input_shape = inputs_embeds.size()[:-1]

    seq_length = input_shape[1]

    if position_ids is None:
        position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]

    # Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually occurs
    # when its auto-generated, registered buffer helps users when tracing the model without passing token_type_ids, solves
    # issue #5664
    if token_type_ids is None:
        if hasattr(self, "token_type_ids"):
            buffered_token_type_ids = self.token_type_ids[:, :seq_length]
            buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_shape[0], seq_length)
            token_type_ids = buffered_token_type_ids_expanded
        else:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

    embeddings = inputs_embeds + token_type_embeddings
    if self.position_embedding_type == "absolute":
        position_embeddings = self.position_embeddings(position_ids)
        embeddings += position_embeddings
    embeddings = self.LayerNorm(embeddings)
    embeddings = self.dropout(embeddings)
    return embeddings
```
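The distinction matters because `forward` adds position/token-type embeddings and applies LayerNorm on top of whatever arrives as `inputs_embeds`. A minimal toy sketch (not the real Hugging Face classes; the vocabulary and hidden sizes are made up for illustration) shows the double application:

```python
import torch
import torch.nn as nn

# Toy stand-in for BertEmbeddings (illustrative only): whatever comes in as
# inputs_embeds gets position embeddings and LayerNorm added on top.
class ToyBertEmbeddings(nn.Module):
    def __init__(self, vocab_size=10, hidden=4, max_pos=8):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden)
        self.position_embeddings = nn.Embedding(max_pos, hidden)
        self.LayerNorm = nn.LayerNorm(hidden)

    def forward(self, input_ids=None, inputs_embeds=None):
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        positions = torch.arange(inputs_embeds.size(1))
        return self.LayerNorm(inputs_embeds + self.position_embeddings(positions))

torch.manual_seed(0)
emb = ToyBertEmbeddings()
ids = torch.tensor([[1, 2, 3]])

# Correct: feed the raw word embeddings, so positions/LayerNorm apply once.
correct = emb(inputs_embeds=emb.word_embeddings(ids))
# Wrong: feed the full embedding output, so positions/LayerNorm apply twice.
wrong = emb(inputs_embeds=emb(input_ids=ids))

print(torch.allclose(correct, emb(input_ids=ids)))  # True
print(torch.allclose(wrong, emb(input_ids=ids)))    # False
```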
Another problem is that the magnitude of `delta_grad * self.step_size` is too small to influence the noise during the noise update.
For example, `delta_grad * self.step_size` is around 1e-13, while the magnitude of the noise is around 1e-5.
So the noise-update code seems to effectively just renormalize the noise by its norm, without the gradient direction contributing.
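A small numeric sketch makes this concrete (the magnitudes are hypothetical, chosen to match the ~1e-13 vs ~1e-5 ratio reported above; the normalized-gradient variant at the end is a standard remedy from projected gradient ascent, not necessarily what SMART intends):

```python
import torch

torch.manual_seed(0)
noise = torch.randn(4) * 1e-5        # current perturbation, ~1e-5 as reported
delta_grad = torch.randn(4) * 1e-9   # hypothetical gradient values, so that
step_size = 1e-4                     # delta_grad * step_size lands near ~1e-13

# Direct update: the step is ~8 orders of magnitude below the noise, so in
# float32 it is rounded away almost entirely.
step = delta_grad * step_size
print(((noise + step) - noise).abs().max())  # ~0: the gradient had no effect

# One standard remedy (an assumption here, not SMART's actual code): normalize
# the gradient so that step_size alone sets the update magnitude, as in
# projected gradient ascent.
direction = delta_grad / (delta_grad.norm() + 1e-12)
print(((noise + step_size * direction) - noise).abs().max())  # ~1e-4 scale
```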
(For reference, the embedding call discussed in the first point is mt-dnn/mt_dnn/matcher.py, line 124, at commit ca896ef.)