Hi, the ar_loss is computed as:
So the prediction is `output_embeddings`, and the supervision is `supervision_ids[:, 1:]` instead of `target_id`. However, `output_embeddings` is computed using `MultiHeadAttention` rather than `MaskedMultiHeadAttention`. This means that `output_embeddings` at time `t` can see the data after `t`. Will this be a problem?
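
To make the concern concrete, here is a minimal sketch in plain Python. It is not the repository's code: `attention` and its `causal` flag are hypothetical names, and the toy uses a single head with Q = K = V. It only checks whether the output at position 0 changes when a later position is altered.

```python
import math

def attention(x, causal):
    """Toy single-head self-attention over a list of vectors (Q = K = V = x).

    Hypothetical helper for illustration only, not the repository's implementation.
    """
    n, d = len(x), len(x[0])
    out = []
    for t in range(n):
        # With a causal mask, position t may only attend to positions 0..t.
        visible = range(t + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d) for s in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * x[s][k] for w, s in zip(weights, visible)) for k in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed = [row[:] for row in seq]
changed[2] = [5.0, -3.0]  # alter only the last (future) position

# Does the output at t = 0 change when only a future position changes?
leaks_unmasked = attention(seq, causal=False)[0] != attention(changed, causal=False)[0]
leaks_masked = attention(seq, causal=True)[0] != attention(changed, causal=True)[0]
print(leaks_unmasked, leaks_masked)  # True False
```

Which case applies here depends on whether a causal mask is passed to the attention call, which the module name alone does not settle.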