hello @abhimishra91
I was trying to implement the fine-tuning of T5 as explained in your notebook.

In addition to implementing the same structure as you, I have run some experiments with the Hugging Face `Trainer` class. The `decoder_input_ids` and `labels` parameters are not very clear to me. When you train the model, you do this preprocessing, where `y_ids` is the `decoder_input_ids`. I don't understand why we need it. May I kindly ask why you skip the last token of the `target_ids`, and why you replace the pads with -100 in the `labels`?

When I use the Hugging Face `Trainer`, I need to tweak the `__getitem__` function of the `Dataset` like this, otherwise the loss does not decrease over time.
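For context, the preprocessing being asked about is the standard teacher-forcing setup for seq2seq training: the decoder inputs are the targets shifted one step behind the labels, and padded positions in the labels are set to -100 because that is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. A minimal sketch in plain Python (the helper name is hypothetical, and pad id 0 is an assumption matching T5's pad token):

```python
PAD_ID = 0  # assumption: T5's pad token id is 0

def prepare_decoder_inputs_and_labels(target_ids):
    """Teacher-forcing preprocessing for one tokenized target sequence.

    In practice this is done on batched tensors (e.g. y[:, :-1] and
    y[:, 1:]); plain lists are used here to keep the sketch self-contained.
    """
    # decoder_input_ids: drop the LAST token, so at step t the decoder
    # sees tokens [0..t-1] and is trained to predict token t.
    y_ids = target_ids[:-1]

    # labels: drop the FIRST token, so the labels sit one step ahead of
    # the decoder inputs; replace pads with -100 so a loss computed with
    # ignore_index=-100 skips the padded positions entirely.
    labels = [tok if tok != PAD_ID else -100 for tok in target_ids[1:]]
    return y_ids, labels

# Tiny example: a padded target sequence [5, 6, 7, <pad>]
y_ids, labels = prepare_decoder_inputs_and_labels([5, 6, 7, PAD_ID])
# y_ids  -> [5, 6, 7]
# labels -> [6, 7, -100]
```

Without the -100 replacement, the loss would also be computed on pad tokens, which can keep it from decreasing meaningfully; this is presumably why the `__getitem__` tweak is needed when returning `labels` to the `Trainer`.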
Thank you for your help!