
T5 fine-tuning for summarization decoder_input_ids and labels #15

marcoabrate opened this issue Oct 13, 2020 · 3 comments


@marcoabrate

Hello @abhimishra91,

I was trying to implement the fine-tuning of T5 as explained in your notebook. In addition to implementing the same structure as yours, I have also experimented with the Hugging Face Trainer class. The decoder_input_ids and labels parameters are not very clear to me. When you train the model, you do this:

y = data['target_ids'].to(device, dtype = torch.long)
y_ids = y[:, :-1].contiguous()
lm_labels = y[:, 1:].clone().detach()
lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100

where y_ids is passed as decoder_input_ids. I don't understand why this preprocessing is needed. May I kindly ask why you skip the last token of target_ids, and why you replace the pad tokens with -100 in the labels?
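To make the question concrete, here is a small self-contained illustration of what that slicing produces on a toy batch (the token ids are made up; I am only assuming T5's defaults of pad_token_id == 0 and eos_token_id == 1):

import torch

pad, eos = 0, 1                                   # T5 defaults: pad_token_id = 0, eos_token_id = 1
y = torch.tensor([[1023, 58, 7, eos, pad, pad]])  # toy target_ids, shape (1, 6)

y_ids = y[:, :-1].contiguous()                    # tensor([[1023, 58, 7, 1, 0]])
lm_labels = y[:, 1:].clone().detach()             # tensor([[  58,  7, 1, 0, 0]])
lm_labels[y[:, 1:] == pad] = -100                 # tensor([[  58,  7, 1, -100, -100]])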
When I use the Hugging Face Trainer, I need to tweak the __getitem__ function of my Dataset like this:

def __getitem__(self, idx):

    ...

    item['decoder_input_ids'] = y[:-1]                    # decoder input: target shifted right (last token dropped)
    lbl = y[1:].clone()                                   # labels: target shifted left (first token dropped)
    lbl[y[1:] == self.tokenizer.pad_token_id] = -100      # padding positions are ignored by the loss
    item['labels'] = lbl

    return item

otherwise the loss function does not decrease over time.
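For reference, this is roughly the full Dataset I ended up with for the Trainer. It is a minimal sketch: the argument names, maximum lengths and tokenizer settings are placeholders for my own data, not the notebook's exact values.

import torch
from torch.utils.data import Dataset

class SummarizationDataset(Dataset):
    def __init__(self, sources, targets, tokenizer, max_source_len=512, max_target_len=150):
        self.sources = sources              # list of input documents
        self.targets = targets              # list of reference summaries
        self.tokenizer = tokenizer
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len

    def __len__(self):
        return len(self.sources)

    def __getitem__(self, idx):
        source = self.tokenizer(
            self.sources[idx], max_length=self.max_source_len,
            padding="max_length", truncation=True, return_tensors="pt",
        )
        target = self.tokenizer(
            self.targets[idx], max_length=self.max_target_len,
            padding="max_length", truncation=True, return_tensors="pt",
        )
        y = target["input_ids"].squeeze(0)

        item = {
            "input_ids": source["input_ids"].squeeze(0),
            "attention_mask": source["attention_mask"].squeeze(0),
        }
        # shift: the decoder sees y[:-1], the loss is computed against y[1:]
        item["decoder_input_ids"] = y[:-1]
        lbl = y[1:].clone()
        lbl[y[1:] == self.tokenizer.pad_token_id] = -100  # ignore padding in the loss
        item["labels"] = lbl
        return item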

Thank you for your help!

@Gorodecki commented Dec 28, 2020

[image attachment: IMG_20201228_185453_834]
Hi, @marcoabrate!
I am also having trouble computing the loss. Can you share your full training code? Did you use multi-GPU?

@marcoabrate (Author)

Hi @Gorodecki
I have abandoned this code since there are plenty of seq2seq training and evaluation examples in the Hugging Face library itself; you can check them out here: https://github.com/huggingface/transformers/tree/master/examples/seq2seq
I was not using multi-GPU. Hope this helps!
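Also, for completeness: with recent versions of transformers (a minimal sketch, assuming a 4.x release and t5-small as an example checkpoint), you do not even need to build decoder_input_ids yourself. If you pass only labels, with pad positions set to -100, T5ForConditionalGeneration creates the decoder inputs internally by shifting the labels to the right:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

source = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
target = tokenizer("A fox jumps over a dog.", return_tensors="pt")

labels = target.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100   # pads (if any) are ignored by the loss

# no decoder_input_ids here: the model builds them by shifting `labels` to the right
outputs = model(input_ids=source.input_ids,
                attention_mask=source.attention_mask,
                labels=labels)
print(outputs.loss)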

@QuetzalcoatlRosso
