There has been significant refactoring of the loss functions in transformers 4.46 that will render the cross-entropy patching ineffective. We need a different `ModelPatcherRule` for the new transformers version. CC: @anhuong

huggingface/transformers#34191
So now there are 3 possibilities:

1. `custom_loss_function` is passed into `Trainer` (see the sketch below)
2. the model has migrated to the `custom_loss_function` API
3. the model has not migrated (like Granite now)
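For reference, a minimal sketch of possibility 1, assuming the user-supplied loss hook is the `compute_loss_func` argument accepted by recent transformers `Trainer` releases (the exact keyword name and call signature may differ across versions; `model` and `train_dataset` are placeholders):

```python
import torch
from transformers import Trainer, TrainingArguments

# hypothetical user-supplied loss; recent Trainer versions call it with the
# model outputs, the labels, and the number of items in the batch
def my_custom_loss(outputs, labels, num_items_in_batch=None):
    logits = outputs.logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
        reduction="sum",
    )
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    return loss

trainer = Trainer(
    model=model,                       # assumed to be defined elsewhere
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,       # assumed to be defined elsewhere
    compute_loss_func=my_custom_loss,  # user-controlled loss, so we would not patch it
)
```

Because the user fully controls this function, there is nothing stable for us to patch in this case.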
For 3: this is the easy one, because it means no code changes.

For 1: I'm thinking we do not patch anything, because if a user wants to do this, we can't control what loss function they use.
For 2: in this case we want to patch `fixed_cross_entropy`, but this should be done on a per-model basis. So we need to somehow have the model instantiate the loss function, e.g., `ForCausalLMLoss`, and only patch `fixed_cross_entropy` during this instantiation process, then put it back to the original after it is done (see the sketch below).
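A minimal sketch of the temporary patch for possibility 2, assuming the refactored loss helpers live in `transformers.loss.loss_utils` (the module path is an assumption based on transformers >= 4.46, and `fused_cross_entropy` in the usage comment is a hypothetical replacement, not the final implementation):

```python
import contextlib

# assumption: the refactored loss helpers (fixed_cross_entropy, ForCausalLMLoss)
# live here in transformers >= 4.46; adjust the import if the path differs
from transformers.loss import loss_utils


@contextlib.contextmanager
def swap_fixed_cross_entropy(replacement):
    """Temporarily replace fixed_cross_entropy and restore the original on exit."""
    original = loss_utils.fixed_cross_entropy
    loss_utils.fixed_cross_entropy = replacement
    try:
        yield
    finally:
        # put the original back once the model is done instantiating its loss function
        loss_utils.fixed_cross_entropy = original


# hypothetical usage inside a per-model ModelPatcherRule:
# with swap_fixed_cross_entropy(fused_cross_entropy):
#     ...have the model instantiate / invoke its loss function here...
```

The per-model `ModelPatcherRule` would then only activate this swap for models that have migrated to the new loss API, leaving non-migrated models (case 3) on the existing patching path.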