
Unexpected behaviour with transformers versions above 4.28 for Donut #39473

@mdavudov

System Info

Hello,

Big thanks to all the contributors on this repo!

I would like to raise an issue that was initially encountered when running the example notebooks for Donut in Transformers-Tutorials (https://github.com/NielsRogge/Transformers-Tutorials) by @NielsRogge. The issue was previously raised on that repo, but the author advised re-raising it here. Original issue: NielsRogge/Transformers-Tutorials#496 (comment)

Bug:

The bug was encountered when trying to reproduce results from this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb

When using newer versions of transformers there is strange behaviour during training: the model shows much higher validation edit distance values than expected. This is fixed by downgrading to version 4.28.1 or 4.25.
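
For context, the validation metric in the notebook is a normalized edit distance between the decoded prediction and the ground-truth target sequence. Below is a minimal sketch of that computation; it assumes `nltk`'s `edit_distance` and hypothetical `prediction`/`answer` strings, and may differ in detail from the notebook's exact code.

```python
# Minimal sketch of the normalized edit distance used as the validation metric.
# `prediction` and `answer` are hypothetical decoded strings, not notebook variables.
from nltk import edit_distance

def normalized_edit_distance(prediction: str, answer: str) -> float:
    # Normalize by the longer string so the score lies in [0, 1]:
    # 0.0 is an exact match, values near 1.0 mean the strings barely overlap.
    return edit_distance(prediction, answer) / max(len(prediction), len(answer))

print(normalized_edit_distance("<s_total>12.50</s_total>", "<s_total>12.50</s_total>"))  # 0.0
```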

Reference code uses the following classes from transformers (a rough loading sketch follows this list):

  • DonutProcessor
  • VisionEncoderDecoderModel
  • VisionEncoderDecoderConfig
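
As a rough illustration of how these three classes are wired together for Donut fine-tuning (the checkpoint name and the image-size / max-length values below are illustrative assumptions, not the notebook's exact settings):

```python
# Sketch of loading Donut with the three classes listed above.
# The checkpoint and the image_size / max_length values are illustrative only.
from transformers import DonutProcessor, VisionEncoderDecoderConfig, VisionEncoderDecoderModel

checkpoint = "naver-clova-ix/donut-base"  # assumed base checkpoint

config = VisionEncoderDecoderConfig.from_pretrained(checkpoint)
config.encoder.image_size = [1280, 960]  # example input resolution
config.decoder.max_length = 768          # example maximum target sequence length

processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint, config=config)
```

Passing the modified config into `from_pretrained` is what allows the image resolution and decoder length to differ from the pretrained defaults.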

The difference can be seen in the attached screenshot, where the red line shows the validation edit distance metric when running on 4.28.1 and the orange one when running on 4.36.0.

Were there any changes introduced after 4.28.1 that could be causing this, and are there any known ways of fixing it?

[Screenshot: validation edit distance during training, 4.28.1 (red) vs 4.36.0 (orange)]

Environment

Output of `transformers env` for 4.28.1:

- `transformers` version: 4.28.1
- Platform: Linux-6.1.134-152.225.amzn2023.x86_64-x86_64-with-glibc2.34
- Python version: 3.11.12
- Huggingface_hub version: 0.32.4
- Safetensors version: 0.5.3
- PyTorch version (GPU?): 2.7.1+cu128 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO

For 4.36.0 (the version where the issue occurs):

- `transformers` version: 4.36.0
- Platform: Linux-6.1.134-152.225.amzn2023.x86_64-x86_64-with-glibc2.34
- Python version: 3.11.12
- Huggingface_hub version: 0.32.4
- Safetensors version: 0.5.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.7.1+cu128 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO

Thank you for your time, and please let me know what I can do on my end to make it easier to diagnose the issue more precisely.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The bug was encountered when trying to reproduce results from this notebook:

https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb

To reproduce:

  1. Follow the notebook as-is; this installs the latest version of transformers
  2. Continue until the training step and run the training
  3. Observe unexpectedly high validation edit distance metrics

To fix:

  1. Pin the transformers version to 4.28.1 (see the sketch after these steps)
  2. Run the notebook again
  3. You should observe a much lower validation edit distance
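
A minimal way to confirm the pin took effect before re-running training (the exact install cell in the notebook may look different):

```python
# After running `pip install "transformers==4.28.1"` and restarting the kernel,
# confirm the pinned version is the one actually being imported.
import transformers

print(transformers.__version__)
assert transformers.__version__ == "4.28.1", f"unexpected version: {transformers.__version__}"
```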

Expected behavior

I expect the training behaviour to be similar on newer versions of transformers and the performance not to degrade so drastically.
