System Info
Hello,
Big thanks to all the contributors on this repo!
I would like to raise an issue that was initially encountered when running the example notebooks for Donut in Transformers Tutorials (https://github.com/NielsRogge/Transformers-Tutorials) by @NielsRogge. The issue was previously raised on that repo, but the author advised re-raising it here. Original issue: NielsRogge/Transformers-Tutorials#496 (comment)
Bug:
The bug was encountered when trying to reproduce results from this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Donut/CORD/Fine_tune_Donut_on_a_custom_dataset_(CORD)_with_PyTorch_Lightning.ipynb
When using newer versions of `transformers`, there is strange behaviour during training: the model shows much higher validation edit distance values than expected. This is fixed by downgrading to version 4.28.1 or 4.25.
The reference code uses the following classes from `transformers` (a minimal usage sketch follows the list):
- DonutProcessor
- VisionEncoderDecoderModel
- VisionEncoderDecoderConfig
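For context, the notebook wires these classes together roughly as follows. This is a minimal sketch assuming the `naver-clova-ix/donut-base` checkpoint and the image size / max length values used in the CORD tutorial, not a verbatim copy of the notebook:

```python
from transformers import (
    DonutProcessor,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Assumed values, mirroring the CORD fine-tuning tutorial
checkpoint = "naver-clova-ix/donut-base"
image_size = [1280, 960]   # (height, width) fed to the Swin encoder
max_length = 768           # maximum decoder sequence length

# Adjust the encoder input size and decoder length before loading the weights
config = VisionEncoderDecoderConfig.from_pretrained(checkpoint)
config.encoder.image_size = image_size
config.decoder.max_length = max_length

processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint, config=config)
```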
The difference can be seen in the attached screenshot, where the red line shows the validation edit distance metric when running on 4.28.1 and the orange one when running on 4.36.0.
Were there any changes introduced after 4.28.1 that could be causing this, and are there any known ways of fixing it?
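For reference, the validation metric in question is a normalized Levenshtein edit distance between the generated sequence and the ground-truth target sequence, so lower is better. A minimal sketch of how it is computed (assuming `nltk`'s `edit_distance`, as in the tutorial):

```python
from nltk import edit_distance

def normalized_edit_distance(prediction: str, answer: str) -> float:
    """Levenshtein distance normalized by the longer sequence (0.0 = exact match)."""
    return edit_distance(prediction, answer) / max(len(prediction), len(answer))

# A validation step compares the decoded generation against the target sequence;
# values close to 1.0 indicate heavily degraded predictions.
score = normalized_edit_distance(
    "<s_menu><s_nm>Latte</s_nm></s_menu>",
    "<s_menu><s_nm>Latte</s_nm></s_menu>",
)
print(score)  # 0.0 for an exact match
```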

Environment
Output of `transformers env` for 4.28.1:
- `transformers` version: 4.28.1
- Platform: Linux-6.1.134-152.225.amzn2023.x86_64-x86_64-with-glibc2.34
- Python version: 3.11.12
- Huggingface_hub version: 0.32.4
- Safetensors version: 0.5.3
- PyTorch version (GPU?): 2.7.1+cu128 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO
Output of `transformers env` for 4.36.0 (the version where the issue is encountered):
- `transformers` version: 4.36.0
- Platform: Linux-6.1.134-152.225.amzn2023.x86_64-x86_64-with-glibc2.34
- Python version: 3.11.12
- Huggingface_hub version: 0.32.4
- Safetensors version: 0.5.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.7.1+cu128 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO
Thank you for your time, and please let me know what I can do on my end to make it easier to diagnose the issue more precisely.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The bug was encountered when trying to reproduce results from the notebook linked above.
To reproduce:
- Follow the notebook as-is; this installs the latest version of `transformers`
- Continue until the training step and run the training
- Observe unexpectedly high validation edit distance metrics
To fix:
- Pin the `transformers` version to 4.28.1 (see the snippet below)
- Run the notebook again
- You should observe much lower validation edit distance metrics
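For example, the pin can be applied in the install cell at the top of the notebook (hypothetical cell contents; adjust to match the notebook's actual install line):

```python
# First notebook cell: install a pinned version instead of the latest release
!pip install -q transformers==4.28.1
```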
Expected behavior
I expect the training behaviour to be similar on newer versions of `transformers`, and the performance not to degrade so drastically.