Previous PRs introduced a bug in Accumulated Gradient Losses #40052

@w32zhong

Description

System Info

  • transformers version: 4.54.1
  • Platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.34.3
  • Safetensors version: 0.5.3
  • Accelerate version: 1.10.0
  • Accelerate config: not found
  • DeepSpeed version: 0.17.4
  • PyTorch version (accelerator?): 2.8.0a0+5228986c39.nv25.06 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

The regression comes from previous PRs #35207 and #34511.

These PRs cause backward() to be called after the loss has already been rescaled, so the loss ends up rescaled twice: once in the Trainer and once again in Accelerate:

https://github.com/huggingface/accelerate/blob/23cf4ef8a3b58f016f63eeb158b4aa2c3e79fe6f/src/accelerate/accelerator.py#L2724
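
For concreteness, here is a minimal, runnable sketch of how the two divisions compose (plain Python, paraphrasing the two call sites rather than quoting the exact library code; ga stands for gradient_accumulation_steps):

    ga = 4          # gradient_accumulation_steps > 1
    loss = 1.0

    # 1) transformers Trainer rescales the loss before handing it to backward():
    loss = loss / ga            # 0.25

    # 2) Accelerator.backward (linked line above) rescales again on the
    #    non-DeepSpeed path before calling loss.backward():
    loss = loss / ga            # 0.0625

    print(loss)  # 0.0625 == 1 / ga**2, but the intended scale is 1 / ga == 0.25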

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When:

  • gradient_accumulation_steps > 1
  • not using DeepSpeed
  • num_items_in_batch is None and self.compute_loss_func is None (i.e., when the user ignores the GA loss bug)

The final loss is rescaled twice:

    loss = loss / gradient_accumulation_steps
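
The effect is visible at the gradient level. A minimal sketch in plain PyTorch (not Trainer code; the two divisions below stand in for the Trainer's and Accelerate's respective rescales):

    import torch

    ga = 4                          # gradient_accumulation_steps
    w = torch.ones(1, requires_grad=True)
    loss = (w * 3.0).sum()          # d(loss)/dw == 3.0

    loss = loss / ga                # the Trainer's division (quoted above)
    loss = loss / ga                # Accelerate's division (linked above)
    loss.backward()

    print(w.grad)                   # tensor([0.1875]) == 3.0 / ga**2
    # With a single rescale the gradient would be 3.0 / ga == 0.75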

Expected behavior

It should be rescaled only once.
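
One possible shape of a fix, sketched as a hypothetical helper (the names and the accelerate_will_divide flag are invented for illustration; this is not the actual patch): apply the 1/ga division at most once, deferring to Accelerate when its backward() will perform it.

    def scale_loss(loss, ga, accelerate_will_divide):
        """Apply the 1/ga rescale at most once."""
        if accelerate_will_divide:
            return loss          # Accelerator.backward applies 1/ga later
        return loss / ga         # otherwise apply the single division here

    assert scale_loss(1.0, 4, accelerate_will_divide=True) == 1.0
    assert scale_loss(1.0, 4, accelerate_will_divide=False) == 0.25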
