Description
Hello open-unlearning team,
I just ran into something odd: the exact same fine-tuning script gives me very different results depending on whether I launch it with Accelerate or call python directly. I’d like to know if anyone has run into a similar situation or can spot what I might be missing.
- Script: `scripts/tofu_finetune.sh` (left at its default retain-only settings)
- GPUs: 2 × A100 (80 GB)
| Launcher | GPUs | Wall time (s) | Samples/s | Steps/s | Final train loss | extraction_strength | forget_Q_A_prob | forget_Q_A_ROUGE | model_utility | privleak |
|---|---|---|---|---|---|---|---|---|---|---|
| Accelerate | 0,1 | 938.6 | 19.18 | 0.602 | 0.936 | 0.0593 | 0.1148 | 0.3815 | 0.5901 | 23.69 |
| Plain Python | 0 | 817.2 | 22.03 | 1.377 | 1.554 | 0.0621 | 0.2092 | 0.3978 | 0.4442 | 28.38 |
| Accelerate (single GPU) | 0 | 4646.7 | 3.87 | 0.242 | 0.822 | 0.0611 | 0.1063 | 0.3796 | 0.5938 | 23.15 |
The plain Python run somehow lands on noticeably worse results than the Accelerate version (higher final train loss, lower model_utility).
Moreover, using Accelerate on a single GPU roughly matches the better loss but takes more than five times as long as the plain single-GPU run, which is also weird.
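For what it's worth, here is the back-of-the-envelope arithmetic I did on the throughput numbers above (nothing beyond the table itself; dividing samples/s by steps/s should give the number of samples consumed per optimizer step):

```python
# Samples consumed per optimizer step, derived from the Trainer's
# reported samples/s and steps/s for each run.
runs = {
    "accelerate_2gpu":   {"samples_per_s": 19.18, "steps_per_s": 0.602},
    "plain_python_1gpu": {"samples_per_s": 22.03, "steps_per_s": 1.377},
    "accelerate_1gpu":   {"samples_per_s": 3.87,  "steps_per_s": 0.242},
}

for name, r in runs.items():
    samples_per_step = r["samples_per_s"] / r["steps_per_s"]
    print(f"{name}: ~{samples_per_step:.1f} samples per optimizer step")

# Prints (approximately):
#   accelerate_2gpu:   ~31.9 samples per optimizer step
#   plain_python_1gpu: ~16.0 samples per optimizer step
#   accelerate_1gpu:   ~16.0 samples per optimizer step
```

So the two-GPU run processes roughly twice as many samples per optimizer step (and therefore takes about half as many steps over the same data), but that alone doesn't explain why the single-GPU Accelerate run still ends up much closer to the two-GPU loss than to the plain Python one.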
Have you seen gradients/optimizer behave differently just from switching launchers?
Any tips on flags or configs I should double-check (e.g., gradient accumulation, FP16/bf16, broadcast_buffers, etc.) would be super helpful.
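In case it's useful, this is the kind of probe I was planning to drop into the training entry point to see what each launcher actually configures. It's just a minimal sketch using standard `Accelerator` attributes; `report_launch_config` is my own hypothetical helper, not something from the repo:

```python
# Print the distributed setup that accelerate resolves under each launcher,
# so the Accelerate and plain-Python runs can be compared side by side.
from accelerate import Accelerator

def report_launch_config():
    accelerator = Accelerator()
    print("distributed_type            :", accelerator.distributed_type)
    print("num_processes               :", accelerator.num_processes)
    print("mixed_precision             :", accelerator.mixed_precision)
    print("gradient_accumulation_steps :", accelerator.gradient_accumulation_steps)
    print("device                      :", accelerator.device)

if __name__ == "__main__":
    report_launch_config()
```

I can also run `accelerate env` under both setups and attach a diff of the output here if that helps.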
Thanks in advance!