Description
Hello open-unlearning team,
I just ran into something odd: the exact same fine-tuning script gives me very different results depending on whether I launch it with Accelerate or call python directly. I’d like to know if anyone has run into a similar situation or can spot what I might be missing.
- Script: `scripts/tofu_finetune.sh` (left at its default retain-only settings)
- GPUs: 2 × A100 (80 GB)
| Launcher | GPUs | Wall time (s) | Samples/s | Steps/s | Final train loss | extraction_strength | forget_Q_A_prob | forget_Q_A_ROUGE | model_utility | privleak |
|---|---|---|---|---|---|---|---|---|---|---|
| Accelerate | 0,1 | 938.6 | 19.18 | 0.602 | 0.936 | 0.0593 | 0.1148 | 0.3815 | 0.5901 | 23.69 |
| Plain Python | 0 | 817.2 | 22.03 | 1.377 | 1.554 | 0.0621 | 0.2092 | 0.3978 | 0.4442 | 28.38 |
| Accelerate (single GPU) | 0 | 4646.7 | 3.87 | 0.242 | 0.822 | 0.0611 | 0.1063 | 0.3796 | 0.5938 | 23.15 |
The plain Python run somehow lands on noticeably worse results than the Accelerate version (higher final train loss, lower model_utility).
Moreover, using Accelerate on a single GPU roughly matches the better loss but takes more than five times as long as the plain single-GPU run, which is also weird.
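For what it's worth, here is the back-of-the-envelope arithmetic I did on the throughput numbers above (nothing beyond the table itself; dividing samples/s by steps/s should give the number of samples consumed per optimizer step):

```python
# Samples consumed per optimizer step, derived from the Trainer's
# reported samples/s and steps/s for each run.
runs = {
    "accelerate_2gpu":   {"samples_per_s": 19.18, "steps_per_s": 0.602},
    "plain_python_1gpu": {"samples_per_s": 22.03, "steps_per_s": 1.377},
    "accelerate_1gpu":   {"samples_per_s": 3.87,  "steps_per_s": 0.242},
}

for name, r in runs.items():
    samples_per_step = r["samples_per_s"] / r["steps_per_s"]
    print(f"{name}: ~{samples_per_step:.1f} samples per optimizer step")

# Prints (approximately):
#   accelerate_2gpu:   ~31.9 samples per optimizer step
#   plain_python_1gpu: ~16.0 samples per optimizer step
#   accelerate_1gpu:   ~16.0 samples per optimizer step
```

So the two-GPU run processes roughly twice as many samples per optimizer step (and therefore takes about half as many steps over the same data), but that alone doesn't explain why the single-GPU Accelerate run still ends up much closer to the two-GPU loss than to the plain Python one.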
Have you seen gradients/optimizer behave differently just from switching launchers?
Any tips on flags or configs I should double-check (e.g., gradient accumulation, FP16/bf16, broadcast_buffers, etc.) would be super helpful.
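In case it's useful, this is the kind of probe I was planning to drop into the training entry point to see what each launcher actually configures. It's just a minimal sketch using standard `Accelerator` attributes; `report_launch_config` is my own hypothetical helper, not something from the repo:

```python
# Print the distributed setup that accelerate resolves under each launcher,
# so the Accelerate and plain-Python runs can be compared side by side.
from accelerate import Accelerator

def report_launch_config():
    accelerator = Accelerator()
    print("distributed_type            :", accelerator.distributed_type)
    print("num_processes               :", accelerator.num_processes)
    print("mixed_precision             :", accelerator.mixed_precision)
    print("gradient_accumulation_steps :", accelerator.gradient_accumulation_steps)
    print("device                      :", accelerator.device)

if __name__ == "__main__":
    report_launch_config()
```

I can also run `accelerate env` under both setups and attach a diff of the output here if that helps.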
Thanks in advance!