
Performance discrepancy between Accelerate and plain Python runs #122

Open

@Lemma1727

Description

Hello open-unlearning team,

I just ran into something odd: the exact same fine-tuning script gives very different results depending on whether I launch it with `accelerate launch` or call `python` directly. Has anyone seen something similar, or can you spot what I might be missing?

  • Script: scripts/tofu_finetune.sh (left at its default retain-only settings)
  • GPUs: 2 × A100 (80 GB)
| Launcher | GPUs | Wall time (s) | Samples/s | Steps/s | Final train loss | extraction_strength | forget_Q_A_prob | forget_Q_A_ROUGE | model_utility | privleak |
|---|---|---|---|---|---|---|---|---|---|---|
| Accelerate | 0,1 | 938.6 | 19.18 | 0.602 | 0.936 | 0.0593 | 0.1148 | 0.3815 | 0.5901 | 23.69 |
| Plain Python | 0 | 817.2 | 22.03 | 1.377 | 1.554 | 0.0621 | 0.2092 | 0.3978 | 0.4442 | 28.38 |
| Accelerate (single GPU) | 0 | 4646.7 | 3.87 | 0.242 | 0.822 | 0.0611 | 0.1063 | 0.3796 | 0.5938 | 23.15 |

The plain Python run somehow lands on noticeably worse results than the two-GPU Accelerate run: higher final train loss, higher forget_Q_A_prob and privleak, and lower model_utility. Moreover, using Accelerate on a single GPU matches the better loss and metrics but takes over five times longer than the plain single-GPU run (4646.7 s vs. 817.2 s), which is also weird.
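
To narrow this down, I'm planning to dump the fully resolved training arguments from both runs and diff them. A minimal sketch is below, assuming the entry point builds a `transformers.TrainingArguments` object (the output path is a placeholder; `to_dict()` is part of the `TrainingArguments` API):

```python
# Minimal sketch, assuming the trainer uses transformers.TrainingArguments.
# The output path is a placeholder; adjust per run so the files can be diffed.
import json

def dump_resolved_args(training_args, path="resolved_args.json"):
    """Write the fully resolved training arguments so two runs can be diffed."""
    with open(path, "w") as f:
        json.dump(training_args.to_dict(), f, indent=2, sort_keys=True, default=str)
```

Diffing the two JSON files should surface any launcher-dependent defaults (per-device batch size, gradient accumulation, mixed precision) without guessing.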

Have you seen gradients or optimizer state behave differently just from switching launchers?
Any tips on flags or configs I should double-check (gradient accumulation, fp16/bf16, broadcast_buffers, and the like) would be super helpful.
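
In case it helps, here is the quick sanity check I'll run at startup to see what each launcher actually configures. The batch-size values are placeholders, not the repo's actual defaults; substitute the values from the training config:

```python
# Print what each launcher actually hands to the trainer.
# per_device_bs and grad_accum are placeholders; use the config's real values.
from accelerate import Accelerator

accelerator = Accelerator()
per_device_bs = 4   # placeholder: per-device train batch size from the config
grad_accum = 1      # placeholder: gradient accumulation steps from the config
effective_bs = per_device_bs * grad_accum * accelerator.num_processes

print(
    f"num_processes={accelerator.num_processes} "
    f"mixed_precision={accelerator.mixed_precision} "
    f"device={accelerator.device} "
    f"effective_batch_size={effective_bs}"
)
```

If the two-GPU Accelerate run reports twice the effective batch size of the plain run, that would also fit the steps/s numbers in the table above (0.602 vs. 1.377, roughly half), since each optimizer step would then cover twice as many samples.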

Thanks in advance!
