
[BUG] Improper coupling of parameter lists between DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 #7627

@delock

Description


Describe the bug
DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 share the same parameter list, which makes it easy for the two to diverge.

** Details **
In the line

Stage3ZeroOptimizer = DeepSpeedZeroOptimizer_Stage3 if not self.super_offload(

the DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 initializers are called with the same parameter list. This adds maintenance overhead: whenever either parameter list needs to change, the shared call must stay compatible with both classes. There are two observations:

  1. The parameter lists already diverge (namely, param_names), and this will break SuperOffload.
  2. cpuadam_cores_perc was added as a parameter to DeepSpeedZeroOptimizer_Stage3 but is not used there.
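To make the two observations concrete, here is a minimal sketch of the failure mode. ZeroStage3Stub, SuperOffloadStub and build are hypothetical stand-ins, not the real DeepSpeed classes, and their signatures are simplified assumptions: one class accepts param_names and the other does not, and one class accepts cpuadam_cores_perc without using it. Calling whichever class is selected with a single shared set of keyword arguments works on one path and raises TypeError on the other.

```python
# Hypothetical stand-ins: these are NOT the real DeepSpeed classes and
# their signatures are simplified assumptions for illustration only.
class ZeroStage3Stub:
    def __init__(self, params, param_names, cpuadam_cores_perc):
        self.params = params
        self.param_names = param_names
        # Accepted but never used (mirrors observation 2).
        del cpuadam_cores_perc


class SuperOffloadStub:
    # Does not accept param_names (mirrors observation 1).
    def __init__(self, params, cpuadam_cores_perc):
        self.params = params
        self.cpuadam_cores_perc = cpuadam_cores_perc


def build(use_super_offload, **shared_kwargs):
    # Coupled pattern: choose a class, then call it with ONE shared kwarg set.
    cls = SuperOffloadStub if use_super_offload else ZeroStage3Stub
    return cls(**shared_kwargs)


kwargs = {"params": [1, 2], "param_names": ["w", "b"], "cpuadam_cores_perc": 0.8}
build(False, **kwargs)  # the ZeRO-3 path accepts every shared kwarg
try:
    build(True, **kwargs)  # the SuperOffload path rejects param_names
except TypeError as exc:
    print(f"SuperOffload path broke: {exc}")
```

Because the class is chosen before the call, nothing checks that both signatures still agree; the mismatch only surfaces at runtime on the less-exercised path.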

** Suggestion **
Separate the calls to DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3.
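A minimal sketch of what separated calls could look like, again using hypothetical stand-in classes (ZeroStage3Stub, SuperOffloadStub) and a hypothetical build_optimizer helper rather than the real DeepSpeed code. It also assumes, for illustration only, that cpuadam_cores_perc belongs to the SuperOffload path and can be dropped from the ZeRO-3 signature since it is unused there.

```python
# Hypothetical stand-ins, not the real DeepSpeed classes.
class ZeroStage3Stub:
    def __init__(self, params, param_names):
        self.params = params
        self.param_names = param_names


class SuperOffloadStub:
    # Assumption: cpuadam_cores_perc is only meaningful for SuperOffload.
    def __init__(self, params, cpuadam_cores_perc):
        self.params = params
        self.cpuadam_cores_perc = cpuadam_cores_perc


def build_optimizer(use_super_offload, params, param_names, cpuadam_cores_perc):
    # Decoupled pattern: each branch passes exactly the arguments its class
    # needs, so the two signatures can evolve independently.
    if use_super_offload:
        return SuperOffloadStub(params, cpuadam_cores_perc=cpuadam_cores_perc)
    return ZeroStage3Stub(params, param_names=param_names)


zero = build_optimizer(False, [1, 2], ["w", "b"], 0.8)
offload = build_optimizer(True, [1, 2], ["w", "b"], 0.8)
print(type(zero).__name__, type(offload).__name__)
```

The cost is some repetition between the two branches, but a parameter added to one class no longer has to be accepted (or silently ignored) by the other.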

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), training
