
Storage in fp8 #1880

Open
@dxqbYD

Description


Related to #1261 and #1764, but it is not entirely clear there:

TransformerEngine could support storing weights in fp8 and could drop the native-precision copy of the weights after initialization. This might seem counterintuitive in a training environment, but please consider LoRA and other adapter trainings: most of the weights are never needed at their original precision again - you just want to use TransformerEngine for its efficient calculations.

Only the LoRA weights are kept at a higher precision.

The VRAM usage of TransformerEngine is currently prohibitive for training a small adapter on a large transformer.
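
As a rough illustration (approximate numbers, only to show the scale): a 12-billion-parameter transformer takes about 24 GB in bf16 but only about 12 GB when stored in fp8, while the trainable LoRA weights for such a model are typically well under 1% of that. Keeping a second, full-precision copy of the frozen base weights roughly doubles the memory bill without any benefit in this scenario.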

Describe alternatives you've considered

Continue to use a custom Linear layer that stores the weights in fp8 but does not get the efficient fp8 calculations performed by TransformerEngine.
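
For reference, here is a minimal sketch of what such a layer can look like in plain PyTorch, assuming per-tensor scaling and `torch.float8_e4m3fn` storage (available in recent PyTorch releases); the class and parameter names are made up for illustration. The base weight is dequantized to bf16 for every matmul, so none of TransformerEngine's fp8 compute kernels are used - which is exactly the drawback described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FP8StorageLinearWithLoRA(nn.Module):
    """Frozen base Linear stored in fp8 (e4m3) plus a trainable bf16 LoRA adapter."""

    def __init__(self, weight: torch.Tensor, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        out_features, in_features = weight.shape
        # Per-tensor scale so the weight fits into the e4m3 range (max ~448).
        scale = weight.abs().max().clamp(min=1e-12) / 448.0
        self.register_buffer("weight_fp8", (weight / scale).to(torch.float8_e4m3fn))
        self.register_buffer("scale", scale.to(torch.bfloat16))
        # Only the LoRA factors are trainable and kept at higher precision.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features, dtype=torch.bfloat16) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank, dtype=torch.bfloat16))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize the frozen base weight on the fly; the matmul runs in bf16,
        # not fp8, so there is no speed benefit - only the storage saving.
        w = self.weight_fp8.to(torch.bfloat16) * self.scale
        x = x.to(torch.bfloat16)
        out = F.linear(x, w)
        out = out + F.linear(F.linear(x, self.lora_a), self.lora_b) * self.scaling
        return out
```

A layer like this can replace an `nn.Linear` whose weight has already been loaded, e.g. `FP8StorageLinearWithLoRA(base_linear.weight.data)`, after which the original bf16/fp32 weight can be freed. The feature requested above would make this workaround unnecessary by letting TransformerEngine keep only the fp8 copy itself while still providing its fp8 GEMMs.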
