Description
Related to #1261 and #1764, but it is not entirely clear there:
TransformerEngine could support storing weights in fp8 and drop the native-precision copy of the weights after initialization. This might seem counterintuitive in a training setting, but consider LoRA and other adapter training: most of the weights are never needed at their original precision again - you just want TransformerEngine for its efficient computation.
Only the LoRA weights are kept at a higher precision.
The VRAM usage of TransformerEngine is currently prohibitive for training a small adapter on a large transformer.
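
For a rough sense of the memory involved, here is a back-of-the-envelope sketch (the 7B parameter count, layer shapes, and rank-16 adapter are assumptions for illustration, not measurements):

```python
# Illustrative memory arithmetic for a 7B-parameter base model.
params = 7e9
bf16_weights_gb = params * 2 / 1e9   # ~14 GB if the base weights stay in bf16
fp8_weights_gb = params * 1 / 1e9    # ~7 GB if only an fp8 copy is kept

# A rank-16 LoRA adapter over, say, 224 weight matrices of size 4096x4096
# stays tiny even in bf16, so the base-weight storage dominates.
lora_params = 224 * 2 * 16 * 4096
lora_bf16_gb = lora_params * 2 / 1e9  # well under 0.1 GB
print(bf16_weights_gb, fp8_weights_gb, lora_bf16_gb)
```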
Describe alternatives you've considered
Continue to use a custom Linear layer that stores its weights in fp8, but lacks the efficient fp8 computation that TransformerEngine provides. A sketch of that workaround is below.
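
A minimal sketch of such a custom layer, assuming a recent PyTorch with `torch.float8_e4m3fn` support (the class name and initialization details are hypothetical, just to illustrate the pattern):

```python
import torch
import torch.nn as nn

class Fp8StoredLinear(nn.Module):
    """Sketch of the workaround: the frozen base weight is kept only as an
    fp8 tensor and dequantized on the fly; a bf16 LoRA adapter is trained
    on top. No fused fp8 GEMMs are used, which is exactly the efficiency
    TransformerEngine could provide."""

    def __init__(self, in_features: int, out_features: int, rank: int = 16):
        super().__init__()
        w = torch.randn(out_features, in_features) * in_features ** -0.5
        # Keep only the fp8 copy of the base weight; the full-precision
        # tensor is discarded after this cast.
        self.register_buffer("weight_fp8", w.to(torch.float8_e4m3fn))
        # Trainable LoRA factors stay in higher precision (bf16 here).
        self.lora_a = nn.Parameter((torch.randn(rank, in_features) * 0.02).to(torch.bfloat16))
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank, dtype=torch.bfloat16))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize per forward pass and run a plain bf16 matmul.
        w = self.weight_fp8.to(torch.bfloat16)
        return x @ w.t() + (x @ self.lora_a.t()) @ self.lora_b.t()


# Usage: only the LoRA parameters receive gradients; the fp8 base weight is a buffer.
layer = Fp8StoredLinear(4096, 4096)
x = torch.randn(8, 4096, dtype=torch.bfloat16)
out = layer(x)
```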