
Plans for fp8 tuning going forward? E.g. DeepSeek v3 #2216

Open
RonanKMcGovern opened this issue Dec 30, 2024 · 1 comment

Comments

@RonanKMcGovern

As foundation models move towards being trained in eight-bit (fp8) precision, is there a plan on the roadmap to support this type of training?
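For concreteness, fp8 training in the PyTorch ecosystem is usually done by swapping linear layers for float8 variants. The sketch below assumes torchao's `convert_to_float8_training` API and fp8-capable hardware (e.g. H100); it is an illustration of the general approach, not anything torchtune currently exposes.

```python
# Minimal sketch of fp8 training via torchao (assumed API:
# torchao.float8.convert_to_float8_training); needs fp8-capable hardware.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
).to("cuda", torch.bfloat16)

# Swap eligible nn.Linear modules so their matmuls run in float8 during
# training; master weights and optimizer state keep their original dtype.
convert_to_float8_training(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
model(x).sum().backward()
optimizer.step()
```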

Related to DeepSeek v3, are there also plans to support mixture-of-experts architectures? I could fully understand if this is still too far away to fit a coherent roadmap.
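As background on what "mixture-of-experts" means here, a toy top-k routed MoE layer in plain PyTorch is sketched below. The names (`TopKMoE`, `router`) and the per-expert loop are illustrative only, not torchtune's or DeepSeek's actual implementation, which uses far more efficient grouped kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy MoE layer: a learned router sends each token to its top-k expert MLPs."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Compute routing weights per token.
        weights = F.softmax(self.router(x), dim=-1)        # (tokens, num_experts)
        topk_w, topk_idx = weights.topk(self.k, dim=-1)    # (tokens, k)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True) # renormalize gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find (token, slot) pairs routed to expert e and mix in its output.
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += topk_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

layer = TopKMoE(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```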

@RdoubleA (Collaborator)

On MoE, this is something we're actively working on; see #1902. Hoping to share updates on this very soon :)
