As foundation models move toward eight-bit training, is there a plan on the roadmap to support this approach?
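For context on what eight-bit numerics involve, here is a toy symmetric 8-bit quantize/dequantize round-trip in plain NumPy. It is only an illustration of low-precision weight storage, not any project's actual FP8 training scheme; the function names are my own.

```python
import numpy as np

def quantize_8bit(w):
    """Toy symmetric per-tensor 8-bit quantisation: map floats onto
    int8 with a single scale factor (a rough stand-in for the kind of
    8-bit numerics used in low-precision training)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, s = quantize_8bit(w)
w_hat = dequantize(q, s)
# round-trip error is bounded by half a quantisation step
print(bool(np.abs(w - w_hat).max() <= 0.5 * s))
```

Real FP8 formats (E4M3/E5M2) use a floating-point layout rather than a fixed grid, but the scale-then-round structure above is the same basic idea.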
Relatedly, given DeepSeek-V3: are there plans to support mixture-of-experts architectures? I would fully understand if this is too far outside a coherent roadmap.
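To make the request concrete, here is a minimal top-k mixture-of-experts layer in NumPy: a router scores each token and only the k highest-scoring expert networks are evaluated for it. This is a sketch of the general MoE pattern, not DeepSeek-V3's actual architecture; all names and shapes here are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Minimal top-k mixture-of-experts layer: a linear router picks
    k experts per token; their outputs are gate-weighted and summed."""

    def __init__(self, d_model, n_experts, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        # each expert is a single linear map, purely for illustration
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_experts)]

    def __call__(self, x):                 # x: (tokens, d_model)
        scores = x @ self.router           # (tokens, n_experts)
        topk = np.argsort(scores, axis=-1)[:, -self.k:]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            # renormalise gate weights over the selected experts only
            gates = softmax(scores[t, topk[t]])
            for g, e in zip(gates, topk[t]):
                out[t] += g * (x[t] @ self.experts[e])
        return out

layer = MoELayer(d_model=16, n_experts=8, k=2)
y = layer(np.ones((4, 16), dtype=np.float64))
print(y.shape)  # (4, 16)
```

The appeal is that parameter count scales with the number of experts while per-token compute scales only with k, which is why sparse routing shows up in large MoE models.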