Creating this issue to track the Kubeflow Trainer v2.1 release. Currently, we are targeting mid-October for the first release candidate.
Milestone for the v2.1 release: https://github.com/kubeflow/trainer/milestone/6
- KEP-2655: Kubeflow Data Cache for distributed training on Kubernetes #2655
- Support for ResourcesPerNode in DeepSpeed Training Job Containers #2650
- Add PodTemplateOverrides API into TrainJob #2784
- Support JAX Runtimes #2442
- [GSoC] Project 10: Support Volcano Scheduler in Kubeflow Trainer #2671
- feat(api): Sync TrainJob JobsStatus from JobSet ReplicatedJobsStatus #2802
- feat(runtimes): Add LoRA/QLoRA/DoRA support in LLM Trainer V2 #2832
Please let me know if I missed any pending large features that we should complete.
cc @kubeflow/kubeflow-trainer-team @kubeflow/kubeflow-sdk-team @akshaychitneni @Doris-xm @mahdikhashan @Labreo @zren11 @kannon92 @mimowo @rudeigerc @sceneryback @Monokaix