
KEP-2170: Design Trainer for the LLM Runtimes #2321

Open
andreyvelich opened this issue Nov 5, 2024 · 0 comments

As part of the Kubeflow Training V2 work, we should design and implement a custom Trainer to fine-tune the LLMs that we plan to support via TrainingRuntimes in Kubeflow upstream.

We should discuss whether the LLM Trainer implementation should use native PyTorch APIs or HuggingFace Transformers.

The Trainer should allow users to configure LoRA, QLoRA, FSDP, and other important configurations.
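To make the discussion concrete, here is a minimal sketch of the kind of configuration surface such a Trainer could expose. All names below (`TrainerConfig`, `LoraConfig`, field names) are hypothetical illustrations for this proposal, not part of any existing Kubeflow or HuggingFace API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration surface for the LLM Trainer.
# Names are illustrative only; they do not correspond to an existing API.

@dataclass
class LoraConfig:
    r: int = 8                                   # rank of the low-rank update matrices
    lora_alpha: int = 16                         # scaling factor applied to the update
    lora_dropout: float = 0.05
    target_modules: tuple = ("q_proj", "v_proj") # attention projections to adapt

@dataclass
class TrainerConfig:
    model_uri: str
    peft: Optional[LoraConfig] = None  # LoRA adapter settings, if any
    quantize_4bit: bool = False        # QLoRA: quantize the frozen base weights to 4-bit
    fsdp: bool = False                 # shard the model with FullyShardedDataParallel

    def validate(self) -> None:
        # QLoRA only makes sense together with a LoRA adapter config.
        if self.quantize_4bit and self.peft is None:
            raise ValueError("4-bit quantization (QLoRA) requires a LoRA config")

cfg = TrainerConfig(
    model_uri="hf://meta-llama/Llama-3.1-8B",
    peft=LoraConfig(r=16),
    quantize_4bit=True,
    fsdp=True,
)
cfg.validate()
print(cfg.peft.r, cfg.quantize_4bit, cfg.fsdp)
```

A structure like this would let the same runtime express plain LoRA (no quantization), QLoRA (LoRA plus 4-bit base weights), and FSDP sharding independently, which is one argument for designing the config schema before committing to PyTorch-native or Transformers-based internals.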

Useful resources:

Part of: #2170

cc @saileshd1402 @deepanker13 @kubeflow/wg-training-leads

Love this feature?

Give it a 👍. We prioritize the features with the most 👍 reactions.
