What happened?
I used the code from this guide to fine-tune an LLM: https://www.kubeflow.org/docs/components/training/user-guides/fine-tuning/
However, I encountered the error `[rank0]: ValueError: Please specify target_modules in peft_config`. I tried deleting the LoRA config, but the error persisted.
```python
import transformers
from peft import LoraConfig
from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    storage_config={
        "size": "5Gi",
        "storage_class": "nfs-client",
    },
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://distilbert/distilbert-base-uncased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 100 samples from the Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:100]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we skip evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
            # ddp_backend="gloo",
        ),
        # Set LoRA config to reduce number of trainable model parameters.
        # lora_config=LoraConfig(
        #     r=8,
        #     lora_alpha=8,
        #     lora_dropout=0.1,
        #     bias="none",
        #     target_modules=["encoder.layer.*.attention.self.query", "encoder.layer.*.attention.self.key"],
        # ),
    ),
    num_workers=2,  # nnodes parameter for the torchrun command.
    num_procs_per_worker=20,  # nproc-per-node parameter for the torchrun command.
    resources_per_worker={
        "cpu": 20,
        "memory": "20G",
    },
)
```
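For context, here is a minimal sketch of a `LoraConfig` with `target_modules` named explicitly. The `q_lin`/`v_lin` names are assumptions based on DistilBERT's attention layer naming (`q_lin`/`k_lin`/`v_lin`/`out_lin`, unlike BERT's `query`/`key`/`value`), and PEFT matches the entries as suffixes of module names:

```python
from peft import LoraConfig

# Sketch only: module names are assumed for distilbert-base-uncased,
# whose attention projections are named q_lin / k_lin / v_lin / out_lin.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    # PEFT matches these strings as suffixes of the model's module names.
    target_modules=["q_lin", "v_lin"],
)
```

Passing this as `lora_config` in `HuggingFaceTrainerParams` would be one way to satisfy the check, though as noted above the error also appeared with the LoRA config removed entirely.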
What did you expect to happen?
The fine-tuning process completes successfully.
Environment
Kubernetes version:
$ kubectl version
Client Version: v1.29.6+k3s2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6+k3s2
Training Operator version:
$ kubectl get pods -n kubeflow -l control-plane=kubeflow-training-operator -o jsonpath="{.items[*].spec.containers[*].image}"
kubeflow/training-operator:latest
Training Operator Python SDK version:
$ pip show kubeflow-training
Name: kubeflow-training
Version: 1.8.1
Summary: Training Operator Python SDK
Home-page: https://github.com/kubeflow/training-operator/tree/master/sdk/python
Author: Kubeflow Authors
Author-email: [email protected]
License: Apache License Version 2.0
Location: /opt/conda/lib/python3.11/site-packages
Requires: certifi, kubernetes, retrying, setuptools, six, urllib3
Required-by: