I’m implementing a multi-stage reinforcement learning (RL) pipeline for reasoning tasks using GRPO, and I’d like to load a pre-trained LoRA adapter and continue training it.

Setup:

In standard Hugging Face + PEFT workflows, I can load a pre-trained LoRA adapter like this:

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

config = PeftConfig.from_pretrained("path_to_trained_lora_adapter")
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(base_model, "path_to_trained_lora_adapter")
```

However, in my current GRPO trainer config, LoRA is initialized from scratch via these parameters:

```
actor_rollout_ref.model.lora_rank=32 \
actor_rollout_ref.model.lora_alpha=32 \
actor_rollout_ref.model.target_modules=all-linear \
```

There doesn’t appear to be a config option (e.g., …) for pointing the trainer at an existing adapter.

Question: Is there a way to load a pre-trained LoRA adapter and continue GRPO training from it, rather than initializing LoRA from scratch?
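For reference, in plain PEFT terms the two initialization paths I’m contrasting look roughly like this (a minimal sketch; the paths are placeholders, not my actual setup):

```python
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("path_to_base_model")  # placeholder path

# (a) What the trainer config above effectively does: a fresh, randomly
#     initialized LoRA adapter of rank 32 on all linear layers.
model = get_peft_model(
    base_model,
    LoraConfig(r=32, lora_alpha=32, target_modules="all-linear"),
)

# (b) What I want instead (on a freshly loaded base model, not on top of (a)):
#     resume from an already-trained adapter, keeping its weights trainable.
model = PeftModel.from_pretrained(
    base_model, "path_to_trained_lora_adapter", is_trainable=True
)
```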
Currently verl can't load and continue training from a pre-trained LoRA adapter; after PR #3523 it can.
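Until that change is available in your version, one interim approach is to merge the trained adapter into the base weights and point the GRPO trainer at the merged checkpoint, so the next LoRA stage starts from those weights. This is a minimal sketch using only standard PEFT/transformers APIs; the paths are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths; adjust to your setup.
base = AutoModelForCausalLM.from_pretrained("path_to_base_model")
lora = PeftModel.from_pretrained(base, "path_to_trained_lora_adapter")

# Fold the adapter weights into the base model and save a standalone checkpoint.
merged = lora.merge_and_unload()
merged.save_pretrained("path_to_merged_model")
AutoTokenizer.from_pretrained("path_to_base_model").save_pretrained("path_to_merged_model")

# Point the trainer's base-model path at "path_to_merged_model"; the LoRA
# parameters in the GRPO config then create a fresh adapter on top of the
# merged weights.
```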