Good idea! Btw, shouldn't we implement a LoRA extractor in |
-
Small update on this: I've been able to convert the diff between the two models into a LoRA adapter: https://huggingface.co/ngxson/LoRA-Qwen2.5-Coder-7B-Instruct I haven't tested it with infill yet; I will try in a few days. In the meantime, we also need #11131 to be merged so that LoRA for token embeddings is supported.
-
The author of the Qwen model confirmed that infill is only supported by Qwen-coder (non-Instruct): https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/discussions/2#6731a45e0e39be0605a0df20
Using the non-instruct model limits us to /infill only, so it cannot be used with /chat/completions.
However, we know that the instruct version is fine-tuned from the non-instruct version; see the technical report: https://arxiv.org/pdf/2409.12186
To make the model usable with both chat and infill, one solution is to extract the difference between the two models into a LoRA adapter. This can be done with something like mergekit-extract-lora; we can then set the LoRA scale at runtime (i.e. 0.0 for infill and 1.0 for chat).
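To make the idea concrete, here is a minimal NumPy sketch of the principle on a single toy weight matrix. It is not how mergekit-extract-lora or llama.cpp actually implement this; the function names (`extract_lora`, `apply_lora`, `lora_scale_for`) are hypothetical, and the truncated SVD is just one common way to get a low-rank factorization of a weight difference. The toy update is built to be exactly rank 2, so the adapter reproduces it; a real fine-tuning diff is generally not exactly low-rank, so the result would only be an approximation.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) by a rank-`rank` product B @ A (truncated SVD)."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s          # shape (out_features, rank)
    a = sqrt_s[:, None] * vt[:rank]   # shape (rank, in_features)
    return a, b


def apply_lora(w_base, a, b, scale):
    """Effective weight seen by the model for a given LoRA scale."""
    return w_base + scale * (b @ a)


def lora_scale_for(endpoint):
    """Hypothetical policy: adapter off for infill, fully on otherwise."""
    return 0.0 if endpoint == "/infill" else 1.0


rng = np.random.default_rng(0)
w_base = rng.standard_normal((8, 6))  # stands in for a non-instruct weight

# Toy "fine-tuning" update, constructed to be exactly rank 2 (an assumption).
true_delta = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
w_instruct = w_base + true_delta      # stands in for the instruct weight

a, b = extract_lora(w_base, w_instruct, rank=2)

w_infill = apply_lora(w_base, a, b, lora_scale_for("/infill"))
w_chat = apply_lora(w_base, a, b, lora_scale_for("/chat/completions"))
```

With scale 0.0 the effective weights are the base (non-instruct) weights, and with scale 1.0 they match the instruct weights up to the low-rank approximation error, which is why a single base model plus one adapter could serve both endpoints.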