Description
Hi! I tried to fine-tune llama-2-13b with a bottleneck adapter, but I got a ValueError saying that a model loaded with load_in_8bit cannot be fine-tuned. What is the problem, and how can I solve it?
ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details
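For context, here is a simplified sketch of the setup that hits the error. This is not my exact script: the model id, the TrainingArguments values, and the way the bottleneck adapter is attached through the peft fork are only placeholders.

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model loaded in 8-bit -- this is the load_in_8bit path the error refers to.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",      # placeholder model id
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")

# The bottleneck adapter from the peft fork is attached here, producing the
# PeftModelForCausalLM printed below (call abbreviated, config omitted):
# model = get_peft_model(model, bottleneck_config)

trainer = transformers.Trainer(
    model=model,
    args=transformers.TrainingArguments(
        output_dir="./output",        # placeholder
        per_device_train_batch_size=1,
        fp16=True,
    ),
    # train_dataset=..., data_collator=... omitted
)
trainer.train()  # raises the ValueError above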
The package versions I'm using are as follows:
accelerate 0.27.2
bitsandbytes 0.41.2.post2
black 23.11.0
transformers 4.39.0.dev0
torch 2.1.1
gradio 4.7.1
The PeftModel was constructed as follows; I think it was loaded in 8-bit correctly.
---------model structure---------
PeftModelForCausalLM(
  (base_model): BottleneckModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 5120)
        (layers): ModuleList(
          (0-39): 40 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (k_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (v_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (o_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (up_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (down_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): CastOutputToFloat(
        (0): Linear(in_features=5120, out_features=32000, bias=False)
      )
    )
  )
)
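To confirm the 8-bit load and see which parameters are trainable, a quick ad-hoc check like the following could be used (hypothetical snippet, not part of my training script):

# `model` is the PeftModelForCausalLM printed above.
from bitsandbytes.nn import Linear8bitLt

# Should be True when the base weights were loaded with load_in_8bit.
print(getattr(model, "is_loaded_in_8bit", False))

# Count the 8-bit linear layers in the wrapped model.
n_8bit = sum(1 for m in model.modules() if isinstance(m, Linear8bitLt))
print(f"Linear8bitLt modules: {n_8bit}")

# List the parameters that still require gradients -- these should be the
# adapter_down / adapter_up weights of the bottleneck adapter.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for _, p in model.named_parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {n_trainable} / {n_total}")
print(trainable[:6])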