Description
Hi! I tried to fine-tune llama-2-13b with a bottleneck adapter, but I got a ValueError saying that a model loaded with load_in_8bit cannot be fine-tuned. What is the problem, and how can I solve it?
ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details
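For context, here is a simplified sketch of the setup that hits the error. This is not my exact script: the model id, the TrainingArguments values, and the way the bottleneck adapter is attached through the peft fork are only placeholders.

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model loaded in 8-bit -- this is the load_in_8bit path the error refers to.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",      # placeholder model id
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")

# The bottleneck adapter from the peft fork is attached here, producing the
# PeftModelForCausalLM printed below (call abbreviated, config omitted):
# model = get_peft_model(model, bottleneck_config)

trainer = transformers.Trainer(
    model=model,
    args=transformers.TrainingArguments(
        output_dir="./output",        # placeholder
        per_device_train_batch_size=1,
        fp16=True,
    ),
    # train_dataset=..., data_collator=... omitted
)
trainer.train()  # raises the ValueError above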
The package versions I'm using are as follows:
accelerate 0.27.2
bitsandbytes 0.41.2.post2
black 23.11.0
transformers 4.39.0.dev0
torch 2.1.1
gradio 4.7.1
The PeftModel was constructed as follows; I think it was loaded in 8-bit correctly.
---------model structure---------
PeftModelForCausalLM(
  (base_model): BottleneckModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 5120)
        (layers): ModuleList(
          (0-39): 40 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (k_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (v_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (o_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (up_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (down_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): CastOutputToFloat(
        (0): Linear(in_features=5120, out_features=32000, bias=False)
      )
    )
  )
)
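To confirm the 8-bit load and see which parameters are trainable, a quick ad-hoc check like the following could be used (hypothetical snippet, not part of my training script):

# `model` is the PeftModelForCausalLM printed above.
from bitsandbytes.nn import Linear8bitLt

# Should be True when the base weights were loaded with load_in_8bit.
print(getattr(model, "is_loaded_in_8bit", False))

# Count the 8-bit linear layers in the wrapped model.
n_8bit = sum(1 for m in model.modules() if isinstance(m, Linear8bitLt))
print(f"Linear8bitLt modules: {n_8bit}")

# List the parameters that still require gradients -- these should be the
# adapter_down / adapter_up weights of the bottleneck adapter.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for _, p in model.named_parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {n_trainable} / {n_total}")
print(trainable[:6])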