
How to add a new model for OmniQuant? #22

Closed
gesanqiu opened this issue Oct 10, 2023 · 5 comments


gesanqiu commented Oct 10, 2023

Thanks for your brilliant work. After exploring the project for several days, I found that OmniQuant is portable to edge devices such as Jetson boards or phones. I am wondering how I can add more models to OmniQuant; do you have any tutorials on this? Or maybe we could start from CodeLlama, since it has a similar architecture to Llama-2, and Llama-2 is already supported.
Also, apologies in advance if this seems like something obvious; I'm new to the LLM field.

ChenMnZ (Collaborator) commented Oct 11, 2023

If you want to quantize a new model with the same architecture as an already supported model, you can just set --net directly.
This is an example command to quantize CodeLlama:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/CodeLLama/CodeLLama-7b \
--epochs 20 --output_dir ./log/llama-7b-w3a16g128 \
--eval_ppl --wbits 3 --abits 16 --group_size 128 --lwc \
--net Llama-2-7b
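
In other words, --model points at the new checkpoint while --net tells OmniQuant which supported architecture definition to reuse, so this shortcut only works when the new model shares the layer structure of an already supported one (as CodeLlama does with Llama-2).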

gesanqiu (Author) commented Oct 11, 2023

Thanks for your reply. My question is how to add a new model whose architecture is not supported yet.


superdocker commented Oct 20, 2023

You have to add a new file int_{your model}_layer.py in models/; refer to the other files there as templates. Some renaming may be needed for the registered parameters (e.g. o_proj <-> out_proj, c_fc <-> fc2) and for the CPU offloading code (e.g. model.transformer.h <-> model.decoder.layers) in main.py and omniquant.py.
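
For illustration, here is a minimal sketch of that structural point. The class and helper names are hypothetical, not OmniQuant's actual API; the takeaway is that the quantized layer should reuse the donor model's own attribute names so that parameter registration and the offloading code in main.py / omniquant.py still find them:

import torch.nn as nn

class FakeQuantLinear(nn.Module):
    # Hypothetical stand-in for the repo's quantized linear wrapper.
    def __init__(self, org_linear: nn.Linear):
        super().__init__()
        self.weight = org_linear.weight
        self.bias = org_linear.bias

    def forward(self, x):
        # A real wrapper would fake-quantize self.weight before the matmul.
        return nn.functional.linear(x, self.weight, self.bias)

class QuantNewModelAttention(nn.Module):
    # Mirror the original HF module's attribute names: e.g. OPT-style models
    # use out_proj / fc1 / fc2, while LLaMA-style models use o_proj / gate_proj / ...
    def __init__(self, org_attn):
        super().__init__()
        self.q_proj = FakeQuantLinear(org_attn.q_proj)
        self.k_proj = FakeQuantLinear(org_attn.k_proj)
        self.v_proj = FakeQuantLinear(org_attn.v_proj)
        self.out_proj = FakeQuantLinear(org_attn.out_proj)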


Louym commented Nov 3, 2023

You have to add a new file int_{your model}_layer.py in models/; refer to the other files there as templates. Some renaming may be needed for the registered parameters (e.g. o_proj <-> out_proj, c_fc <-> fc2) and for the CPU offloading code (e.g. model.transformer.h <-> model.decoder.layers) in main.py and omniquant.py.

Have you tried to add BLOOM models? I ran into some problems, described in issue #29.

superdocker commented

@Louym No, I haven't tried to add BLOOM models. I don't know the details of your implementation (and I'm not a contributor to this repo), and the attached error can have various causes. Anyway, I hope you can solve the problem; here is what helped in my experience:

  1. Disable some transformations. For example, the LayerNorm-Linear transform for BLOOM is already implemented in other repos (e.g. SmoothQuant or AWQ), so if you disable the Query-Key or Value-Output transform (the unique contribution of this repo), you can narrow down the debug point much more easily.
  2. Run inference first to check functionality at higher precision (see the command sketch after this list). Because this repo initializes the transform parameters from SmoothQuant, 8-bit inference (with no omni_param update) should give almost baseline accuracy if your implementation is correct. Otherwise, something is wrong in your implementation itself, independent of the optimizer, the computational graph, and the backward pass.
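
As a concrete starting point for step 2, a rough sanity-check run could look like the following, reusing the flags from the CodeLlama example above. Treat the exact flag values, and whether --epochs 0 really skips all omni_param updates in main.py, as assumptions to verify:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/YOUR/NEW/MODEL \
--epochs 0 --output_dir ./log/sanity-w8a8 \
--eval_ppl --wbits 8 --abits 8 \
--net <your-new-net-name>

If the W8A8 perplexity here is close to the FP16 baseline, the layer wrappers and naming are probably correct, and any remaining issue is more likely in the optimization step.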

ChenMnZ closed this as completed Apr 24, 2024