
Add HQQ to weight compression algorithms for LLMs #3347

Open
@hello-fri-end

Description

🚀 Feature request

HQQ is a popular data-free weight quantization algorithm for LLMs. It would be super cool to add it to NNCF's weight compression algorithms. I would like to work on this myself. I understand I need to create an hqq.py file inside the nncf/quantization/algorithms/weight_compression directory, and I'm currently diving into the implementations of awq and gptq. So far, I'm having trouble understanding the NNCFGraph object that needs to be passed to the apply method. Are there any docs on how to understand this graph object? It would also be super helpful if you could point me to some code/docs that would help me understand the workflow better. Looking forward to contributing 🚀
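
To show where I'm at, here's a rough skeleton of what I imagine hqq.py could look like. The class layout, the apply() signature, and the graph-walking calls (get_all_nodes, get_next_nodes, node_name, metatype) are just my assumptions from reading awq.py, so please correct me if I've misread the API:

```python
# Hypothetical skeleton for nncf/quantization/algorithms/weight_compression/hqq.py,
# assuming it mirrors the awq.py layout; the apply() signature and the
# NNCFGraph calls below are my reading of the existing code, not a confirmed API.
from typing import Optional, TypeVar

from nncf import Dataset
from nncf.common.graph import NNCFGraph
from nncf.common.tensor_statistics.statistic_point import StatisticPointsContainer

TModel = TypeVar("TModel")


class HQQ:
    def apply(
        self,
        model: TModel,
        graph: NNCFGraph,
        statistic_points: Optional[StatisticPointsContainer] = None,
        dataset: Optional[Dataset] = None,
    ) -> TModel:
        # NNCFGraph looks like a backend-agnostic DAG: each operation in the
        # model is an NNCFNode, and edges connect producers to consumers.
        for node in graph.get_all_nodes():
            # node.metatype identifies the op kind (MatMul, Embedding, ...)
            # independent of the backend; node.node_name maps back to the model.
            print(node.node_name, node.metatype)
            # get_next_nodes() walks consumers of this node's outputs.
            for consumer in graph.get_next_nodes(node):
                print("  feeds ->", consumer.node_name)
        # HQQ is data-free, so statistic_points/dataset could stay unused and
        # the per-weight optimization would go here.
        return model
```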

Feature Use Case

HQQ is a fast and accurate model quantizer that skips the need for calibration data. It offers compression quality competitive with that of calibration-based methods. For instance, HQQ takes less than 5 minutes to process the colossal Llama-2-70B, which is over 50x faster than the widely adopted GPTQ.
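
For reference, the core of HQQ is a half-quadratic splitting solver that alternates a sparsity-promoting shrinkage step with a closed-form zero-point update, while the scale stays fixed. Here is a toy NumPy sketch of those two updates; the (p, beta, kappa) values, the 4-bit range, and the groups-along-axis-0 layout are illustrative assumptions on my part, not the reference implementation:

```python
import numpy as np


def shrink_lp(x: np.ndarray, beta: float, p: float = 0.7) -> np.ndarray:
    # Generalized soft-thresholding: proximal operator of the |.|^p penalty
    # (p < 1), which drives most entries of the quantization error to zero.
    ax = np.abs(x)
    # The small epsilon is my addition to avoid 0 ** (p - 1) warnings.
    return np.sign(x) * np.maximum(ax - (1.0 / beta) * (ax + 1e-8) ** (p - 1), 0.0)


def hqq_zero_point(w, scale, zero, iters=20, beta=10.0, kappa=1.01, qmin=0, qmax=15):
    # Alternating half-quadratic updates: only the zero-point is optimized
    # and no calibration data is touched. Shapes assumed here: w is
    # (group_size, n_groups), scale and zero are (1, n_groups).
    for _ in range(iters):
        w_q = np.clip(np.round(w / scale + zero), qmin, qmax)  # quantize
        w_r = (w_q - zero) * scale                             # dequantize
        w_e = shrink_lp(w - w_r, beta)                         # sparse-error step
        zero = np.mean(w_q - (w - w_e) / scale, axis=0, keepdims=True)  # closed-form zero step
        beta *= kappa                                          # anneal the HQS penalty
    return zero
```

The l_p norm with p < 1 is what makes the method data-free: it models the heavy-tailed outliers in LLM weights directly instead of estimating their effect from calibration activations.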

Are you going to submit a PR?

  • Yes, I'd like to help by submitting a PR!

Metadata

Labels: enhancement (New feature or request)
