**Feature type?**

Algorithm request

**A proposal draft (if any)**
MLC-LLM is an LLM deployment engine with ML compilation: https://github.com/mlc-ai/mlc-llm

It has very wide environment and backend support.
Primarily:

- The different quantisation schemes should be supported OOTB.
- In the past, we tested that the outputs from nyuntam's w4a16 quant algo (AWQ) can be used directly as inputs for MLC-LLM's `q4f16_awq` quantisation scheme, and we expect this still holds. Ideally, if someone chooses `q4f16_awq` as the quantisation, nyuntam's AutoAWQ should run as the intermediary job, and its output(s) should be used to continue MLC-LLM's weight conversion and model compilation.
- For 3-bit quantisation, MLC-LLM supports OmniQuant's inputs, as per this notebook.
- All the platforms supported by MLC-LLM should be supported OOTB, though testing for them is subject to test-environment availability.
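The AutoAWQ-then-MLC-LLM flow described above could be sketched as an ordered list of commands: run nyuntam's AutoAWQ job first, then hand its output directory to MLC-LLM's weight conversion, config generation, and compilation steps. This is only a hedged sketch: the `mlc_llm` subcommand names mirror its CLI (`convert_weight`, `gen_config`, `compile`), but the exact flags may differ across MLC-LLM versions, and all paths and the `build_mlc_llm_pipeline` helper are hypothetical.

```python
def build_mlc_llm_pipeline(awq_output_dir: str, artifact_dir: str,
                           quantization: str = "q4f16_awq") -> list[list[str]]:
    """Return the MLC-LLM commands to run after nyuntam's AutoAWQ job finishes.

    Sketch only: subcommands follow the mlc_llm CLI, but flags and paths are
    illustrative assumptions, not a verified invocation.
    """
    return [
        # 1. Convert the AWQ-quantised weights into MLC-LLM's weight format.
        ["mlc_llm", "convert_weight", awq_output_dir,
         "--quantization", quantization, "-o", artifact_dir],
        # 2. Generate the runtime config for the chosen quantisation scheme.
        ["mlc_llm", "gen_config", awq_output_dir,
         "--quantization", quantization, "-o", artifact_dir],
        # 3. Compile the model library for the target backend.
        ["mlc_llm", "compile", f"{artifact_dir}/mlc-chat-config.json",
         "-o", f"{artifact_dir}/model.so"],
    ]


if __name__ == "__main__":
    # Hypothetical directories; in nyuntam this would come from the AWQ job.
    for cmd in build_mlc_llm_pipeline("out/awq", "out/mlc"):
        print(" ".join(cmd))
```

The point of returning the commands rather than executing them is that nyuntam's job runner could schedule them sequentially, with the AutoAWQ output directory as the single handoff point between the two tools.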