[feature-request] Add support for mlc-llm #18

@shwu-nyunai

Description

Feature type?
Algorithm request

A proposal draft (if any)
MLC-LLM is an LLM Deployment Engine with ML Compilation.
https://github.com/mlc-ai/mlc-llm

It has very wide environment and backend support.
[image: mlc-llm's platform/backend support matrix]

Primarily,

  • the different quantisation schemes should be supported out of the box.

    • In the past, we tested that the outputs of nyuntam's w4a16 quant algo (AWQ) can be used directly as inputs to mlc-llm's q4f16_awq quantisation scheme. We expect this still holds. Ideally, if someone chooses q4f16_awq as the quantisation, nyuntam's AutoAWQ should run as the intermediary job, and its output(s) should feed mlc-llm's weight conversion and model compilation.
    • for 3-bit quantisation, mlc-llm supports OmniQuant's inputs as per this notebook
  • all the platforms supported by mlc-llm should be supported out of the box, though testing them is subject to test-environment availability.
