[feature-request] Add support for exllamaV2 #13

@shwu-nyunai

Feature type?
Algorithm request

Proposal
ExLlamaV2 is one of the prominent [1] LLM inference libraries out there.
GitHub Repo - https://github.com/turboderp/exllamav2

nyuntam's text-gen compression suite should add support for it as an engine.
We'd primarily want to support exllamaV2's quantisation scheme, namely EXL2, along with the models it supports out of the box (conversion instructions here).
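
For illustration, here is a rough sketch of how an engine wrapper might assemble the EXL2 conversion call. It assumes a local checkout of the exllamav2 repo and the `-i` / `-o` / `-cf` / `-b` arguments of its `convert.py` script as I understand them from the upstream conversion docs. The function name and parameters are hypothetical and not part of nyuntam or exllamav2. It only builds the command and runs nothing.

```python
import sys
from pathlib import Path


def build_exl2_convert_command(
    exllamav2_repo: Path,
    model_dir: Path,
    work_dir: Path,
    output_dir: Path,
    bits_per_weight: float = 4.0,
) -> list[str]:
    """Build (but do not run) the argv for exllamav2's convert.py.

    Hypothetical helper for illustration only. Flag meanings are assumed
    from the upstream conversion instructions:
      -i   input directory with the unquantised HF model
      -o   working directory for intermediate files
      -cf  directory for the compiled, ready-to-load EXL2 model
      -b   target average bits per weight
    """
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return [
        sys.executable,
        str(exllamav2_repo / "convert.py"),
        "-i", str(model_dir),
        "-o", str(work_dir),
        "-cf", str(output_dir),
        "-b", str(bits_per_weight),
    ]
```

An engine integration could pass the resulting list to `subprocess.run(cmd, check=True)`; whether nyuntam would shell out like this or call exllamav2's Python API directly is an open design question.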

[1] Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
