Open
Description
Feature type?
Algorithm request
Proposal
ExLlamaV2 is one of the prominent [1] LLM inference libraries out there.
GitHub Repo - https://github.com/turboderp/exllamav2
nyuntam's text-gen compression suite should extend support to it as an engine.
We'd primarily want to support ExLlamaV2's quantisation scheme, namely EXL2, along with the models it supports out of the box (conversion instructions here).
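As a rough sketch of what such an engine might wrap, the snippet below builds the command line for an EXL2 conversion run. It assumes the conversion is driven by the `convert.py` script in the exllamav2 repo and that the `-i`, `-o`, `-cf` and `-b` flags behave as described in that repo's conversion docs; please check them against the current version. The helper name `build_exl2_convert_command` and its parameters are hypothetical and not part of nyuntam or exllamav2.

```python
import shlex


def build_exl2_convert_command(
    model_dir: str,
    work_dir: str,
    output_dir: str,
    bits_per_weight: float = 4.0,
    convert_script: str = "convert.py",
) -> list[str]:
    """Build the argv for an EXL2 conversion run.

    Hypothetical helper for illustration only. The flags are assumed to
    match exllamav2's convert.py; verify them before relying on this.
    """
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return [
        "python",
        convert_script,
        "-i", model_dir,                # input: directory holding the source model
        "-o", work_dir,                 # working directory for intermediate files
        "-cf", output_dir,              # directory for the compiled EXL2 model
        "-b", f"{bits_per_weight:g}",   # target average bits per weight
    ]


if __name__ == "__main__":
    cmd = build_exl2_convert_command("models/llama", "work", "models/llama-exl2", 4.0)
    # Print the command instead of running it; an engine would hand the
    # list to subprocess.run(cmd, check=True) once the paths are validated.
    print(shlex.join(cmd))
```

Returning the argv as a list rather than a shell string keeps the paths free of quoting issues and makes the command easy to unit-test before the engine actually launches a conversion.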
[1] Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward