fidellr/vllm-infer-ops

vLLM serving engine with MLOps experiments

MLOps Tasks & Experiments:

  1. Inference Engine - Async/Sync Generator 👨🏻‍💻
  2. Embeddings Engine - Sparse, Dense, etc. 👨🏻‍💻
  3. Retriever Engine - Hybrid, etc. 👨🏻‍💻
  4. Hyperparameter fine-tuner [axolotl, unsloth, etc.]
  5. Evaluations (SFT, PEFT, etc.) [TODO]
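
As a rough illustration of the async/sync generator idea in item 1, here is a minimal sketch in plain Python. It does not use vLLM or any code from this repository: `fake_decode`, `generate_sync`, `generate_async`, and `collect` are hypothetical names, and the "model" simply echoes the prompt word by word. The point is only to show the two streaming shapes side by side.

```python
import asyncio
from typing import AsyncIterator, Iterator


def fake_decode(prompt: str) -> list[str]:
    # Hypothetical stand-in for a model: splits the prompt into "tokens".
    return prompt.split()


def generate_sync(prompt: str) -> Iterator[str]:
    """Yield tokens one at a time; the caller blocks between tokens."""
    for token in fake_decode(prompt):
        yield token


async def generate_async(prompt: str) -> AsyncIterator[str]:
    """Yield the same tokens, returning control to the event loop between them."""
    for token in fake_decode(prompt):
        # Cooperative yield point; a real engine would await its backend here.
        await asyncio.sleep(0)
        yield token


async def collect(prompt: str) -> list[str]:
    # Drain the async generator into a list.
    return [token async for token in generate_async(prompt)]


if __name__ == "__main__":
    print(list(generate_sync("hello from sync")))
    print(asyncio.run(collect("hello from async")))
```

The sync variant suits batch scripts, while the async variant lets a server interleave many streaming requests on one event loop.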

MLOps Providers:

  1. BentoML 👨🏻‍💻
  2. Ray 👨🏻‍💻
  3. ...

Quantization techniques:

  1. GPTQ 👨🏻‍💻
  2. AWQ 👨🏻‍💻
  3. GGUF (locally, smaller devices, etc.) []

@TODO ...
