
Releases: runpod-workers/worker-vllm

0.2.0

26 Jan 04:26


Worker vLLM 0.2.0 - What's New

  • You no longer need a Linux-based machine or NVIDIA GPUs to build the worker.
  • Docker image size reduced by over 3x.
  • Optional OpenAI Chat Completions output format.
  • Faster image build times.
  • Docker Secrets support for the Hugging Face token, so you can bake a model into the image at build time without exposing your token.
  • Support for the n and best_of sampling parameters, which let you generate multiple responses from a single prompt (see the request sketch after this list).
  • New environment variables for additional configuration options.
  • vLLM Version: 0.2.7
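
As a rough illustration of the new n / best_of sampling parameters, the sketch below sends both inside a sampling_params object in a serverless request. The endpoint ID, API key placeholder, and exact input field names are assumptions for illustration only; consult the repository README for the worker's actual input schema.

```python
# Minimal sketch of a request using the new n / best_of sampling parameters.
# The endpoint ID, API key, and exact input field names are assumptions for
# illustration; check the worker-vllm README for the actual schema.
import requests

payload = {
    "input": {
        "prompt": "Write a haiku about serverless GPUs.",
        "sampling_params": {
            "n": 3,        # return three completions
            "best_of": 5,  # sample five candidates, keep the best three
        },
    }
}

response = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync",       # placeholder endpoint ID
    headers={"Authorization": "Bearer <RUNPOD_API_KEY>"},    # placeholder API key
    json=payload,
    timeout=300,
)
print(response.json())
```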

0.1.0

17 Jan 00:51
ed48093


What's Changed

New Contributors

Full Changelog: https://github.com/runpod-workers/worker-vllm/commits/0.1.0