Releases · runpod-workers/worker-vllm
0.2.0
Worker vLLM 0.2.0 - What's New
- You no longer need a linux-based machine or NVIDIA GPUs to build the worker.
- Docker image is over 3x smaller.
- OpenAI Chat Completion output format (optional to use).
- Fast image build time.
- Docker Secrets-protected Hugging Face token support for building the image with a model baked in without exposing your token.
- Support for `n` and `best_of` sampling parameters, which allow you to generate multiple responses from a single prompt.
- New environment variables for various configuration options.
- vLLM Version: 0.2.7
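The `n` and `best_of` parameters mirror vLLM's `SamplingParams`: `best_of` candidate sequences are generated per prompt and the top `n` are returned. A minimal sketch of a request payload, assuming the worker accepts these under a `sampling_params` key inside the standard RunPod `input` object (the key names here are illustrative, not confirmed from the worker's schema):

```python
import json

# Hypothetical request body for the worker. The "sampling_params" key and
# its fields follow vLLM's SamplingParams naming and are assumptions here.
payload = {
    "input": {
        "prompt": "Write a haiku about GPUs.",
        "sampling_params": {
            "n": 2,        # number of completions to return
            "best_of": 4,  # candidates generated; the top n are kept
            "temperature": 0.8,
        },
    }
}

# Serialize to JSON, as it would be sent to the endpoint.
body = json.dumps(payload)
print(body)
```

Note that vLLM requires `best_of >= n`; setting `best_of` higher than `n` trades extra compute for better-ranked outputs.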
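With the optional OpenAI Chat Completion output format enabled, results can be read with the familiar `choices[...].message` shape. A minimal sketch of consuming such a response; the exact fields the worker emits are an assumption here, modeled on OpenAI's chat completion objects:

```python
# Hypothetical response in OpenAI Chat Completion format. The field layout
# follows OpenAI's chat completion objects and is an assumption about what
# the worker returns when this output format is enabled.
response = {
    "id": "cmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}

# Extract the assistant's reply from the first choice.
reply = response["choices"][0]["message"]["content"]
print(reply)
```

The advantage of this format is drop-in compatibility: client code already written against OpenAI-style responses can parse the worker's output without changes.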
0.1.0
What's Changed
- Fixed STREAMING environment variable not being interpreted as a boolean by @vladmihaisima in #4
- 10x Faster New Worker by @alpayariyak in #18
- Update runpod package version by @github-actions in #19
- fix: update badge by @justinmerrell in #20
- Chat Template Feature, Message List, Small Refactor by @alpayariyak in #27
New Contributors
- @vladmihaisima made their first contribution in #4
- @alpayariyak made their first contribution in #18
- @github-actions made their first contribution in #19
- @justinmerrell made their first contribution in #20
Full Changelog: https://github.com/runpod-workers/worker-vllm/commits/0.1.0