Releases: runpod-workers/worker-vllm

v2.11.1

24 Nov 15:42
6f2381a

  • chore(deps): update runpod to latest version

v2.11.0

17 Nov 18:38
3851d53

  • add ENABLE_EXPERT_PARALLEL engine arg for MoE models
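The bullet above adds an `ENABLE_EXPERT_PARALLEL` engine argument for Mixture-of-Experts models. A minimal sketch of how such an environment variable might be mapped onto a vLLM engine argument; the target argument name `enable_expert_parallel` and the helper `build_engine_args` are assumptions for illustration, not taken from this worker's source:

```python
# Sketch: map the ENABLE_EXPERT_PARALLEL env var onto an engine-args dict.
# The argument name "enable_expert_parallel" is an assumption, not from
# this repository's code.
def build_engine_args(env: dict) -> dict:
    args = {}
    # Only set the flag when explicitly enabled, so the engine default applies otherwise.
    if env.get("ENABLE_EXPERT_PARALLEL", "").strip().lower() in ("1", "true", "yes"):
        args["enable_expert_parallel"] = True
    return args
```

Expert parallelism shards the MoE expert layers across GPUs, so a flag like this would typically only matter for multi-GPU MoE deployments.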

v2.10.0

14 Nov 16:25
c896438

  • feat: cse 853 vllm template params
  • fix(config): update allowed cuda versions in hub and tests config
  • fix: remove space from gpuIds
  • feat: bump transformers to allow Qwen3-VL

v2.9.6

24 Oct 16:49
6337a66

  • fix: also allow CUDA 12.8 & 12.9

v2.9.5

22 Oct 20:57
66e1b16

  • chore: update vllm to 0.11.0
  • fix: max concurrency = 30 instead of 300

v2.9.4

24 Sep 16:31

  • removed HF_TOKEN again

v2.9.3

23 Sep 17:25
33d88df

  • fix: added back the HF_TOKEN

v2.9.2

19 Sep 19:01

  • docs: add reasoning parser
  • fix: remove "access token" as this is handled by the platform

v2.9.1

01 Sep 14:48
a0fe1df

  • feat: better hub support & concise README for the main repo (#215)

v2.9.0

28 Aug 08:10
5f0fc69

  • feat: prepare worker-vllm for the hub
  • fix: allow "None" as value & parse the value of RAW_OPENAI_OUTPUT correctly
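The second bullet fixes parsing of `RAW_OPENAI_OUTPUT` and allows the literal string `"None"` as a value. A minimal sketch of such a parser, assuming (this helper is illustrative, not the worker's actual code) that `"None"` and the empty string should fall back to the default rather than being treated as truthy:

```python
# Sketch: parse a boolean env var, treating "None" and "" as unset.
# The helper name and default are assumptions for illustration only.
def parse_bool_env(env: dict, name: str, default: bool = True) -> bool:
    raw = env.get(name)
    if raw is None or raw.strip().lower() in ("", "none"):
        return default
    return raw.strip().lower() in ("1", "true", "yes")
```

Treating `"None"` as unset matters because naive parsing (e.g. `bool(os.environ.get(name))`) would count any non-empty string, including `"None"` or `"false"`, as true.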