
Releases: runpod-workers/worker-vllm

v1.6.0

16 Oct 00:37
ce47c41
Merge pull request #125 from runpod-workers/up-0.6.3

Updates vLLM to 0.6.3.

v1.5.0

01 Oct 18:23
d3ee323
  • vLLM version update 0.6.1 -> 0.6.2.
  • Adds support for Llama 3.2 models; a quick query sketch follows.
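
As a smoke test for a freshly deployed endpoint, the worker can be queried with the runpod Python SDK. This is a minimal sketch: the endpoint ID is a placeholder, and the prompt/sampling_params input shape is assumed from the worker's usual request format, not taken from these notes.

```python
# Minimal sketch: querying a worker-vllm endpoint serving a Llama 3.2 model.
# "ENDPOINT_ID" is a placeholder; the input schema is an assumption.
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]
endpoint = runpod.Endpoint("ENDPOINT_ID")  # your deployed endpoint's ID

result = endpoint.run_sync(
    {
        "prompt": "Summarize what vLLM does in one sentence.",
        "sampling_params": {"max_tokens": 48, "temperature": 0.2},
    },
    timeout=60,
)
print(result)
```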

v1.4.0: Merge pull request #109 from runpod-workers/0.5.5-update

17 Sep 06:22
b1554ea

v1.3.1

06 Sep 19:42
b1554ea

vLLM version: 0.5.5

  • Fixes the OpenAI Completions request error (example request below).
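
The request type this release fixes can be exercised through the worker's OpenAI-compatible route. A minimal sketch follows; the base URL pattern, env variable names, and model ID are assumptions based on the worker's OpenAI compatibility layer, not taken from these release notes.

```python
# Minimal sketch: a plain (non-chat) completion request against a deployed
# worker-vllm endpoint's OpenAI-compatible API. URL pattern, env var names,
# and the model ID below are illustrative assumptions.
import os
from openai import OpenAI

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # hypothetical env var

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],       # your RunPod API key
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
)

resp = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # whatever model the worker serves
    prompt="vLLM is",
    max_tokens=32,
)
print(resp.choices[0].text)
```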

v1.3.0

29 Aug 06:34
286d6ba

Version upgrade from vLLM v0.5.4 -> v0.5.5

  • Various improvements and bug fixes.
  • [Known Issue]: OpenAI Completion Requests error.

v1.2.0

09 Aug 21:59
eb75a3a

Version upgrade from vLLM v0.5.3 -> v0.5.4

  • Various improvements and bug fixes.
  • [Known Issue]: OpenAI Completion Requests error.

v1.1.0

02 Aug 23:54
37d140a
  • Major update from vLLM v0.4.2 -> v0.5.3.
  • Supports Llama 3.1 models.
  • Various improvements and bug fixes.
  • [Known Issue]: OpenAI Completion Requests error.

1.0.1

13 Jun 17:52

Hotfix adding backwards compatibility for the deprecated max_context_len_to_capture engine argument; a sketch of the compatibility-shim idea follows.
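
Upstream vLLM renamed max_context_len_to_capture to max_seq_len_to_capture. The sketch below shows one way such a shim can work; build_engine_args and the dict shape are hypothetical stand-ins for however the worker assembles its engine arguments, not the worker's actual code.

```python
# Minimal sketch of the backwards-compatibility idea, assuming the worker
# builds vLLM engine arguments from a dict of environment-derived settings.
# `build_engine_args` is hypothetical; the key rename matches upstream vLLM.
def build_engine_args(raw: dict) -> dict:
    args = dict(raw)
    # Accept the deprecated key but forward it under its new name,
    # so existing deployments keep working after the vLLM upgrade.
    if "max_context_len_to_capture" in args:
        args.setdefault(
            "max_seq_len_to_capture",
            args.pop("max_context_len_to_capture"),
        )
    return args

# Usage: a config that still uses the old key continues to work.
print(build_engine_args({"max_context_len_to_capture": 8192}))
# -> {'max_seq_len_to_capture': 8192}
```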

1.0.0

12 Jun 19:29

Worker vLLM 1.0.0 - What's Changed

  • vLLM version 0.3.3 -> 0.4.2
  • Various improvements and bug fixes

0.3.2

12 Mar 23:11
cee4e48

Worker vLLM 0.3.2 - What's Changed

  • vLLM version 0.3.2 -> 0.3.3
    • StarCoder2 support
    • Performance optimization for Gemma
    • 2/3/8-bit GPTQ support (see the GPTQ inference sketch after this list)
    • Integrate Marlin Kernels for Int4 GPTQ inference
    • Performance optimization for MoE kernel
  • Updated and refactored base image, sampling parameters, etc.
  • Various bug fixes
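
Since this release pulls in vLLM's GPTQ and Marlin kernel support, a short sketch of GPTQ inference via vLLM's offline API may help; the model ID below is illustrative, not something named in these release notes.

```python
# Minimal sketch of GPTQ inference with vLLM's offline API, assuming a
# GPTQ-quantized checkpoint on the Hugging Face Hub (model ID illustrative).
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",  # any GPTQ checkpoint
    quantization="gptq",  # selects GPTQ (or Marlin) kernels for Int4 weights
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain GPTQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```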