Feature
kvpress currently lacks connectors to popular inference engines such as vLLM. Does the team plan to work or collaborate with the vLLM project to support this?
We don't plan to support vLLM immediately. Our goal is to offer a platform for implementing and comparing different KV cache compression methods, with no production objective so far.

Could you elaborate on what is missing in kvpress? We are currently refactoring the code (see #21) to offer more modularity, and the main missing feature we'll work on next is the possibility of implementing head-wise cache compression (see #7 and the excellent preliminary work already done in #25).
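For context, here is a minimal sketch of the head-wise idea: each attention head scores and keeps its own subset of cached positions, rather than all heads sharing one selection. The function name `headwise_compress`, the norm-based scoring rule, and the tensor layout are illustrative assumptions for this sketch, not kvpress's actual API.

```python
# Hypothetical sketch of head-wise KV cache compression. Each head keeps a
# different subset of positions, scored independently. Names and the scoring
# rule are illustrative, not taken from kvpress.
import torch

def headwise_compress(keys, values, compression_ratio=0.5):
    """Keep the top-scoring cached positions independently per head.

    keys, values: (batch, num_heads, seq_len, head_dim)
    Returns tensors of shape (batch, num_heads, kept_len, head_dim).
    """
    batch, num_heads, seq_len, head_dim = keys.shape
    kept_len = max(1, int(seq_len * (1 - compression_ratio)))

    # Illustrative scoring rule: L2 norm of each key vector. Real presses
    # would plug in method-specific scores (e.g. attention statistics).
    scores = keys.norm(dim=-1)  # (batch, num_heads, seq_len)

    # Each head selects its own positions: this is the "head-wise" part,
    # as opposed to one shared index set across all heads.
    idx = scores.topk(kept_len, dim=-1).indices  # (batch, num_heads, kept_len)
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, head_dim)

    return keys.gather(2, idx), values.gather(2, idx)

# Toy usage
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
k_c, v_c = headwise_compress(k, v, compression_ratio=0.75)
print(k_c.shape)  # torch.Size([1, 8, 32, 64])
```

The complication this creates (and what makes the feature non-trivial, per #7 and #25) is that heads end up with ragged cache lengths, which the standard rectangular KV cache layout does not support directly.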