Feature
kvpress currently lacks connectors to popular inference engines such as vLLM. Does the team plan to work or collaborate with the vLLM project to support this?
We don't plan to support vLLM immediately. Our goal is to offer a platform for implementing and comparing different KV cache compression methods, with no production objective so far.

Could you elaborate on what is missing in kvpress? We are currently refactoring the code (see #21) to offer more modularity, and the main missing feature we'll work on next is the possibility of implementing head-wise cache compression (see #7 and the excellent preliminary work already done in #25).
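For context, here is a minimal sketch of the head-wise idea: each attention head scores and keeps its own subset of cached positions, rather than all heads sharing one selection. The function name `headwise_compress`, the norm-based scoring rule, and the tensor layout are illustrative assumptions for this sketch, not kvpress's actual API.

```python
# Hypothetical sketch of head-wise KV cache compression. Each head keeps a
# different subset of positions, scored independently. Names and the scoring
# rule are illustrative, not taken from kvpress.
import torch

def headwise_compress(keys, values, compression_ratio=0.5):
    """Keep the top-scoring cached positions independently per head.

    keys, values: (batch, num_heads, seq_len, head_dim)
    Returns tensors of shape (batch, num_heads, kept_len, head_dim).
    """
    batch, num_heads, seq_len, head_dim = keys.shape
    kept_len = max(1, int(seq_len * (1 - compression_ratio)))

    # Illustrative scoring rule: L2 norm of each key vector. Real presses
    # would plug in method-specific scores (e.g. attention statistics).
    scores = keys.norm(dim=-1)  # (batch, num_heads, seq_len)

    # Each head selects its own positions: this is the "head-wise" part,
    # as opposed to one shared index set across all heads.
    idx = scores.topk(kept_len, dim=-1).indices  # (batch, num_heads, kept_len)
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, head_dim)

    return keys.gather(2, idx), values.gather(2, idx)

# Toy usage
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
k_c, v_c = headwise_compress(k, v, compression_ratio=0.75)
print(k_c.shape)  # torch.Size([1, 8, 32, 64])
```

The complication this creates (and what makes the feature non-trivial, per #7 and #25) is that heads end up with ragged cache lengths, which the standard rectangular KV cache layout does not support directly.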