
Bridge to vllm #26

Open
toilaluan opened this issue Dec 7, 2024 · 1 comment
Assignees: SimJeg
Labels: feature request (New feature or request), not planned (This will not be worked on)

Comments

toilaluan commented Dec 7, 2024

Feature

We lack a connector to popular inference engines like vLLM. Does the team plan to work on or collaborate with vLLM to support this?

toilaluan added the feature request label Dec 7, 2024
SimJeg self-assigned this Dec 9, 2024
SimJeg (Collaborator) commented Dec 9, 2024

Hi @toilaluan,

We don't plan to support vLLM immediately. Our goal is to offer a platform for implementing and comparing different KV cache compression methods, with no production objective so far.

Could you elaborate on what is missing in kvpress? We are currently refactoring the code (see #21) to offer more modularity, and the main missing feature we'll work on next is the possibility to implement head-wise cache compression (see #7 and the excellent preliminary work already done in #25).
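For context, the integration a vLLM bridge would need to mirror is the current Hugging Face transformers path. The sketch below follows the usage pattern from the kvpress README around this time; the `ExpectedAttentionPress` class, its `compression_ratio` argument, and the custom `kv-press-text-generation` pipeline task are taken from that README and may differ in newer versions.

```python
# Minimal sketch of the transformers-based kvpress usage (names taken from the
# kvpress README; verify against the installed version).
from transformers import pipeline

from kvpress import ExpectedAttentionPress  # importing kvpress registers its custom pipeline task

# A press is a pluggable KV cache compression method; compression_ratio sets
# the fraction of the prefill KV cache that is pruned.
press = ExpectedAttentionPress(compression_ratio=0.5)

# The custom pipeline compresses the context once during prefill and then
# answers questions against the compressed cache.
pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device="cuda",
    torch_dtype="auto",
)

context = "A long document whose KV cache we want to compress during prefill."
question = "What is this document about?"

answer = pipe(context, question=question, press=press)["answer"]
print(answer)
```

A vLLM connector would presumably need an equivalent hook into vLLM's own KV cache management, which is a larger change than a thin adapter around this pipeline.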

SimJeg added the not planned label Dec 10, 2024
Projects: None yet
Development: No branches or pull requests
2 participants