Bump kv cache min memory for batch jobs #536

Merged 3 commits on Jun 10, 2024

@dmchoiboi (Collaborator) commented Jun 8, 2024

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Update the hardware inference logic to increase the minimum KV cache memory allocated for batch jobs. This will improve batch completion throughput for larger models (e.g. llama-3-70b).
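
For reviewers who want a feel for the sizes involved, here is a rough sketch of how a minimum KV cache budget could be estimated. It is not the code changed in this PR: the function names, the `batch_multiplier` parameter, and its default value are all hypothetical, and the model shape used in the example (80 layers, 8 KV heads, head dimension 128, fp16) is the published llama-3-70b configuration rather than something read from this repo.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers.

    The leading 2 accounts for storing both a key and a value tensor.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def min_kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                       max_tokens: int, is_batch_job: bool,
                       dtype_bytes: int = 2,
                       batch_multiplier: int = 2) -> int:
    """Hypothetical minimum KV cache budget.

    `batch_multiplier` is an assumed knob for illustration only: batch
    jobs reserve more cache so more sequences can be in flight at once.
    """
    base = kv_cache_bytes_per_token(
        num_layers, num_kv_heads, head_dim, dtype_bytes) * max_tokens
    return base * batch_multiplier if is_batch_job else base


if __name__ == "__main__":
    # llama-3-70b shape: 80 layers, 8 KV heads (GQA), head_dim 128, fp16.
    per_token = kv_cache_bytes_per_token(80, 8, 128)
    online = min_kv_cache_bytes(80, 8, 128, max_tokens=8192, is_batch_job=False)
    batch = min_kv_cache_bytes(80, 8, 128, max_tokens=8192, is_batch_job=True)
    print(f"per token: {per_token / 1024:.0f} KiB")
    print(f"online minimum: {online / 2**30:.1f} GiB")
    print(f"batch minimum: {batch / 2**30:.1f} GiB")
```

Under these assumptions one token costs 320 KiB of cache, so a single 8192-token sequence needs about 2.5 GiB. A larger minimum lets the serving engine keep more sequences resident at the same time, which is where the throughput gain for batch jobs would come from.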

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

@dmchoiboi requested a review from yunfeng-scale June 8, 2024 01:09
@dmchoiboi force-pushed the dmchoi/bump-kvcache-minw branch from d63b1d3 to a724c72 June 10, 2024 16:59
@dmchoiboi force-pushed the dmchoi/bump-kvcache-minw branch from a724c72 to faeffb6 June 10, 2024 17:00
@dmchoiboi merged commit 6447c5f into main Jun 10, 2024
5 checks passed
@dmchoiboi deleted the dmchoi/bump-kvcache-minw branch June 10, 2024 17:29
2 participants