Bump kv cache min memory for batch jobs #536

dmchoiboi · 2024-06-08T01:09:53Z

Pull Request Summary


Update the hardware inference logic to increase the minimum KV cache memory allocated for batch jobs. This will improve batch completion throughput for larger models (e.g. llama-3-70b).
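As a rough illustration of why a larger KV cache floor can change the inferred hardware, here is a minimal sketch. It is not the code in `llm_model_endpoint_use_cases.py`: the function names, the power-of-two rounding, the 80 GiB per-GPU figure, and the memory formula are all assumptions made for this example. The model numbers are approximate public llama-3-70b config values, and the multiplier of 18 is taken from the commit message in this PR.

```python
import math

GIB = 1024**3


def min_kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_tokens, dtype_bytes=2):
    """Hypothetical estimate: KV cache bytes needed to hold one full-length sequence."""
    # Factor of 2 covers keys and values, stored per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * max_tokens * dtype_bytes


def required_gpus(weight_bytes, kv_bytes, is_batch_job,
                  batch_multiplier=18, gpu_bytes=80 * GIB):
    """Hypothetical GPU count: weights plus KV cache floor, rounded up to a power of two."""
    # Batch jobs reserve a larger KV cache floor so more sequences fit at once.
    kv_floor = kv_bytes * (batch_multiplier if is_batch_job else 1)
    needed = math.ceil((weight_bytes + kv_floor) / gpu_bytes)
    # Assumption: GPU counts are restricted to powers of two (1, 2, 4, 8, ...).
    return 1 << (needed - 1).bit_length()


# Assumed llama-3-70b-like shape: 80 layers, 8 KV heads, head dim 128, 8192 tokens.
kv = min_kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, max_tokens=8192)
weights = 70_000_000_000 * 2  # ~70B parameters at 2 bytes each (assumed fp16/bf16)

online_gpus = required_gpus(weights, kv, is_batch_job=False)
batch_gpus = required_gpus(weights, kv, is_batch_job=True)
print(online_gpus, batch_gpus)
```

Under these assumed numbers the online case fits in 2 GPUs, while the batch case rounds up to 4 because the enlarged KV cache floor pushes the total past two devices. The real thresholds depend on the actual hardware table and formula in the repository.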

Test Plan and Usage Guide


model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py

bump kv cache min for batch jobs

6b8e45f

dmchoiboi requested a review from yunfeng-scale June 8, 2024 01:09

yunfeng-scale reviewed Jun 8, 2024


model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (review comment: outdated, resolved)

Add test for batch job

8b7bec7

dmchoiboi force-pushed the dmchoi/bump-kvcache-minw branch from d63b1d3 to a724c72 Compare June 10, 2024 16:59

Bump multiplier to 18 to get batch job to use 4 GPU

faeffb6

dmchoiboi force-pushed the dmchoi/bump-kvcache-minw branch from a724c72 to faeffb6 Compare June 10, 2024 17:00

yunfeng-scale approved these changes Jun 10, 2024


dmchoiboi merged commit 6447c5f into main Jun 10, 2024
5 checks passed

dmchoiboi deleted the dmchoi/bump-kvcache-minw branch June 10, 2024 17:29
