Fix runtime errors reported when using long input sequence lengths with LoRA #343

vivekgoe · 2024-09-27T06:52:03Z

This PR contains the following fixes:

Increase the size of the index tensors that hold multi-LoRA state information from max_num_batched_tokens to 3*max_num_batched_tokens. The extra capacity provides headroom for the padding applied in the batch and sequence dimensions.
Move the logic that removes padding from lora_logits out of execute_model() and back into the LogitsProcessorWithLoRA class. This fixes a race condition caused by updating the multi-LoRA state information directly.
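The two fixes above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR. `PaddedBatch`, `index_buffer_size`, `fits`, and `strip_padding` are hypothetical names, the bucket sizes are made-up numbers, and plain Python lists stand in for tensors. The sketch only shows why a buffer sized to max_num_batched_tokens can be too small once padding is applied, and how padded rows can be trimmed locally without touching shared state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaddedBatch:
    """Hypothetical description of a batch after padding (not a vLLM type)."""
    real_seq_lens: List[int]   # true token count of each sequence
    padded_batch_size: int     # batch dimension after padding
    padded_seq_len: int        # sequence dimension after padding

    @property
    def real_tokens(self) -> int:
        return sum(self.real_seq_lens)

    @property
    def padded_tokens(self) -> int:
        # Every slot in the padded batch consumes one index entry.
        return self.padded_batch_size * self.padded_seq_len


def index_buffer_size(max_num_batched_tokens: int, factor: int = 3) -> int:
    """Capacity of the index buffer; factor=3 mirrors the PR description."""
    return factor * max_num_batched_tokens


def fits(batch: PaddedBatch, capacity: int) -> bool:
    return batch.padded_tokens <= capacity


def strip_padding(lora_logits: List[List[float]], num_real_rows: int) -> List[List[float]]:
    """Return only the rows that belong to real sequences.

    Slicing creates a new list, so the padded input is left unmodified.
    """
    return lora_logits[:num_real_rows]


# Assumed example values, chosen only for illustration.
max_num_batched_tokens = 1024
batch = PaddedBatch(real_seq_lens=[300, 350, 370],
                    padded_batch_size=4,
                    padded_seq_len=512)

old_capacity = max_num_batched_tokens
new_capacity = index_buffer_size(max_num_batched_tokens)

# The real tokens respect the limit, but the padded layout does not.
overflow_before = not fits(batch, old_capacity)
ok_after = fits(batch, new_capacity)

# One logits row per batch slot; the last row belongs to a padding slot.
padded_lora_logits = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
trimmed = strip_padding(padded_lora_logits, len(batch.real_seq_lens))
```

In this example the real token count stays under the limit while the padded layout exceeds it, which is the situation the larger buffer is meant to absorb. The factor of 3 is headroom as described in the PR, not a bound derived here.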

vllm/worker/habana_model_runner.py

Fix runtime errors reported when using long input sequence lengths with LoRA

b40d88d

vivekgoe requested review from kzawora-intel, michalkuligowski and hlahkar September 27, 2024 06:52

michalkuligowski approved these changes Sep 27, 2024

michalkuligowski reviewed Sep 27, 2024

vllm/worker/habana_model_runner.py (resolved)

hlahkar approved these changes Sep 27, 2024

michalkuligowski merged commit b70dcba into v1.18.0 Sep 27, 2024
2 checks passed

michalkuligowski deleted the private/vgoel/v1.18.0_lora_long_seq_fix branch September 27, 2024 07:32