
Fix runtime errors reported when using long input sequence lengths with LoRA #343

Merged
merged 1 commit into from
Sep 27, 2024

Conversation

vivekgoe

This PR includes the following fixes:

  • Increase the size of the indices tensors used to maintain multi-LoRA state information from max_num_batched_tokens to 3*max_num_batched_tokens. The extra headroom provides a buffer for the padding applied in the batch and sequence dimensions.

  • Move the logic that removes padding from lora_logits out of execute_model() and back into the LogitsProcessorWithLoRA class. This fixes a race condition caused by updating multi-LoRA state information directly.
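A minimal sketch of the first fix, with plain Python lists standing in for the index tensors; all names here are illustrative, not the actual vLLM identifiers:

```python
# Hypothetical sketch: size the per-token LoRA index buffer with 3x
# headroom so padding added in the batch and sequence dimensions
# cannot run past the end of the buffer.
max_num_batched_tokens = 8  # small toy value for illustration

# Before the fix, the buffer was sized exactly to the token budget:
# base_indices = [-1] * max_num_batched_tokens

# After the fix, 3x headroom absorbs the batch/sequence padding.
buffer_len = 3 * max_num_batched_tokens
base_indices = [-1] * buffer_len  # -1 = "no LoRA" padding sentinel

def fill_indices(token_lora_ids):
    """Write the real per-token LoRA ids; the tail stays as padding."""
    n = len(token_lora_ids)
    assert n <= buffer_len, "padded token count exceeds index buffer"
    base_indices[:n] = token_lora_ids
    for i in range(n, buffer_len):
        base_indices[i] = -1
    return n
```

With the original exact-size buffer, a padded batch whose token count exceeds max_num_batched_tokens would trip the assertion (or, in the real tensor code, index out of bounds); the 3x allocation is the buffer the PR description refers to.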

FIX #237
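The second fix can be illustrated with a hedged sketch of why trimming padding inside the logits processor avoids the race: the trim becomes a pure local operation on the logits, so no shared multi-LoRA state is mutated mid-step. The class and attribute names below are assumptions for illustration, not vLLM's actual implementation:

```python
# Hypothetical sketch: remove vocab padding from lora_logits inside
# the logits processor itself, rather than rewriting shared index
# state in execute_model(), where a concurrent step could observe a
# half-updated tensor.
class LogitsProcessorWithLoRA:
    def __init__(self, vocab_size, padded_vocab_size):
        self.vocab_size = vocab_size            # real vocabulary size
        self.padded_vocab_size = padded_vocab_size  # padded for alignment

    def __call__(self, lora_logits):
        # Slice off the padded tail of each row as a local operation;
        # no shared state is touched, so there is nothing to race on.
        return [row[: self.vocab_size] for row in lora_logits]
```

Keeping the trim local to the processor is the design choice the PR restores: execute_model() no longer needs to mutate the multi-LoRA state to strip padding.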

@michalkuligowski michalkuligowski merged commit b70dcba into v1.18.0 Sep 27, 2024
2 checks passed
@michalkuligowski michalkuligowski deleted the private/vgoel/v1.18.0_lora_long_seq_fix branch September 27, 2024 07:32