
Add the ability to benchmark multiple models concurrently #850

Merged · 2 commits · Oct 23, 2024

Conversation

liu-cong
Contributor

This is useful for benchmarking multiple LoRA adapters. I used this to benchmark the gateway:

  • Also fixes latency_throughput_curve.sh to parse non-integer request rates properly.
  • Also adds "errors" to the benchmark results.
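The description above names the two behaviors this PR adds: benchmarking several models concurrently and handling non-integer request rates. A minimal shell sketch of that pattern is below; the variable names and the `run_benchmark` function are illustrative assumptions, not the repo's actual script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two behaviors described in the PR:
# (1) iterate over possibly non-integer request rates, and
# (2) launch one benchmark process per model concurrently.
# MODELS, REQUEST_RATES, and run_benchmark are illustrative only.

MODELS="base-model,lora-adapter-1,lora-adapter-2"
REQUEST_RATES="0.5,1,1.5"

run_benchmark() {
  # Placeholder for the real benchmark invocation.
  echo "benchmarking model=$1 at request_rate=$2"
}

IFS=',' read -ra rate_list <<< "$REQUEST_RATES"
IFS=',' read -ra model_list <<< "$MODELS"

for rate in "${rate_list[@]}"; do
  # Bash integer arithmetic cannot handle rates like 0.5, so validate
  # with a regex instead of relying on ((...)) or seq.
  if ! [[ "$rate" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "skipping invalid request rate: $rate" >&2
    continue
  fi
  # One background process per model, then wait for the batch to finish
  # before moving to the next request rate.
  for model in "${model_list[@]}"; do
    run_benchmark "$model" "$rate" &
  done
  wait
done
```

The `&`/`wait` pairing keeps each request-rate step synchronized while the per-model runs within it proceed in parallel.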

@achandrasekar
Collaborator

Thanks for sending this out @liu-cong! Change looks good overall and is very useful.

@Bslabe123 if you can take a deeper look and make sure the existing cases with jetstream / vllm continue to work as expected that would be great.

@liu-cong
Contributor Author

/hold I am testing the terraform changes

@liu-cong
Contributor Author


I also tested the terraform changes; looks good to me.

@achandrasekar achandrasekar merged commit a3401f2 into GoogleCloudPlatform:main Oct 23, 2024
7 checks passed