
Add the ability to benchmark multiple models concurrently #850

Merged · 2 commits · Oct 23, 2024

Conversation

liu-cong
Contributor

This is useful for benchmarking multiple LoRA adapters. I used this to benchmark the gateway:

  • Also fixes latency_throughput_curve.sh to parse non-integer request rates properly.
  • Also adds "errors" to the benchmark results.
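The description above names the two behaviors this PR adds: benchmarking several models concurrently and handling non-integer request rates. A minimal shell sketch of that pattern is below; the variable names and the `run_benchmark` function are illustrative assumptions, not the repo's actual script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two behaviors described in the PR:
# (1) iterate over possibly non-integer request rates, and
# (2) launch one benchmark process per model concurrently.
# MODELS, REQUEST_RATES, and run_benchmark are illustrative only.

MODELS="base-model,lora-adapter-1,lora-adapter-2"
REQUEST_RATES="0.5,1,1.5"

run_benchmark() {
  # Placeholder for the real benchmark invocation.
  echo "benchmarking model=$1 at request_rate=$2"
}

IFS=',' read -ra rate_list <<< "$REQUEST_RATES"
IFS=',' read -ra model_list <<< "$MODELS"

for rate in "${rate_list[@]}"; do
  # Bash integer arithmetic cannot handle rates like 0.5, so validate
  # with a regex instead of relying on ((...)) or seq.
  if ! [[ "$rate" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "skipping invalid request rate: $rate" >&2
    continue
  fi
  # One background process per model, then wait for the batch to finish
  # before moving to the next request rate.
  for model in "${model_list[@]}"; do
    run_benchmark "$model" "$rate" &
  done
  wait
done
```

The `&`/`wait` pairing keeps each request-rate step synchronized while the per-model runs within it proceed in parallel.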

@achandrasekar
Collaborator

Thanks for sending this out @liu-cong! Change looks good overall and is very useful.

@Bslabe123 if you can take a deeper look and make sure the existing cases with jetstream / vllm continue to work as expected that would be great.

@liu-cong
Contributor Author

/hold I am testing the terraform changes

@liu-cong
Contributor Author


I also tested the terraform changes; looks good to me.

@achandrasekar achandrasekar merged commit a3401f2 into GoogleCloudPlatform:main Oct 23, 2024
7 checks passed