
More metrics on LPG script #832

Merged: @Bslabe123 merged 15 commits into main, Sep 24, 2024
Conversation

@Bslabe123 (Collaborator) commented Sep 23, 2024:

Example LPG JSON results, a few notes:

  • The TPOT metric is faulty and incorrectly records inf into buckets (see the sketch after the JSON below).
  • Some values are intentionally duplicated across fields: this JSON is consumed by several pipelines, each expecting a uniform shape.

```json
{
  "metrics": {
    "num_prompts": 600,
    "request_rate": 20,
    "server_metrics": {
      "jetstream_slots_used_percentage": {
        "Mean": "0.81875",
        "Median": "1",
        "Std": "0.3625",
        "Min": "0.09375",
        "Max": "1",
        "P90": "1",
        "P99": "1"
      },
      "jetstream_prefill_backlog_size": {
        "Mean": "66.75",
        "Median": "75",
        "Std": "44.71786555729153",
        "Min": "0",
        "Max": "117",
        "P90": "110.7",
        "P99": "116.36999999999999"
      },
      "jetstream_time_to_first_token": {
        "Mean": "0.129368504891402",
        "Median": "0.1624223602484472",
        "Min": "0.01",
        "Max": "0.25",
        "P90": "0.23248447204968944",
        "P99": "0.24824844720496894"
      },
      "jetstream_time_per_output_token": {
        "Mean": "0",
        "Median": "0.02886546184738956",
        "Min": "0.01",
        "Max": "2.5",
        "P90": "0.0640972222222222",
        "P99": "2.5"
      },
      "jetstream_time_per_request": {
        "Mean": "2.4594473866713877",
        "Median": "2.1924157303370784",
        "Min": "1",
        "Max": "10",
        "P90": "5.4921874999999964",
        "P99": "9.549218750000001"
      }
    }
  },
  "dimensions": {
    "date": "20240924-210117",
    "backend": "jetstream",
    "model_id": "google/gemma-7b",
    "tokenizer_id": "google/gemma-7b"
  },
  "config": {
    "model": "google/gemma-7b",
    "model_server": "jetstream",
    "start_time": "2024-09-24T21:01:17.055735Z"
  },
  "summary_stats": {
    "stats": {
      "ttft": {
        "Mean": "0.129368504891402",
        "Median": "0.1624223602484472",
        "Min": "0.01",
        "Max": "0.25",
        "P90": "0.23248447204968944",
        "P99": "0.24824844720496894"
      },
      "tpot": {
        "Mean": "0",
        "Median": "0.02886546184738956",
        "Min": "0.01",
        "Max": "2.5",
        "P90": "0.0640972222222222",
        "P99": "2.5"
      },
      "request_latency": {
        "Mean": "2.4594473866713877",
        "Median": "2.1924157303370784",
        "Min": "1",
        "Max": "10",
        "P90": "5.4921874999999964",
        "P99": "9.549218750000001"
      }
    }
  }
}
```
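
(For the inf note above: a minimal, hypothetical sketch of how deriving a percentile from Prometheus-style cumulative histogram buckets can yield inf, since the top bucket's upper bound is always +Inf. The bucket values and the `percentile_from_buckets` helper are illustrative, not this script's actual code.)

```python
import math

# Hypothetical cumulative histogram: counts of samples at or below each
# upper bound `le`. Prometheus histograms always end with an le="+Inf" bucket.
buckets = {0.01: 120, 0.05: 480, 0.1: 590, math.inf: 600}

def percentile_from_buckets(buckets: dict, q: float) -> float:
    """Return the upper bound of the first bucket whose cumulative count
    covers quantile q. When q lands in the open-ended +Inf bucket, the
    result is inf, which resembles the TPOT failure noted above."""
    threshold = q * max(buckets.values())
    for le in sorted(buckets):
        if buckets[le] >= threshold:
            return le
    return math.inf

print(percentile_from_buckets(buckets, 0.50))  # 0.05 (finite bucket)
print(percentile_from_buckets(buckets, 0.99))  # inf: P99 falls in the +Inf bucket
```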

```python
# If a value is specified for a given key, it will also be populated on the
# output's `summary_stats.stats` field as 'value':'stats'.
if backend == "vllm":
    return {
        "vllm:gpu_cache_usage_perc": None,
```
A collaborator commented:
How does vllm get ttft, tpot, and request_latency if jetstream does it like below?

@Bslabe123 (Collaborator, Author) replied Sep 24, 2024:

Currently we don't formally report stats on these metrics, although we do collect and report them. Ideally we can collect these stats from Prometheus; this will be addressed in a follow-up.
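
(A sketch of that follow-up direction, assuming the standard Prometheus HTTP API; the server address, metric name, and `prometheus_p99` helper below are assumptions, not code from this PR.)

```python
import requests

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical address

def prometheus_p99(metric: str, window: str = "5m") -> float:
    # histogram_quantile() estimates a quantile from cumulative bucket rates.
    query = f"histogram_quantile(0.99, rate({metric}_bucket[{window}]))"
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("nan")

# e.g. prometheus_p99("jetstream_time_per_output_token")  # metric name assumed
```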

The collaborator replied:

So, where in the report do we have the data the benchmark produced? We report request latency, average time per output token, etc.

@Bslabe123 merged commit 7b83554 into main Sep 24, 2024
11 checks passed
@Bslabe123 mentioned this pull request Sep 25, 2024