
More metrics on LPG script #832

Merged: @Bslabe123 merged 15 commits into main, Sep 24, 2024
Conversation

@Bslabe123 (Collaborator) commented Sep 23, 2024:

Example LPG JSON results, a few notes:

  • The TPOT metric is faulty and incorrectly records inf into buckets (see the sketch after the JSON below).
  • Some values are intentionally duplicated across fields: this JSON is consumed by several pipelines, each expecting a uniform shape.

```json
{
  "metrics": {
    "num_prompts": 600,
    "request_rate": 20,
    "server_metrics": {
      "jetstream_slots_used_percentage": {
        "Mean": "0.81875",
        "Median": "1",
        "Std": "0.3625",
        "Min": "0.09375",
        "Max": "1",
        "P90": "1",
        "P99": "1"
      },
      "jetstream_prefill_backlog_size": {
        "Mean": "66.75",
        "Median": "75",
        "Std": "44.71786555729153",
        "Min": "0",
        "Max": "117",
        "P90": "110.7",
        "P99": "116.36999999999999"
      },
      "jetstream_time_to_first_token": {
        "Mean": "0.129368504891402",
        "Median": "0.1624223602484472",
        "Min": "0.01",
        "Max": "0.25",
        "P90": "0.23248447204968944",
        "P99": "0.24824844720496894"
      },
      "jetstream_time_per_output_token": {
        "Mean": "0",
        "Median": "0.02886546184738956",
        "Min": "0.01",
        "Max": "2.5",
        "P90": "0.0640972222222222",
        "P99": "2.5"
      },
      "jetstream_time_per_request": {
        "Mean": "2.4594473866713877",
        "Median": "2.1924157303370784",
        "Min": "1",
        "Max": "10",
        "P90": "5.4921874999999964",
        "P99": "9.549218750000001"
      }
    }
  },
  "dimensions": {
    "date": "20240924-210117",
    "backend": "jetstream",
    "model_id": "google/gemma-7b",
    "tokenizer_id": "google/gemma-7b"
  },
  "config": {
    "model": "google/gemma-7b",
    "model_server": "jetstream",
    "start_time": "2024-09-24T21:01:17.055735Z"
  },
  "summary_stats": {
    "stats": {
      "ttft": {
        "Mean": "0.129368504891402",
        "Median": "0.1624223602484472",
        "Min": "0.01",
        "Max": "0.25",
        "P90": "0.23248447204968944",
        "P99": "0.24824844720496894"
      },
      "tpot": {
        "Mean": "0",
        "Median": "0.02886546184738956",
        "Min": "0.01",
        "Max": "2.5",
        "P90": "0.0640972222222222",
        "P99": "2.5"
      },
      "request_latency": {
        "Mean": "2.4594473866713877",
        "Median": "2.1924157303370784",
        "Min": "1",
        "Max": "10",
        "P90": "5.4921874999999964",
        "P99": "9.549218750000001"
      }
    }
  }
}
```
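
(For the inf note above: a minimal, hypothetical sketch of how deriving a percentile from Prometheus-style cumulative histogram buckets can yield inf, since the top bucket's upper bound is always +Inf. The bucket values and the `percentile_from_buckets` helper are illustrative, not this script's actual code.)

```python
import math

# Hypothetical cumulative histogram: counts of samples at or below each
# upper bound `le`. Prometheus histograms always end with an le="+Inf" bucket.
buckets = {0.01: 120, 0.05: 480, 0.1: 590, math.inf: 600}

def percentile_from_buckets(buckets: dict, q: float) -> float:
    """Return the upper bound of the first bucket whose cumulative count
    covers quantile q. When q lands in the open-ended +Inf bucket, the
    result is inf, which resembles the TPOT failure noted above."""
    threshold = q * max(buckets.values())
    for le in sorted(buckets):
        if buckets[le] >= threshold:
            return le
    return math.inf

print(percentile_from_buckets(buckets, 0.50))  # 0.05 (finite bucket)
print(percentile_from_buckets(buckets, 0.99))  # inf: P99 falls in the +Inf bucket
```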

```python
# If a value is specified for a given key, it will also be populated on the
# output's `summary_stats.stats` field as 'value':'stats'.
if backend == "vllm":
    return {
        "vllm:gpu_cache_usage_perc": None,
```
A collaborator commented:
How does vllm get ttft, tpot, and request_latency if jetstream does it like below?

@Bslabe123 (Collaborator, Author) replied Sep 24, 2024:

Currently we don't formally report stats on these metrics, although we do collect and report them. Ideally we can collect these stats from Prometheus; this will be addressed in a follow-up.
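
(A sketch of that follow-up direction, assuming the standard Prometheus HTTP API; the server address, metric name, and `prometheus_p99` helper below are assumptions, not code from this PR.)

```python
import requests

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical address

def prometheus_p99(metric: str, window: str = "5m") -> float:
    # histogram_quantile() estimates a quantile from cumulative bucket rates.
    query = f"histogram_quantile(0.99, rate({metric}_bucket[{window}]))"
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("nan")

# e.g. prometheus_p99("jetstream_time_per_output_token")  # metric name assumed
```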

The collaborator replied:

So, where in the report do we have the data the benchmark produced? We report request latency, average time per output token, etc.

@Bslabe123 merged commit 7b83554 into main Sep 24, 2024
11 checks passed
@Bslabe123 mentioned this pull request Sep 25, 2024